    Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of immune events, which captures the dynamics of the immune responses induced after infection. The work also compares responses induced in farm and SPF pigs, with the latter showing an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (the critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge of basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained by immunizing pigs with a not fully attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which significantly contributed to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and describe the revisions that were incorporated.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals have recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for acute ASF infections and rather indicates the potential induction of a chronic infection. Have the authors analysed this possibility in depth? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when some animals were sacrificed) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are no longer observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection differ significantly between whole blood and serum (Figure 1C), while they are very similar in the same sample types post-challenge. Have the authors validated these results using methods that quantify infectious particles, such as hemadsorption or immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect immunosuppressive mechanisms to be induced during such prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL-10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We addressed this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We included a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of this age to demonstrate its 100% lethality? Could the low percentage of protected farm pigs be due to modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of immunity, which in this model is tested at a very late time point? Related to this, how was the challenge day selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522); this dose is notably lower than the one used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack of protection. We incorporated these important aspects into the revised Discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections was revised accordingly, and the discussion was extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved, and some data that should be provided are missing, such as flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we clarified our conclusions where needed and ensured that interpretations were better aligned with the data shown.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Introduction, more details on the experimental model would be appreciated. A short summary of findings obtained with this model in previous works from the authors would help to better understand the context of the study.

      Basic information on the model was added in the Introduction section of the revised manuscript.

      (2) In Figure 1, the addition of more time points on the x-axes would help the interpretation of the figures.

      We agree and have added extra time points to the x-axes.

      (3) To better understand the results in Figure 2A, a figure showing cytokine levels post-Estonia infection of only challenged pigs would help, indicating protected and non-protected animals as in Figure 2C. This figure would be better linked to the corresponding dot plot (Figure 2B).

      Our statistical analyses in Figure 2A are based on using both challenged and non-challenged pigs to assess differences between SPF and farm pigs. We prefer not to remove the non-challenged pigs in order to avoid losing statistical power. Moreover, even when non-challenged and challenged pigs are displayed in the plots, upregulation of IFN-α and IL-8 can be visualized and remains consistent with the positive and negative correlates of protection shown in Figure 2C.

      (4) Dark red colour associated with SPF non-protected is difficult to differentiate from light red in some figures.

      We thank the reviewer for this remark. To preserve the color scheme across the paper, we changed the circle data points to squares for the non-protected SPF pig in the most crowded figures: Figures 1–3 and Supplementary Figures 2 and 8.

      (5) In Supplementary figures 12-16, grouping of the animal numbers (SPF vs farm) would facilitate the interpretation of the results.

      Information on the animal numbers for each group (SPF vs. farm) has been added to the figure captions.

      (6) Are the results shown in Figure 8 based on absolute scores as mentioned? Results from 0 dpc are not shown. Is that correct?

      That is correct. BTM expression values are absolute and could not be normalized, as RNA was not isolated either immediately before the challenge or on day 0 post-challenge. This information is now clarified in the figure captions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors use the words "predicted" and "predicts" although they haven't used any methods to show that this is true, such as a multivariate analysis. I don't think correlation coefficients are sufficient to indicate prediction. This needs to be fixed.

      We agree with this and have made changes in the text to avoid this impression.

      (2) "Lower baseline immune activation was linked to increased protective immunity." Presumably, the authors mean prior to challenge, not prior to "vaccination"?

      In this sentence written in the Abstract, we refer to baseline immune activation in the steady state, i.e., prior to any infection, as demonstrated in a previous study by Radulovic et al. (2022). The sentence was adapted accordingly. This concept is further explored in the Discussion section.

      (3) The abstract mentioned the comparison between farm and SPF pigs, but didn't provide any context for those findings. It could be added here.

      In the new version, we have added information on this model in the Introduction section.

      (4) Figure legends need N to be indicated. For example, the viral load figures don't appear to be representative of all 9 or 5 animals. Is there a reason why not all were challenged, and how were those 5 challenged selected?

      Numbers of animals in each group were added to the figure captions. We have also provided details regarding the animals sacrificed at different time points of the experiment in the ‘Animal experiment’ section of the Methods.

      (5) Figure 1A doesn't have a legend indicating whether the dark or light color denotes sampling.

      Fair point. We have added the information to the figure.

      (6) For Figure 3C, it's not clear how the correlation is presented. The legend text indicates that the color shows the outcome it correlates with, but the color key suggests that it is r.

      The method of presenting correlation data is consistent across all figures, including Figure 3C. The color reflects the direction and strength of the correlation, corresponding to the r coefficient obtained from correlating immunological parameters with clinical scores. We have clarified this description in the figure caption to improve readability.

      (7) For some of the correlation data in 2D and 3C, it would be nice to provide the plots in the supplemental. Also, are there enough data points for a robust interpretation of correlation curves?

      We agree that providing the plots will improve clarity and have included them in the supplementary material. While we acknowledge that the number of data points is modest, we believe it is sufficient to support a robust interpretation of the correlation curves. Corresponding p-value cutoffs are noted in the figure captions.

      (8) The figure 2C method of indicating significance is confusing. There must be a clearer way to present this figure.

      Analyzing statistical significance for the dataset shown in Figure 2C is challenging due to the small number of animals. We carefully considered alternative ways of presenting statistical significance; however, given the limited group sizes, we believe that the current approach provides the most transparent and informative representation of the data.

      For clarity, we divided the animals into SPF and farm groups, as well as into protected (4 SPF, 2 farm pigs) and non-protected (1 SPF, 3 farm pigs) categories, and performed both group-based (unpaired t-test) and time-based (mixed-effects analysis) comparisons. All significant differences were added to the plots so that readers could directly visualize the observed trends and compare them with the correlation analysis presented in Figure 2D.

      (9) Please note that "viremia" means the presence of a virus specifically in the blood. Other descriptions of viral load should be used if this was not measured.

      We have clarified this in the text. When referring to organs, we use the term “viral loads.”

      (10) The way of putting a square around boxes that are significant can be misleading when a box is surrounded by other significant comparisons. Like for Figure 6B - probably all of these are really significant, but I can't tell for sure.

      Good point. We changed rectangles to circles for better readability of the figures.

      (11) There is a potential argument that these correlates of protection might only be valid for this specific vaccine. It should be noted that comparisons of multiple vaccines would be needed before assuming the correlates are broadly relevant.

      We agree with this statement and address it in the Discussion section.

      (12) For the circled pathways in Figure 9, it is not clear from the diagram if there is a directionality to the involvement of those pathways. Modulated or induced?

      When discussing pathways identified by transcriptome analysis, we are always referring to their induction, as this is based on the normalized enrichment score (NES). We have now specified this in the figure caption.

      (13) The authors speculate about NK cells, but this is based on transcriptional pathways identified and the literature. Is there any indication from the flow cytometry data whether activated NK cells versus NKT cells are associated with protection? Also, the memory phenotype of those cells?

      Regarding NK cells, the BTM analysis was corroborated by the flow cytometry data shown in Supplementary Figure 8. NK cells were defined as CD3⁻CD8α⁺. Specific markers to distinguish NKT cells or to assess memory phenotypes were not included in our panel.

      (14) In the discussion, "Our study demonstrates that T cell activation represents a robust correlate of protection against ASFV" doesn't indicate whether they mean after vaccination or after challenge. Re-using the same time points throughout the manuscript compounds this confusion.

      In this case, we mean that T cell activation upon immunization/vaccination and challenge correlates with protection. This information has been added to the sentence. Although some time points overlap between the immunization and challenge phases, we consistently use “dpi” and “dpc” to clearly distinguish them.

      (15) Flow cytometry gating strategies should be provided in the supplemental, particularly since this species is less frequently studied using flow cytometry; it would be helpful to understand gating and expression levels of key markers.

      We have provided the gating strategy in Supplementary Figure 7, which is also referenced in the “Flow cytometry and hematology analysis” section of the Methods.

      (16) Some of the discussion is a bit long and repetitive - e.g. the parts on antibodies and the last paragraph with multiple other parts of the discussion and manuscript.

      While we agree that some sections are extensive, we think that this level of detail is necessary to integrate the different datasets and to place our findings in the context of previous literature.

    Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Thank you for your comments.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Thank you for your help in improving our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that white flies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein. Unlike previously reported receptor-like proteins with large ligand-binding domains, the NtRLP4 here has a malectin LRR domain. Interestingly, it also associates with the adaptor SOBIR1. While the function of this protein remains to be further explored, the authors provide strong evidence showing it's the target of salivary proteins as the insects' survival strategy.

      Thank you for your comments.

      Major points:

      The authors mixed the concepts of LRR-RLPs and malectin LRR-RLPs. These are two different types of receptors. While LRR-RLPs are well studied, little is known about malectin LRR-RLPs. The authors should not simply apply the mode of function of LRR-RLPs to RLP4, which is a malectin LRR-RLP. In addition, LRR-RLPs that function as ligand-binding receptors typically possess >20 LRRs, whereas RLP4 in this work has a rather small ectodomain. It remains unclear whether it functions as a PRR. I can't agree with the authors' logic of testing uninfested plants to prove a PRR's function. The function of a pattern recognition receptor depends on perceiving the corresponding ligand. As shown by the data provided, RLP4-OE plants have an altered transcriptional profile indicating activated defense, suggesting that RLP4 is unlikely to be a PRR. An alternative explanation is needed. More work on BAK1 would also help to clarify the ideas proposed by the authors.

      We sincerely thank the reviewer for the insightful and constructive comments, which have helped us critically re-evaluate our interpretation of RLP4 function. In the revised manuscript, we have addressed this important point by adding a detailed discussion of an alternative explanation for RLP4’s role in plant defense. Specifically, we now explicitly distinguish between classical LRR-RLPs and malectin-domain-containing RLPs, and we acknowledge that RLP4 may not function as a canonical PRR. We also discuss the structural features of RLP4, including its malectin-like domain and relatively small LRR region, and the observation that NtRLP4 overexpression lines exhibit altered transcriptional profiles even in the absence of insect infestation. Based on these lines of evidence, we propose that RLP4 may instead act as a regulatory component within plant immune signaling networks, modulating defense outputs rather than functioning as a direct ligand receptor. The revised discussion (Lines 392-407) now reads as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampening RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLPs: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research.”

      Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al., investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Thank you for your comments.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Two minor comments:

      In line 140, yeast two-hybrid (Y2H) was used to screen for interacting proteins in plants. However, it is generally difficult to identify membrane receptors using Y2H. Please provide more methodological details to justify this approach, or alternatively, include a discussion explaining this.

      Thank you for pointing this out. It is true that membrane receptors are generally difficult to identify using Y2H. To address this limitation, we used truncated versions of RLP4s lacking the signal peptide and transmembrane domains in point-to-point Y2H assays. In addition, the interactions between BtRDP and RLP4s were further validated by Co-IP and BiFC experiments. In the revised manuscript, we have clarified this methodological detail as follows (Lines 636-638): “Given that membrane receptors are generally difficult to identify using Y2H, truncated versions of NtRLP4/SlRLP4/OsRLP4 lacking the signal peptide and transmembrane domains were used.”

      In Figure S12C, the interaction between the two proteins appears to be present in the nucleus as well. Please provide a possible explanation for this observation.

      Thank you for pointing this out. During revision, we further examined the subcellular localization of NtRLP4 and found that NtRLP4-GFP could also be detected in the nucleus when expressed alone (Fig. S18), suggesting that NtRLP4 may have additional functions beyond serving as a cell surface pattern recognition receptor. In the Discussion section of the revised manuscript (Lines 392-407), we now note that NtRLP4 might play roles other than that of a PRR, as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampening RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLPs: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed all my concerns.

      Thank you for your help in improving our manuscript.

      Reviewer #2 (Recommendations for the authors):

      This work is quite interesting. It's not necessary to prove that RLP4 is a PRR to show the merit of this discovery. The current logic is forced, and thus the conclusion is not convincing. Finding an alternative explanation would be more helpful.

      Thank you for your valuable suggestions. In the revised version (Lines 392-407), we discuss the alternative explanation as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampening RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLPs: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research.”

      Inappropriate descriptions still exist in multiple places across the manuscript and damage the merit of this work. I highly recommend that the authors consult an expert in plant PRR research for proofreading. The language editing service the authors used provided only limited help in this case. Here are a few examples:

      We sincerely thank the reviewer for the critical and constructive comments. We agree that precise language is essential for conveying scientific findings. In the revised version, we have refined the text with the help of colleagues who have expertise in plant immunity, aiming to ensure the descriptions are as precise and professional as possible.

      Line 16: Using "depend" ignores the fact that many biotic invaders are recognized by NLRs. The authors can simply use the word "use" or "utilize".

      Thank you for your suggestion. We corrected it in the revised version.

      Line 20: "target defensive RLP4, therefor minimizing the plant immunity" is an awkward phrase. "dampen RLP4-mediated plant immunity" would be better.

      Thank you for your suggestion. We corrected it in the revised version.

      Line 49: As far as I know, only LRR-RLPs use SOBIR1 as an adaptor. The authors should introduce this specific point. The mode of action of other types of LRR-RLPs is less clear.

      Thank you for your suggestion. In the revised version, we have rewritten this part as follows: “As RLPs lack intracellular signaling domains, they are anticipated to associate with adaptor kinases to form bimolecular receptor kinases. For example, suppressor of BAK1-interacting receptor-like kinase 1 (SOBIR1) is reported to act as a common adaptor for most, if not all, leucine-rich repeat RLPs (LRR-RLPs)” (Lines 48-52), and “The receptor-like kinase SOBIR1, which contains a kinase domain, has been widely reported to be required for the function of LRR-RLPs in innate immunity. However, whether SOBIR1 interacts with malectin-LRR RLPs remains largely unknown” (Lines 170-173).

      Line 67: There are quite a few publications showing that insect salivary proteins dampen plant immunity.

      We apologize for the inaccurate description. We agree that a substantial body of literature describes the suppression of plant immunity by insect salivary proteins. However, the specific molecular mechanism by which these proteins target plant PRRs is still poorly understood. In the revised version, we specify that “it remains largely unknown how insects cope with plant PRRs” (Lines 68-69).

      Line 149: I don't understand what "point-to-point Y2H" is.

      Thank you for your comment. We agree that the term "pairwise Y2H" is more commonly used in the literature than "point-to-point Y2H." To avoid any confusion and to align with standard terminology, we have replaced "point-to-point Y2H" with "pairwise Y2H" throughout the revised manuscript.

      Line 179: Replace with "NtRLP4 and NtSOBIR1 confer resistance to B. tabaci". One does not say a protein is resistant to an insect infestation. The same applies to Lines 209-210.

      Thank you for your suggestion. We corrected it in the revised version.

      Minor points:

      Lines 91-92: Lengthy text for simple results.

      Line 98: "which was significantly different from the actin or ribosomal 18S rRNA" can be deleted. It's self-evident that actin and 18S rRNA are controls. The same applies to Line 101.

      Line 130: unnecessary sentence, delete.

      The use of verb forms needs further correction.

      Thank you for your valuable suggestion. In the revised manuscript, we have revised the text accordingly. We truly appreciate your help in improving our manuscript.

    1. Author response:

      eLife Assessment

      This study uses a Bayesian framework to characterize latent brain state dynamics associated with memory encoding and performance in children, as measured with functional magnetic resonance imaging. The novelty of the approach offers valuable insights into memory-related brain activity, but the consideration of developmental changes in memory and brain dynamics, and the evidence to support the proposed mapping between specific states and distinct aspects of memory, are incomplete. This work will be of interest to researchers interested in cognitive neuroscience and the development of memory.

      We are grateful to the editor and reviewers for their positive feedback and constructive evaluation. Their comments have identified important areas where the manuscript can be strengthened. Below, we outline our planned revisions.

      Reviewer #1 (Public review):

      Zeng et al. characterized the dynamic brain states that emerged during episodic encoding and the reactivation of these states during the offline rest period in children aged 8-13. In the study, participants encoded scene images during fMRI and later performed a memory recognition test. The authors adopted the BSDS approach and identified four states during encoding, including an "active-encoding" state. The occupancy rate of, and the state transition rates towards, this active-encoding state positively predicted memory accuracy across participants. The authors then decoded the brain states during pre- and post-encoding rests with the model trained on the encoding data to examine state reactivation. They found that the state temporal profile and transition structure shifted from encoding to post-encoding rest. They also showed that the mean lifetime and stability (measured with self-transition probability) of the "default-mode" state during post-encoding rest predict memory performance. How brain dynamics during encoding and offline rest support long-term memory remains understudied, particularly in children. Thus, this study addresses an important question in the field. The authors implemented an advanced computational framework to identify latent brain states during encoding and carefully characterized their spatiotemporal features. The study also showed evidence for the behavioral relevance of these states, providing valuable insights into the link between state dynamics and successful encoding and consolidation.

      We thank Reviewer #1 for the positive and constructive feedback on our study. We plan to incorporate detailed methodological justifications and a thorough analysis of limitations, and to enhance the overall logical coherence of the manuscript, ensuring a more robust and scientifically sound presentation.

      Weaknesses:

      (1) If applicable, please provide information on the decoding performance of states during pre- and post-encoding rests. The Methods noted that the authors applied a threshold of 0.1 z-scored likelihood, and based on Figure S2, it seems like most TRs were assigned a reinstated state during post-encoding rest. It would be useful to know, for the decodable TRs, how strong the evidence was in favor of one state over others. Further, was decoding performance better during post- vs. pre- encoding rest? This is critical for establishing that these states were indeed "reinstated" during rest. The authors showed individual-specific correlations between encoding and post-encoding state distribution, which is an important validation of the method, but this result alone is not sufficient to suggest that the states during encoding were the ones that occurred during rest. The authors found that the state dynamics vary substantially between encoding and rest, and it would be helpful to clarify whether these differences might be related to decoding performance. I am also curious whether, if the authors apply the BSDS approach to independently identify brain states during rest periods (instead of using the trained model from encoding), they find similar states during rest as those that emerged during encoding?

      We plan three additional analyses to strengthen the evidence for state reinstatement during rest. First, we will report quantitative decoding confidence metrics for each decoded time point, including the log-likelihood difference between the winning state and the next-best state. We will compare these distributions between pre- and post-encoding rest to test whether decoding quality differs between conditions, as the reviewer suggests. Second, we will provide a more detailed characterization of the decoding process, including the proportion of TRs that survive the log-likelihood threshold of 0.1 during pre- vs. post-encoding rest and whether this proportion relates to memory performance. Third, we will train an independent BSDS model directly on the rest data (rather than using the encoding-trained model) and assess the degree of correspondence between the independently discovered rest states and the encoding states in terms of amplitude profiles and covariance structures. Convergence between the two approaches would provide strong validation that the encoding-defined states genuinely re-emerge at rest. Together with the evidence from our existing analyses, these additional analyses will strengthen our claims.
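      A minimal sketch of the first two proposed metrics, assuming the encoding-trained model yields a log-likelihood for every state at every rest TR (a hypothetical `(n_TRs, n_states)` array). For illustration only, the 0.1 threshold is applied here to the margin between the winning and next-best state; the thresholding rule actually used (described in the Methods as a z-scored likelihood threshold) may differ.

```python
import numpy as np

def decoding_summary(loglik, threshold=0.1):
    """Winning state, confidence margin, and decodable proportion for one rest run.

    loglik: array of shape (n_TRs, n_states); loglik[t, k] is assumed to be the
    log-likelihood of state k at TR t under the encoding-trained model.
    Applying `threshold` to the best-minus-runner-up margin is an illustrative
    assumption, not necessarily the rule used in the manuscript.
    """
    ordered = np.sort(loglik, axis=1)            # ascending within each TR
    margin = ordered[:, -1] - ordered[:, -2]     # winner minus next-best state
    winner = np.argmax(loglik, axis=1)
    decodable = margin > threshold
    states = np.where(decodable, winner, -1)     # -1 marks an undecoded TR
    return states, margin, decodable.mean()

# Toy log-likelihoods for 3 TRs x 4 states (made-up values).
loglik = np.array([
    [-10.0, -12.0, -9.0, -15.0],    # state 2 wins by 1.0
    [-11.0, -11.05, -20.0, -30.0],  # state 0 wins by only 0.05
    [-8.0, -7.5, -9.0, -6.0],       # state 3 wins by 1.5
])
states, margin, prop = decoding_summary(loglik)
print(states.tolist())  # [2, -1, 3]
print(prop)             # proportion of decodable TRs (2 of 3)
```

      Running the same summary separately on pre- and post-encoding rest would yield the two margin distributions and decodable proportions that the proposed comparison requires.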

      (2) During post-encoding rest, the intermediate activation state (S1) became the dominant state. Overall, the paper did not focus too much on this state. For example, when examining the relationship between state transitions and memory performance, the authors also did not include this state as a part of the analyses presented in the paper (lines 203-211). Could the author report more information about this state and/or discuss how this state might be relevant to memory formation and consolidation?

      We thank the reviewer for this suggestion. During encoding, S1 had the lowest occupancy (~10%) and showed no significant relationship with memory performance, which led us to interpret it as a non-essential transient configuration. In the revision, we will provide a more thorough characterization of S1, and conduct correlation analyses to probe whether its dynamic properties during post-encoding rest correlate with individual memory performance.

      (3) Two outcome measures from the BSDS model were the occupancy rate and the mean lifetime. The authors found a significant association between behavior and occupancy rate in some analyses, and between behavior and mean lifetime in others. The paper would benefit from a stronger theoretical framing explaining how and why these two measures provide distinct information about brain dynamics, which would help clarify the interpretation of results when the association with behavior was specific to one measure.

      We thank the reviewer for this suggestion. Occupancy rate and mean lifetime, while related, capture fundamentally different aspects of brain state dynamics. Occupancy rate reflects the total proportion of time the brain spends in a given state, capturing the overall prevalence of that configuration across the scanning session. Mean lifetime, by contrast, measures the average uninterrupted duration of each state visit, indexing the temporal stability or persistence of a given network configuration once it is entered. Critically, two states could have identical occupancy rates but very different mean lifetimes (a state visited frequently but briefly versus one visited rarely but for sustained periods), implying distinct underlying neural dynamics. In the context of memory, high occupancy of the active-encoding state may reflect repeated engagement of encoding-optimal circuits, while a long mean lifetime of the default-mode state during rest may reflect sustained consolidation-related processing. We will expand the theoretical framework in the revised manuscript to articulate these distinctions and connect them to extant findings suggesting that temporal stability versus frequency of state visits may have dissociable behavioral correlates in working memory and episodic memory (He et al., 2023; Stevner et al., 2019).
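      The distinction can be made concrete with a small sketch that computes both measures from a per-TR state sequence. The string labels and toy sequences are hypothetical; in both sequences state A occupies half of the TRs, yet its mean lifetime differs sixfold.

```python
from itertools import groupby

def occupancy_and_lifetime(seq, state):
    """Occupancy rate and mean lifetime (in TRs) of `state` in a per-TR state sequence."""
    occupancy = sum(s == state for s in seq) / len(seq)
    # Lengths of uninterrupted runs (visits) of the target state.
    runs = [len(list(g)) for s, g in groupby(seq) if s == state]
    mean_lifetime = sum(runs) / len(runs) if runs else 0.0
    return occupancy, mean_lifetime

# Two hypothetical 12-TR sequences with identical occupancy of state "A".
frequent_brief = list("ABABABABABAB")   # six 1-TR visits
rare_sustained = list("AAAAAABBBBBB")   # one 6-TR visit
print(occupancy_and_lifetime(frequent_brief, "A"))  # (0.5, 1.0)
print(occupancy_and_lifetime(rare_sustained, "A"))  # (0.5, 6.0)
```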

      (4) For performance on a memory recognition test, d' is a more common metric in the literature as it isolates the memory signal for the old items from response bias. According to Methods (line 451), the authors have computed a different metric as their primary behavioral measure (hits + correct rejections - misses - false alarms). Please provide a rationale for choosing this measure instead. Have the authors considered computing d' as well and examining brain-behavior relationships using d'?

      Our primary memory recognition metric, computed as (hits + correct rejections − misses − false alarms) / total trials, provides an unbiased linear estimate of discrimination ability that is mathematically consistent with d' in directional effects. We selected this measure because it is particularly robust with limited trial counts per condition (Verde et al., 2006; Wickens, 2001). Nonetheless, we agree that reporting d' is important for comparability with the broader literature. In the revision, we will compute d' for each participant and conduct parallel brain–behavior correlation analyses to demonstrate that our findings are robust across both metrics.
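      With 100 old and 100 new items, as in the recognition test described for Reviewer #3, this index reduces algebraically to hit rate minus false-alarm rate: substituting misses = 100 − hits and correct rejections = 100 − false alarms gives (2 × hits − 2 × false alarms) / 200. The sketch below computes both the index and d'. The log-linear correction for extreme rates (adding 0.5 to counts and 1 to trial totals) is an assumed choice, not one stated in the manuscript.

```python
from statistics import NormalDist

def recognition_scores(hits, misses, correct_rejections, false_alarms):
    """Return the (H + CR - M - FA) / N index and a log-linear-corrected d'."""
    n_old = hits + misses
    n_new = correct_rejections + false_alarms
    index = (hits + correct_rejections - misses - false_alarms) / (n_old + n_new)
    # Assumed log-linear correction: keeps rates away from 0 and 1 so that the
    # inverse normal CDF stays finite for perfect or zero scores.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    return index, d_prime

# Hypothetical participant: 80/100 old items recognized, 30/100 new items falsely endorsed.
index, d_prime = recognition_scores(hits=80, misses=20, correct_rejections=70, false_alarms=30)
print(index)              # 0.5, i.e. hit rate 0.8 minus false-alarm rate 0.3
print(round(d_prime, 2))
```

      Because d' is a nonlinear function of the two rates, the measures can order participants differently when response bias varies, which is what the parallel correlation analysis would reveal.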

      (5) While this study examined brain state dynamics in children, there was no adult sample to compare with. Therefore, it is hard to conclude whether the findings are specific to children (or developing brains). It would be helpful to discuss this point in the paper.

      We thank the reviewer for raising this point. While several studies have documented memory-related replay and reinstatement in adults at both the regional and systems levels (Tambini et al., 2017; Wimmer et al., 2020), few have examined whether analogous state-level reinstatement occurs in children. Our study was motivated by this gap: we sought to test whether children show dynamic brain state reinstatement mechanisms similar to those described in adults. However, we acknowledge that without a direct adult comparison, we cannot determine whether the observed patterns are unique to children or reflect general principles of episodic memory organization. In the revised manuscript, we will: (a) frame the study more carefully as examining whether established state-level consolidation mechanisms also operate during childhood, (b) discuss findings in relation to adult studies, and (c) include exploratory analyses of age-related variability in both memory performance and BSDS dynamics within our sample, while acknowledging that the narrow age range (8–13) and small sample size limit the power of such developmental analyses. We will clearly identify the absence of an adult comparison as a limitation.

      Reviewer #2 (Public review):

      This paper investigates the latent dynamic brain states that emerge during memory encoding and predict later memory performance in children (N = 24, ages 8-13 years). A novel computational approach (Bayesian Switching Dynamic Systems, BSDS) discovers latent brain states from fMRI data in an unsupervised and parameter-free manner that is agnostic to external stimuli, resulting in 4 states: an active-encoding state, a default-mode state, an inactive state, and an intermediate state. The key finding is that the percentage of time occupied in the active-encoding state (characterized by greater activity in hippocampal, visual, and frontoparietal regions), as well as greater transitions to this state, predicts memory accuracy. Memory accuracy was also predicted by the mean lifetime and transitions to the default-mode state (characterized by greater activity in medial prefrontal cortex and posterior cingulate cortex) during post-encoding rest. Together, the results provide insights into dynamic interactions between brain regions that may be optimal for encoding novel information and consolidating memories for long-term retention.

      We thank Reviewer #2 for recognizing the novelty and broader utility of our methodology and for noting that the manuscript is well-written and concise.

      Weaknesses:

      (1) The study focuses on middle childhood, but there is a lack of engagement in the Introduction or Discussion about what is known about memory development and the brain during this period. Many of the brain regions examined in this study, particularly frontoparietal regions, undergo developmental changes that could influence their involvement in memory encoding and consolidation. The paper would be strengthened by more directly linking the findings to what is already known about episodic memory development and the brain.

      We thank the reviewer for this suggestion. In response, we will substantially expand the Introduction and Discussion to situate our findings within the developmental cognitive neuroscience literature on episodic memory. In particular, we will address the protracted developmental trajectory of frontoparietal regions, the well-documented maturation of hippocampal–cortical connectivity during middle childhood, and how these developmental changes may influence the brain state configurations we observed (He et al., 2023; Ryali et al., 2016). This will provide the necessary developmental context for interpreting our state dynamics results.

      (2) A more thorough overview of the BSDS algorithm is needed, since this is likely a novel method for most readers. Although many of the nitty-gritty details can be referenced in prior work, it was unclear from the main text if the BSDS algorithm discovered latent states based on activation patterns, functional connectivity, or both. Figure 1F is not very informative (and is missing labels).

      We thank the reviewer for this suggestion. We agree that a more accessible overview of the BSDS algorithm (Lee et al., 2025; Taghia et al., 2018) is needed. In the revision, we will expand the Methods and provide a concise algorithmic overview in the main text that clarifies the following key points: (a) BSDS operates on multivariate time series from the ROIs and infers latent brain states defined jointly by their mean activation patterns (amplitude vectors) and inter-regional covariance matrices (functional connectivity); (b) it employs a hidden Markov model framework with Bayesian inference and automatic relevance determination to identify the number of states without manual specification; and (c) state assignments are made at each TR, yielding a temporal sequence that enables computation of occupancy rates, mean lifetimes, and transition probabilities. We will also revise Figure 1F to include appropriate labels and a clearer schematic of the model's inputs, latent structure, and outputs.
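      Once each TR carries a state label, transition probabilities can be tabulated directly from the sequence. The sketch below is an empirical, count-based estimate from a decoded sequence with integer labels 0..K−1 (an assumed encoding); BSDS itself infers transition probabilities as model parameters, so its estimates may differ from these counts. The diagonal gives the self-transition probabilities used as a stability measure.

```python
import numpy as np

def transition_matrix(seq, n_states):
    """Empirical P[i, j] = P(state j at TR t+1 | state i at TR t)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states never left (e.g. only seen at the last TR) stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical 7-TR sequence over 3 states.
P = transition_matrix([0, 0, 1, 2, 2, 2, 0], n_states=3)
print(P)
print(np.diag(P))  # self-transition probabilities
```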

      (3) A further confusion about the BSDS algorithm was whether it necessarily had to work on the rest data. Figure 4A suggests that each TR was assigned one of the four states based on the maximum win from the log-likelihood estimation. Without more details about how this algorithm was applied to the rest data, it is difficult to evaluate the claim on page 14 about the spontaneous emergence of the states at rest.

      The key methodological point is that the BSDS model, once trained on encoding data, can be applied to new (rest) time series via log-likelihood estimation: for each TR during rest, the model computes the log-likelihood of each state given the observed multivariate signal, and the state with the maximum log-likelihood is assigned to that TR. This "decoding" approach tests whether the spatial configurations learned during encoding are present during rest, rather than fitting new states de novo. We applied a threshold to the log-likelihood values to exclude TRs where the evidence for any single state was weak, thus controlling for potential misassignment. We will substantially clarify this process in the revised Methods and main text, and as described in our response to Reviewer #1 point 1, we will also conduct additional analyses to address the concerns raised.
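      A simplified sketch of the per-TR assignment step, assuming each state is summarized by a mean amplitude vector and an inter-regional covariance matrix and treated as a multivariate Gaussian emission. This ignores the transition prior and latent-factor structure of the full BSDS model, so it illustrates the maximum-log-likelihood rule rather than reproducing the authors' implementation; the toy means and data are hypothetical.

```python
import numpy as np

def gaussian_loglik(X, mean, cov):
    """Log-density of each row of X (n_TRs x n_ROIs) under N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    # Cholesky factorization (cov = L L^T) gives a stable solve and log-determinant.
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff.T)              # L^{-1}(x - mean), shape (d, n_TRs)
    mahalanobis_sq = np.sum(sol ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis_sq)

def decode_states(X, means, covs):
    """Assign each TR to the state with the maximum log-likelihood."""
    loglik = np.column_stack([gaussian_loglik(X, m, c) for m, c in zip(means, covs)])
    return loglik.argmax(axis=1), loglik

# Two hypothetical states over 2 ROIs: low vs. high mean amplitude.
means = [np.zeros(2), np.full(2, 3.0)]
covs = [np.eye(2), np.eye(2)]
X = np.array([[0.1, -0.2], [2.9, 3.2], [1.4, 1.4]])
states, loglik = decode_states(X, means, covs)
print(states.tolist())  # [0, 1, 0]
```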

      (4) Although the BSDS algorithm was validated in prior simulations and task-based fMRI using sustained block designs in adults, it is unclear whether it is appropriate for the kind of event-related design used in the current study. Figure 1G shows very rapid state changes, which is quantified in the low mean lifetime of the states (between 1-3 TRs on average) in Figure 4C. On the one hand, it is a strength of the algorithm that it is not necessarily tied to external stimuli. On the other hand, it would be helpful to see simulations validating that rapid transitions between states in fMRI data are meaningful and not due to noise.

      This is an important methodological question. The rapid state changes observed in our event-related design (mean lifetimes of 1–3 TRs) differ from the longer state durations typically observed with block designs (He et al., 2023; Zeng et al., 2024), where sustained cognitive demands stabilize brain configurations. We believe these rapid transitions are consistent with the inherent dynamics of event-related encoding, where each trial involves rapid shifts between sensory processing, memory binding, and attentional engagement. Several considerations support the meaningfulness of these transitions: (a) the identified states have interpretable amplitude profiles consistent with well-established memory-related brain systems; (b) state dynamics show statistically significant, directionally consistent correlations with subsequent memory performance; and (c) the transition structure during encoding is distinct from that observed during rest, indicating sensitivity to task demands. Nonetheless, we acknowledge the concern about noise and will conduct additional analyses in the revision to address the concerns raised.

      (5) The Methods section mentions that participants actively imagined themselves within the encoded scenes and were instructed to memorize the images for a later test during the post-encoding rest scan. This detail needs to be included in the main text and incorporated into the interpretation of the findings, as there are likely mechanistic differences between spontaneous memory replay/reinstatement vs. active rehearsal.

      We thank the reviewer for this suggestion. We will include these experimental details in the main text and incorporate them into the interpretation of our findings in the context of spontaneous memory replay/reinstatement vs. active rehearsal (Liu et al., 2019; Wimmer et al., 2020).

      (6) Information about the general linear model used to discover the 16 ROIs that showed a subsequent memory effect are missing, such as: covariates in the model (motion, etc.), group analysis approach (parametric or nonparametric), whether and how multiple-comparisons correction was performed, if clusters were overlapping at all or distinct, if the total number of clusters was 16 or if this was only a subset of regions that showed the effect.

      We apologize for the missing methodological details. In the revised manuscript, we will provide complete information on the general linear model used to identify the 16 ROIs, including: the event regressors and parametric modulators included in the model, nuisance covariates (motion parameters, white matter and CSF regressors), the group-level analysis approach and statistical thresholding, the method for multiple-comparisons correction, whether the 16 ROIs represent all significant clusters or a subset, and whether any clusters were spatially overlapping. We will also clarify how peak voxels were selected for ROI definition.

      Reviewer #3 (Public review):

      This paper uses a novel method to look at how stable brain states and the transitions between them promote memory formation during encoding and post-encoding rest in children. I think the paper has some weaknesses (detailed below) that mean that the authors fall short of achieving their aims. Although the paper has an interesting methodological approach, the authors need better logic, and are potentially "double dipping" in their results - meaning their logic is circular. I think the method that they are using could be useful to the broader neuroimaging community, although they need to make this argument clearer in the paper.

      We thank Reviewer #3 for recognizing the novelty of our approach and its potential utility for the broader neuroimaging community.

      (1) The authors use children as their study subjects but fail to explain why children are used, whether the same phenomena are expected to be seen in adults (or only in children), and whether and how their findings change with age across a range that spans middle childhood into early adolescence. They need to include more consideration of the development of their subject population. The authors should make it clear why and how memory was tested in children and not adults. Are adults expected to encode and consolidate in a similar manner to children? Do the findings here also apply to adults? How was the age range of 8-13-year-old children selected? Why didn't the authors look at change with age? Does memory performance change with age? Do the BSDS dynamics change with age in the authors' sample?

      Our study was motivated by the observation that while adult studies have documented memory replay and reinstatement, very little is known about whether these dynamic state-level mechanisms operate during middle childhood, a period characterized by substantial improvements in episodic memory ability and ongoing maturation of frontoparietal and hippocampal–cortical circuits. The age range of 8–13 was defined a priori based on typical developmental classifications of middle childhood through early adolescence, representing a period when episodic memory abilities are developing rapidly.

      In response to the reviewer's specific questions: (a) we will conduct exploratory analyses testing whether memory accuracy, BSDS state dynamics (occupancy, mean lifetime, transitions), and brain–behavior correlations vary as a function of age within our sample; (b) we will clearly discuss whether adults are expected to show similar patterns, drawing on the extant adult literature; and (c) we will acknowledge as a limitation that our sample size (N = 24) and narrow age range provide limited statistical power for detecting continuous age-related changes, and that a dedicated cross-sectional or longitudinal developmental design would be needed to draw firm conclusions about developmental trajectories. Please also see responses to Reviewer #1 point 5 and Reviewer #2 point 1.

      (2) The authors look for brain state dynamics within a preselected set of ROIs that are selected because they display a subsequent memory effect. This is problematic because the state that is most associated with subsequent memory (S3, or State 3) is also the one that shows most activity in these regions (that have already been a priori selected due to displaying a subsequent memory effect). This logic is circular. It would be helpful if they could look at brain state dynamics in a more ROI agnostic whole brain approach so that we can learn something beyond what a subsequent memory analysis tells us. I think the authors are "double dipping" in that they selected regions for further analysis based on a subsequent memory association (remembered > forgotten contrast) and then found states within those regions showing a subsequent memory effect to further analyze for being associated with subsequent memory. Would it be possible instead to do a whole-brain analysis (something a bit more agnostic to findings) using the BSDS framework, and then, from a whole-brain perspective, look for particular brain states associated with subsequent memory? As it stands, it looks like S3 (state 3) has greater overall activation in all brain regions associated with subsequent memory, so it makes sense that this brain state is also most associated with subsequent memory. The BSDS analysis is therefore not adding anything new beyond what the authors find with the simple subsequent memory contrast that they show in Figure 1C. 
This particularly affects the following findings: (a) active-encoding state occupancy rate correlated positively with memory accuracy, (b) transitions to the active-encoding state were beneficial / Conversely, transitions toward the inactive state (S4) were detrimental, with incoming transitions showing negative correlations with memory accuracy / The active-encoding state serves as a "hub" configuration that facilitates memory formation, while pathways leading to this state enhance performance and transitions away from it impair encoding.

      We appreciate this critique, which raises an important concern about analytical circularity.

      a) Why BSDS adds information beyond the static subsequent memory contrast. The reviewer notes that S3 (the active-encoding state) shows high activation in the same regions selected by the subsequent memory contrast, and therefore questions whether BSDS provides new information. We respectfully argue that BSDS captures dimensions of neural organization that a static contrast cannot. Specifically: (a) the subsequent memory contrast identifies which regions are differentially active for remembered vs. forgotten items, averaged across the entire encoding session, and so provides no temporal information about when or for how long these regions are co-active; (b) BSDS reveals the moment-to-moment temporal evolution of brain states, including the duration and stability of each configuration (mean lifetime), which independently predicts behavior; (c) BSDS uniquely captures transition dynamics (the rates and patterns of switching between states), which we show predict memory in ways not derivable from the contrast map (e.g., transitions from S2→S3 positively predict memory, whereas transitions toward S4 negatively predict memory); and (d) BSDS characterizes the full covariance structure among regions within each state, revealing distinct connectivity patterns (e.g., the high clustering coefficient and global efficiency of S3) that are not captured by univariate activation contrasts. Thus, while the ROI selection is informed by the subsequent memory effect, the information BSDS extracts from those regions (temporal dynamics, transition patterns, and multivariate covariance) is orthogonal to the information used for selection.
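      For readers unfamiliar with the graph measures mentioned in point (d), a minimal sketch of how a clustering coefficient and global efficiency can be computed from a single state's covariance matrix. Binarizing absolute correlations at a fixed threshold is an assumption made here for simplicity; the manuscript's analysis may use weighted or differently thresholded graphs.

```python
import numpy as np

def binarize(cov, threshold):
    """Unweighted adjacency matrix from |correlation| > threshold (assumed rule)."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    A = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(A, 0)
    return A

def average_clustering(A):
    """Mean local clustering; nodes with fewer than two neighbours count as 0."""
    k = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2          # closed triangles through each node
    possible = k * (k - 1) / 2
    c = np.divide(triangles, possible, out=np.zeros(len(k)), where=possible > 0)
    return c.mean()

def global_efficiency(A):
    """Mean inverse shortest-path length over ordered node pairs (BFS per node)."""
    n = len(A)
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in np.flatnonzero(A[u]):
                    v = int(v)
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        # Unreachable pairs contribute 0, i.e. 1 / infinity.
        total += sum(1.0 / d for node, d in dist.items() if node != s)
    return total / (n * (n - 1))

# Hypothetical 4-ROI state with uniform inter-regional correlation of 0.6.
cov = np.full((4, 4), 0.6)
np.fill_diagonal(cov, 1.0)
A = binarize(cov, threshold=0.3)
print(average_clustering(A), global_efficiency(A))  # fully connected graph: 1.0 1.0
```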

      b) Additional validation. To address the circularity concern empirically, we will conduct an additional analysis using ROIs defined independently of the subsequent memory contrast, such as network templates from previous studies or meta-analytic (e.g., Neurosynth) ROIs (He et al., 2023; Meer et al., 2020; Taghia et al., 2018).

      (3) The task used to test memory in children seems strange. Why should children remember arbitrary scenes? How this was chosen for encoding needs to be made clear. There needs to be more description of the memory task and why it was chosen. Why was scene encoding chosen? What does scene encoding have to do with the stated goal of (a) "Understanding how children's brains form lasting memories", (b) "optimizing education" and (c) "identifying learning disabilities"? What was the design of the recognition memory test? How many novel scenes were included in the test, and how were they chosen? How close were the "new" images to previously seen "old" images? Was this varied parametrically (i.e., was the similarity between new and old images assessed and quantified?)

      Scene encoding was chosen for several reasons: (a) scenes are rich, complex stimuli that engage the hippocampal–parahippocampal memory system, eliciting robust subsequent memory effects suitable for BSDS modeling; (b) scene encoding recruits distributed networks spanning visual cortex, MTL, and frontoparietal regions, enabling detection of multi-region brain states; and (c) scene encoding paradigms have been widely used in both adult and developmental studies of episodic memory and replay (Tambini et al., 2017; Tompary et al., 2017), facilitating comparison with prior work.

      Regarding the recognition test: participants viewed 200 images (100 old, 100 new), with novel scenes drawn from the same categories (buildings and natural scenes) but chosen to be perceptually distinct from studied images. Similarity between old and new images was not parametrically manipulated or quantified; we will note this as a limitation. We will also expand the main text to include full task details and have deleted claims about implications for educational optimization and learning disability identification (see also Reviewer #3 point 7).

      (4) They ultimately found four brain states during encoding. It would be helpful if they could make the logic and foundation for arriving at this number clear.

      The number of brain states is not predetermined by the user but is determined automatically by the BSDS algorithm through Bayesian automatic relevance determination (ARD). The model is initialized with a maximum number of possible states, and during inference, states that contribute minimally to explaining the data are effectively pruned: the ARD prior drives their associated parameters to near zero. In our data, the model converged on four states. This is a key advantage of BSDS over conventional HMM approaches, which require the user to specify the number of states a priori. We will clarify this process in the revised Methods and Results, referencing the original BSDS methodology paper (Taghia et al., 2018) for full mathematical details.
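      To make the idea of an inferred (rather than fixed) state count concrete, the following sketch reads off the "effective" states from a matrix of posterior state probabilities by keeping only states with non-negligible expected occupancy. It is a simplified stand-in, not the BSDS variational ARD procedure itself (Taghia et al., 2018); the function name, the 1% occupancy threshold, and the toy posterior are hypothetical.

```python
def effective_states(posteriors, min_fraction=0.01):
    """Return indices of states whose expected occupancy exceeds min_fraction.

    posteriors: one probability vector over K_max candidate states per time
    point (each row sums to 1). States the model has effectively switched off
    end up with near-zero expected occupancy and are dropped.
    """
    n_time = len(posteriors)
    k_max = len(posteriors[0])
    occupancy = [sum(row[k] for row in posteriors) / n_time for k in range(k_max)]
    return [k for k, occ in enumerate(occupancy) if occ > min_fraction]


# Hypothetical posterior over K_max = 6 candidate states; only states 0-3 are used
toy = [
    [0.70, 0.20, 0.05, 0.05, 0.0, 0.0],
    [0.10, 0.80, 0.05, 0.05, 0.0, 0.0],
    [0.05, 0.05, 0.85, 0.05, 0.0, 0.0],
    [0.05, 0.05, 0.05, 0.849, 0.001, 0.0],
]
print(effective_states(toy))  # -> [0, 1, 2, 3]
```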

      (5) There is already extant work on whether brain states during post-encoding rest predict memory outcomes. This work needs to be cited and referred to. The present manuscript needs to be better situated within prior work. The authors should look at the work by Alexa Tompary and Lila Davachi. They have already addressed many of the questions that the authors seek to answer. The authors should read their papers (and the papers they cite and that cite them) and then situate their work within the prior literature.

      We agree that the manuscript must be better situated within the existing literature on post-encoding rest and memory consolidation. We will revise the Introduction and Discussion to engage more fully with the foundational work in adults by Tompary & Davachi (2017, Neuron; 2024, eLife) on consolidation-related hippocampal–mPFC representational overlap, as well as Tambini & Davachi (2013, PNAS; 2019, Trends in Cognitive Sciences) on hippocampal persistence during post-encoding rest and awake reactivation (Tambini & Davachi, 2019; Tambini et al., 2017; Tompary & Davachi, 2017). We will explicitly discuss how our BSDS-based approach to state-level reinstatement complements and extends these earlier findings, which largely focused on region-specific pattern similarity or hippocampal–cortical connectivity, by characterizing reinstatement at the level of dynamic, whole-network configurations.

      (6) The authors should back up the claim that "successful episodic memory formation critically depends on the temporal coordination between these systems. Brain regions must coordinate their activity through dynamic functional interactions, rapidly reconfiguring their activity and connectivity patterns in response to changing cognitive demands and stimulus characteristics." Do they have any specific evidence supporting this claim?

      The claim that episodic memory depends on temporal coordination and dynamic functional interactions is supported by several lines of evidence: (a) within our study, the significant correlations between state transition rates and memory performance directly demonstrate that dynamic inter-state communication predicts memory outcomes; (b) studies showing that hippocampal–prefrontal theta coherence during encoding predicts subsequent memory (e.g., Zielinski et al., 2020); and (c) recent work demonstrating that rapid reconfiguration of large-scale brain networks supports cognitive functions including working memory (Braun et al., 2015; Shine & Poldrack, 2018) and episodic encoding (Phan et al., 2024). We will revise this passage to include specific citations and to make clear that our own transition–behavior correlations constitute direct evidence for this claim.
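      "State transition rate" can be operationalized in several ways. A minimal sketch, assuming it is the proportion of consecutive time points at which the decoded brain state changes, and that its relation to memory is assessed with a Pearson correlation across participants. The definitions, function names, and toy data are illustrative assumptions, not the authors' pipeline.

```python
from math import sqrt


def transition_rate(states):
    """Fraction of consecutive time-point pairs where the decoded state changes."""
    switches = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return switches / (len(states) - 1)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical decoded state sequences (one per participant) and memory scores
sequences = [
    [1, 1, 2, 2, 3, 3, 4, 4, 1],
    [1, 1, 1, 1, 2, 2, 2, 2, 2],
    [1, 2, 3, 4, 1, 2, 3, 4, 1],
]
memory_scores = [1.2, 0.8, 1.6]
rates = [transition_rate(s) for s in sequences]
print(rates)  # -> [0.5, 0.125, 1.0]
print(round(pearson_r(rates, memory_scores), 3))
```

      With real data, inference would of course require far more participants and an appropriate significance test; the toy example only shows the computation.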

      (7) These claims seem overstated: "this work has broad implications for understanding memory function in children, for developing educational interventions that enhance memory formation, and enabling early identification of children at risk for learning disabilities." Can the authors add citations that would support these claims, or if not, remove them?

      We thank the reviewer for raising this point. We agree that the current framing overstates the practical implications. We have now removed these claims and instead note that future studies are needed to establish any such applications.

      References

      (1) Braun, U., Schafer, A., Walter, H., Erk, S., Romanczuk-Seiferth, N., Haddad, L., . . . Bassett, D. S. (2015). Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc Natl Acad Sci U S A, 112(37), 11678-11683.

      (2) He, Y., Liang, X., Chen, M., Tian, T., Zeng, Y., Liu, J., . . . Qin, S. (2023). Development of brain-state dynamics involved in working memory. Cerebral Cortex.

      (3) Lee, B., Young, C. B., Cai, W., Yuan, R., Ryman, S., Kim, J., . . . Menon, V. (2025). Dopaminergic modulation and dosage effects on brain state dynamics and working memory component processes in Parkinson’s disease. Nature Communications, 16(1), 2433.

      (4) Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14.

      (5) Meer, J. N. v. d., Breakspear, M., Chang, L. J., Sonkusare, S., & Cocchi, L. (2020). Movie viewing elicits rich and reliable brain state dynamics. Nature Communications, 11(1), 5004.

      (6) Phan, A. T., Xie, W., Chapeton, J. I., Inati, S. K., & Zaghloul, K. A. (2024). Dynamic patterns of functional connectivity in the human brain underlie individual memory formation. Nature Communications, 15(1), 8969.

      (7) Ryali, S., Supekar, K., Chen, T., Kochalka, J., Cai, W., Nicholas, J., . . . Menon, V. (2016). Temporal Dynamics and Developmental Maturation of Salience, Default and Central-Executive Network Interactions Revealed by Variational Bayes Hidden Markov Modeling. PLoS Comput Biol, 12(12), e1005138.

      (8) Shine, J. M., & Poldrack, R. A. (2018). Principles of dynamic network reconfiguration across diverse brain states. Neuroimage, 180, 396-405.

      (9) Stevner, A. B. A., Vidaurre, D., Cabral, J., Rapuano, K., Nielsen, S. F. V., Tagliazucchi, E., . . . Kringelbach, M. L. (2019). Discovery of key whole-brain transitions and dynamics during human wakefulness and non-REM sleep. Nature Communications, 10(1), 1035.

      (10) Taghia, J., Cai, W., Ryali, S., Kochalka, J., Nicholas, J., Chen, T., & Menon, V. (2018). Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nature Communications, 9(1), 2505.

      (11) Tambini, A., & Davachi, L. (2019). Awake Reactivation of Prior Experiences Consolidates Memories and Biases Cognition. Trends in Cognitive Sciences, 23(10), 876-890.

      (12) Tambini, A., Rimmele, U., Phelps, E. A., & Davachi, L. (2017). Emotional brain states carry over and enhance future memory formation. Nature Neuroscience, 20(2), 271-278.

      (13) Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e5.

      (14) Verde, M. F., Macmillan, N. A., & Rotello, C. M. (2006). Measures of sensitivity based on a single hit rate and false alarm rate: The accuracy, precision, and robustness of d′, Az, and A′. Perception & Psychophysics, 68(4), 643-654.

      (15) Wickens, T. D. (2001). Elementary signal detection theory. Oxford University Press.

      (16) Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J., & Dolan, R. J. (2020). Episodic memory retrieval success is associated with rapid replay of episode content. Nature Neuroscience, 23(8), 1025-1033.

      (17) Zeng, Y., Xiong, B., Gao, H., Liu, C., Chen, C., Wu, J., & Qin, S. (2024). Cortisol awakening response prompts dynamic reconfiguration of brain networks in emotional and executive functioning. Proceedings of the National Academy of Sciences, 121(52), e2405850121.

      (18) Zielinski, M. C., Tang, W., & Jadhav, S. P. (2020). The role of replay and theta sequences in mediating hippocampal-prefrontal interactions for memory and cognition. Hippocampus, 30(1), 60-72.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Lin et al. examines the role of EXOC6A in ciliogenesis and its relationship with the interactor myosin-Va using a range of approaches based on the RPE1 cell line model. They establish its spatio-temporal organization at centrioles, the forming ciliary vesicle, and the ciliary sheath using ExM, various super-resolution techniques, and EM, including correlative light and electron microscopy. They also perform live imaging analyses and functional studies using RNAi and knockout. They establish a role of EXOC6A together with myosin-Va in Golgi-derived, microtubule- and actin-based vesicle trafficking to and from the ciliary vesicle and sheath membranes. Defects in these functions impair robust ciliary shaft and axoneme formation due to defective transition zone assembly.

      Strengths:

      The study provides very high-quality data that support the conclusions. In particular, the imaging data is compelling. It also integrates all findings in a model that shows how EXOC6A participates in multiple stages of ciliogenesis and how it cooperates with other factors.

      Weaknesses:

      The precise role of EXOC6A remains somewhat unclear. While it is described as a component of the exocyst, the authors do not address its molecular functions and whether it indeed works as part of the exocyst complex during ciliogenesis.

      We sincerely thank Reviewer 1 for the thoughtful evaluation of our manuscript and the constructive comments provided. We are especially grateful for the recognition of the quality and significance of our imaging data and of the comprehensive model we propose for EXOC6A’s role in ciliogenesis. We did not address the function of other components of the exocyst complex during ciliogenesis. However, in our biochemical analyses, myosin-Va co-immunoprecipitated specifically with EXOC6A and not with the other exocyst subunits tested (EXOC5 and EXOC7) (Fig. 4E), indicating a selective interaction between EXOC6A and the Myo-Va transport machinery.

      Reviewer #2 (Public review):

      Summary:

      The molecular mechanisms underlying ciliogenesis are not well understood. Previously, work from the same group (Wu et al., 2018) identified myosin-Va as an important protein in transporting preciliary vesicles to the mother centriole, allowing for initiation of ciliogenesis. The exocyst complex has previously been implicated in ciliogenesis and protein trafficking to cilia. Here, Lin et al. investigate the role of exocyst complex protein EXOC6A in cilia formation. The authors find that EXOC6A localizes to preciliary vesicles, ciliary vesicles, and the ciliary sheath. EXOC6A colocalizes with Myo-Va in the ciliary vesicle and the ciliary sheath, and both proteins are removed from fully assembled cilia. EXOC6A is not required for Myo-Va localization, but Myo-Va and EHD1 are required for EXOC6A to localize in ciliary vesicles. The authors propose that EXOC6A vesicles continually remodel the cilium: FRAP analysis demonstrates that EXOC6A is a dynamic protein, and live imaging shows that EXOC6A fuses with and buds off from the ciliary membrane. Loss of EXOC6A reduces, but does not eliminate, the number of cilia formed in cells. Any cilia that are still present are structurally abnormal, with either bent morphologies or the absence of some transition zone proteins. Overall, the analyses and imaging are well done, and the conclusions are well supported by the data. The work will be of interest to cell biologists, especially those interested in centrosomes and cilia.

      Strengths:

      The TEM micrographs are of excellent quality. The quality of the imaging overall is very good, especially considering that these are dynamic processes occurring in a small region of the cell. The data analysis is well done and the quantifications are very helpful. The manuscript is well-written and the final figure is especially helpful in understanding the model.

      Weaknesses:

      Additional information about the functional and mechanistic roles of EXOC6A would improve the manuscript greatly.

      We sincerely thank Reviewer 2 for the thoughtful and encouraging evaluation of our work. We are grateful for the recognition of the strengths of our study, including the quality of the TEM micrographs, the rigor of our imaging and data analysis, and the clarity of our manuscript and proposed model.

      We have expanded our analyses in the revised manuscript to better define EXOC6A’s contribution to ciliary function. Specifically, we examined the trafficking of two critical ciliary membrane-associated proteins: GPR161, a G-protein-coupled receptor involved in Sonic hedgehog (Shh) signaling, and BBS9, a core component of the BBSome complex essential for ciliary membrane protein transport. Our new data (Fig. 7C) show that both GPR161 and BBS9 fail to localize to the cilium in EXOC6A knockout cells, in contrast to wild-type controls where their ciliary localization is robust. This new evidence substantially strengthens our understanding of EXOC6A’s role in ciliary function.

      Reviewer #3 (Public review):

      Summary:

      Lin et al. report on the dynamic localization of EXOC6A and Myo-Va at pre-ciliary vesicles, ciliary vesicles, and ciliary sheath membrane during ciliogenesis using three-dimensional structured illumination microscopy and ultrastructure expansion microscopy. The authors further confirm the interaction of EXOC6A and Myo-Va by co-immunoprecipitation experiments and demonstrate the requirement of EHD1 for the formation of EXOC6A-labeled ciliary vesicles. Additional experiments using siRNA-mediated gene silencing and pharmacological tools identified the involvement of dynein, microtubules, and actin in the transport of EXOC6A-labeled vesicles to the centriole, as they have previously reported for Myo-Va. Notably, loss of EXOC6A severely disrupts ciliogenesis, with the majority of cells becoming arrested at the ciliary vesicle (CV) stage, highlighting the involvement of EXOC6A at later stages of ciliogenesis. As the authors observe dynamic EXOC6A-positive vesicle release and fusion with the ciliary sheath, this suggests a role in membrane and potentially membrane protein delivery to the growing cilium past the ciliary vesicle stage. While CEP290 localization at the forming cilium appears normal, the recruitment of other transition zone components, exemplified by several MKS and NPHP module components, was impaired in EXOC6A-deficient cells.

      Strengths:

      (1) By applying different microscopy approaches, the study provides deeper insight into the spatial and temporal localization of EXOC6A and Myo-Va during ciliogenesis.

      (2) The combination of complementary siRNA and pharmacological tools targeting different components strengthens the conclusions.

      (3) This study reveals a new function of EXOC6A in delivering membrane and membrane proteins during ciliogenesis, both to the ciliary vesicle as well as to the ciliary sheath.

      (4) The overall data quality is high. The investigation of EXOC6A at different time points during ciliogenesis is well schematized and explained.

      Weaknesses:

      (1) Since many conclusions are based on EXOC6A immunostaining, it would strengthen the study to validate antibody specificity by demonstrating the absence of staining in EXOC6A-deficient cells.

      (2) While the authors generated an EXOC6A-deficient cell line, off-target effects can be clone-specific. Validating key experiments in a second independent knockout clone or rescuing the phenotype of the existing clone by re-expressing EXOC6A would ensure that the observed phenotypes are due to EXOC6A loss rather than unintended off-target effects.

      (3) Some experimental details are lacking from the materials and methods section. No information on how the co-immunoprecipitation experiments have been performed can be found. The concentrations of pharmacological agents should be provided to allow proper interpretation of the results, as higher or lower doses can produce nonspecific effects. For example, the concentrations of ciliobrevin and nocodazole used to treat RPE1 cells are not specified and should be included. More precise settings for the FRAP experiments would help others reproduce the presented data. Some details for the siRNA-based knockdowns, such as incubation times, can only be found in the figure legends.

      Taken together, the authors achieved their goal of elucidating the role of EXOC6A in ciliogenesis, demonstrating its involvement in vesicle trafficking and membrane remodeling in both early and late stages of ciliogenesis. Their findings are supported by experimental evidence. This work is likely to have an impact on the field by expanding our understanding of the molecular machinery underlying cilia biogenesis, particularly the coordination between the exocyst complex and cytoskeletal transport systems. The methods and data presented offer valuable tools for dissecting vesicle dynamics and cilium formation, providing a foundation for future research into ciliary dysfunction and related diseases. By connecting vesicle trafficking to structural maturation of an organelle, the study adds important context to the broader description of cellular architecture and organelle biogenesis.

      We sincerely thank Reviewer 3 for the thorough and thoughtful assessment of our manuscript. We greatly appreciate the recognition of the strengths of our study, including the use of advanced microscopy techniques, complementary functional tools, and the conceptual contributions regarding EXOC6A's role in vesicle trafficking and membrane remodeling during ciliogenesis.

      Below, we detail how we have addressed the specific suggestions for improvement:

      (1) Validation of EXOC6A Immunostaining Specificity

      To directly address the reviewer’s concern regarding antibody specificity, we have included new control immunofluorescence panels in Figure S3E-F, which show a complete loss of EXOC6A signal in two independent knockout (KO) clones. These data confirm the specificity of the EXOC6A antibody used throughout the study and reinforce the accuracy of our localization analyses at different stages of ciliogenesis.

      (2) Addressing Potential Clone-Specific or Off-Target Effects

      To ensure that the observed phenotypes are attributable to EXOC6A loss and not due to off-target effects, we performed parallel analyses using two independent KO clones, both of which exhibited identical defects in ciliogenesis, including arrest at the ciliary vesicle stage and impaired cilia assembly (Fig. S3C-D).

      In addition, we conducted rescue experiments by re-expressing EXOC6A in the KO background, which effectively restored ciliogenesis. Quantitative analysis of the rescue data has been added to the revised manuscript (Figure S6B), providing further support that the observed phenotype is specifically due to EXOC6A deficiency.

      (3) Expanded Methodological Details

      We have expanded the Materials and Methods section to include:

      - A detailed protocol for co-immunoprecipitation experiments, including lysis conditions, antibody concentrations, and washing steps.

      - The precise concentrations and treatment durations for all pharmacological agents used, including ciliobrevin and nocodazole.

      - Comprehensive details on the siRNA-mediated knockdowns, including oligonucleotide sequences, transfection reagents, and incubation durations.

      Recommendations for the authors:

      Reviewing Editor Comments:

      After further consultation, all 3 reviewers agreed that this is an important study with high-quality data, in particular the imaging data. They also considered most of the evidence convincing, but overall they termed it "solid" for two main reasons: first, they would have liked to see a validation of the EXOC6A antibody specificity, and second, they suggest that you demonstrate for at least key experiments the phenotypes with a second KO clone, to exclude clonal effects. In principle, rescue would be suited to address this, but the issue here is that the presented rescue is not very robust.

      We sincerely thank the Editor and all reviewers for their constructive and thoughtful evaluation of our manuscript. We are especially grateful for the recognition of the high-quality imaging data, the experimental rigor, and the significance of our findings to the field of ciliogenesis.

      We fully acknowledge the two principal concerns raised during further consultation: (1) the need for validation of EXOC6A antibody specificity, and (2) the importance of confirming the phenotypes in an independent knockout clone to exclude clonal artifacts. We have taken both of these points seriously and have now addressed them through additional experiments and analyses, as detailed below:

      (1) Validation Using Independent Knockout Clones

      To rigorously validate antibody specificity and eliminate the possibility of clonal variation, we have characterized a second independent EXOC6A knockout (KO) clone. We confirmed complete loss of EXOC6A expression in both clones using three orthogonal approaches: genotyping, immunoblotting, and immunofluorescence (Fig. S3). Both KO clones exhibit indistinguishable phenotypes, including arrest at the ciliary vesicle stage and impaired cilia formation (Fig. S3D). 

      (2) Rescue Phenotype Validation with Statistical Significance

      In response to concerns about the robustness of the rescue, we have now included statistical analysis of the rescue experiments. A two-tailed Student’s t-test comparing ciliogenesis between the EXOC6A KO and rescue (GFP-EXOC6A re-expression) conditions shows a statistically significant improvement (p = 0.0041) (Fig. S6B). While we acknowledge that the rescue is partial—likely due to limitations of overexpression systems—the statistically significant recovery provides strong genetic evidence that the phenotypes are specific and reversible. These data are now included in the revised Figure S6.
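      For readers less familiar with the test, a two-tailed Student’s t-test compares the means of two independent groups under an equal-variance assumption. The sketch below computes the pooled-variance t statistic and degrees of freedom; the per-replicate ciliation percentages are hypothetical placeholders, not the data behind Fig. S6B. It assumes independent (unpaired) replicates; in practice the two-tailed p-value would then be taken from the t distribution, for example with SciPy’s `scipy.stats.ttest_ind`, whose default `equal_var=True` gives Student’s test.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 (sample) denominator
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical % ciliated cells per replicate (NOT the manuscript's data)
ko = [20.0, 24.0, 22.0]
rescue = [40.0, 44.0, 42.0]
t, df = student_t(rescue, ko)
print(f"t = {t:.2f}, df = {df}")  # -> t = 12.25, df = 4
```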

      (3) Functional Consequences of EXOC6A Loss on Ciliary Membrane Protein Trafficking

      To further strengthen the mechanistic conclusions, we expanded our study to include the trafficking of two functional ciliary membrane proteins. We show that in EXOC6A KO cells, both BBS9 (a component of the BBSome complex) and GPR161 (a GPCR involved in Shh signaling) fail to enter the cilium. These results suggest that EXOC6A is required not only for early structural events in ciliogenesis, but also for establishing a competent transition zone, critical for ciliary membrane protein recruitment. These findings are detailed in the revised Figure 7C and corresponding Results.

      We believe that these additional experiments and clarifications directly address the concerns and significantly strengthen the robustness and impact of our study.

      The reviewers also made additional suggestions regarding functional and mechanistic insights that would strengthen the manuscript even further.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should include control IF panels for the specificity of the EXOC6A stainings at the various ciliogenesis stages using the KO cell line.

      We thank the reviewer for this important suggestion. We have now included the requested immunofluorescence (IF) control panels to validate the specificity of the EXOC6A antibody. As shown in the newly added Figure S3, EXOC6A immunofluorescence signal is completely absent in EXOC6A knockout (KO) cells at CV (Fig. S3E) and cilia membrane (Fig. S3F) stages, whereas robust and stage-specific signals are observed in wild-type cells. These results confirm the specificity of the endogenous EXOC6A staining used throughout the study and validate the spatiotemporal localization patterns reported in the main figures.

      (2) It would be informative to compare EXOC6A KO and RNAi to determine whether the only partially impaired ciliogenesis phenotype may be a consequence of cellular adaptation.

      We appreciate the reviewer’s concern regarding potential cellular adaptation or clone-specific effects. To address this, we examined the ciliogenesis phenotype in two independent EXOC6A KO clones generated using distinct sgRNA targeting strategies. As shown in Figure S3, both KO clones displayed a highly consistent phenotype characterized by a pronounced arrest at the ciliary vesicle (CV) stage and a significant reduction in mature cilium formation.

      The reproducibility of this phenotype across two independently derived clones strongly argues against clonal variability or long-term adaptive compensation as the underlying cause. Instead, these results support the conclusion that the observed ciliogenesis defects are a direct and specific consequence of EXOC6A loss.

      (3) It remains unclear whether EXOC6A's function in ciliogenesis is part of the exocyst complex. This is currently implied by the context in which it is introduced and discussed, although the authors avoid any direct statement about this. Do the authors observe similar phenotypes by knocking down any other exocyst subunit? In any case, this issue should be discussed.

      We thank the reviewer for raising this conceptual point. This study did not explore the functions of other components of the exocyst complex during ciliogenesis, which warrants further investigation in the future. However, in our biochemical analyses, myosin-Va co-immunoprecipitated specifically with EXOC6A and not with the other exocyst subunits tested (EXOC5 and EXOC7) (Fig. 4E), indicating a selective interaction between EXOC6A and the Myo-Va transport machinery.

      Reviewer #2 (Recommendations for the authors):

      To clarify the roles of EXOC6A in ciliogenesis, I suggest the following:

      (1) Myo-Va is involved in both the intracellular and extracellular ciliogenesis pathways. The authors show that EXOC6A has a role in the intracellular ciliogenesis pathway. Does it also participate in the extracellular pathway?

      We thank the reviewer for this insightful question. Given that Myo-Va functions in both intracellular and extracellular ciliogenesis pathways, it is indeed plausible that EXOC6A may also participate in the extracellular pathway. However, the current study was specifically focused on elucidating the molecular mechanisms of intracellular ciliogenesis using RPE1 cells, which exclusively undergo this pathway. Assessing EXOC6A’s role in the extracellular pathway would require the use of specialized models (e.g., polarized epithelial cells such as MDCK or IMCD3), which fall beyond the scope of this manuscript.

      (2) In the live imaging movies (Fig 3C, 3D, supp movie 4 and 5), the authors observe tubular structures and puncta with EXOC6A and conclude that these are dynamic vesicles/membranes. While the movies are suggestive of membrane-like behavior, it would be helpful to show that these puncta and tubules have membrane, perhaps by staining with a membrane dye.

      We appreciate the reviewer’s suggestion to validate the membrane identity of EXOC6A-positive structures. While we did not perform membrane dye staining in the current study, we agree this approach would provide additional confirmation. Nevertheless, the dynamic behaviors observed in our live-cell imaging, including membrane-like tubulation, fusion, and fission, strongly support the interpretation that EXOC6A puncta and tubules are membrane-bound structures.

      (3) It is unclear how the EXOC6A tubules and vesicles are delivered, and the extent to which MyoVa plays a role. The authors co-label EXOC6A and MyoVa in Supp Fig 2, but EXOC6A dynamics seem very different here, as compared to Fig 3D - there are fewer tubules and puncta and less movement of either tubules or puncta between time points. Does expression of MyoVa decrease EXOC6A membrane dynamics? Or is it required for EXOC6A membrane dynamics?

      We thank the reviewer for this observation. The apparent differences in EXOC6A dynamics between Supplementary Figure 2 and Figure 3D most likely reflect cell-to-cell variability in dynamic behavior, which is common in live-cell imaging. Both figures were derived from the same stable cell line co-expressing EXOC6A and Myo-Va-GTD. Moreover, our analysis shows that Myo-Va-GTD overexpression does not suppress EXOC6A dynamics, nor is it required for membrane remodeling per se. However, Myo-Va is essential for EXOC6A recruitment to the ciliary vesicle, as shown by the loss of EXOC6A localization in Myo-Va KO cells (Fig. 4A).

      (4) The authors show that loss of EXOC6A affects the localization of some transition zone proteins. Does this subsequently lead to defects in transition zone function?

      We agree with the reviewer that structural defects in the transition zone (TZ) should be linked to its function. To address this, we examined the localization of two well-characterized ciliary membrane-associated proteins: BBS9 and GPR161. Both proteins failed to localize to the cilia in EXOC6A knockout cells, despite proper recruitment in wild-type controls (Fig. 7C). Although we did not examine the exact functions of GPR161 and BBS9, our results suggest that the loss of EXOC6A may impair TZ function, particularly its gating capacity for membrane protein trafficking.

      (5) Additional information about how the MKS proteins are regulated by EXOC6A would be helpful to understand the mechanisms by which EXOC6A builds the transition zone. Does EXOC6A directly bind to MKS proteins, or are the MKS proteins delivered by EXOC6A-containing vesicles during ciliogenesis?

      We appreciate the reviewer's questions regarding the mechanistic relationship between EXOC6A and MKS module proteins. In this study, we did not explore the mechanism by which EXOC6A builds the transition zone. This is an interesting topic worthy of further investigation in the future.

      Reviewer #3 (Recommendations for the authors):

      Recommended modifications:

      (1) The co-immunoprecipitation experiments suggest an interaction between EXOC6A and Myo-Va; however, the presence of a faint band in the IgG control raises some uncertainty. To reinforce this conclusion, the authors could demonstrate that the interaction is absent in the EXOC6A knockout cell line.

      We thank the reviewer for this careful observation. We acknowledge the presence of a faint Myo‑Va signal in the IgG control lane. Myosin‑Va is a highly abundant cytoskeletal motor protein and can occasionally exhibit low‑level nonspecific binding to agarose beads during immunoprecipitation assays. Importantly, the Myo‑Va signal co‑immunoprecipitated with endogenous EXOC6A is substantially stronger and specifically enriched compared with the IgG control, supporting a specific interaction.

      (2) Figure S5: The partial rescue of the EXOC6A phenotype is not entirely convincing. A statistical test to assess the significance of the observed differences may help to strengthen the authors' conclusion.

      We appreciate the reviewer’s suggestion to validate the rescue experiment. We have now performed a pairwise two‑tailed Student’s t‑test comparing ciliogenesis efficiency between EXOC6A knockout cells and rescue cells expressing GFP‑EXOC6A. As shown in the revised Figure S6 (original Figure S5), re‑expression of EXOC6A resulted in a statistically significant recovery of ciliogenesis (p = 0.0041). While the rescue is partial—likely due to inherent limitations of plasmid‑based expression systems, including variable transfection efficiency and imperfect restoration of endogenous protein levels—the statistically significant improvement confirms that the ciliogenesis defect is specifically caused by EXOC6A loss. Figure S6 and its legend have been updated accordingly.

      (3) A detailed description of the EXOC6A knockout strategy should be included.

      The Methods section has been expanded to include a comprehensive description of the CRISPR/Cas9‑mediated EXOC6A knockout strategy, including sgRNA sequences, genomic target sites, and validation approaches. Additionally, we now include Figure S3, demonstrating complete loss of EXOC6A protein expression in two independent knockout clones, confirming the efficiency and specificity of the gene‑editing strategy.

      (4) The labeling in Figure 6 is confusing; assigning a separate letter to each panel would improve clarity.

      Figure 6 has been reorganized for clarity: the original panels have been subdivided and relabeled as 6A/6A’ and 6B/6B’, respectively. The figure legend and all corresponding references in the main text have been updated accordingly.

      (5) Lines 109-112: The cell line used is not well described. While experts might understand that Dox is used to induce expression of the transgenes, this should be better explained for non-expert readers.

      We have revised the text to clearly explain that doxycycline (Dox) is used to induce transgene expression via a Tet‑On inducible system. This clarification has been added to the main text.

      (6) Line 180: replace "labels" with "structures".

      We have revised the text as suggested.

      (7) Line 189: the EXOC6A recruitment to the membrane structures seems to be occurring on a short timescale that should be specified. In this context, "immediately" appears unscientific.

      We have revised the sentence to specify that EXOC6A recruitment occurs within seconds, based on our live‑cell imaging data, providing a more accurate temporal description.

      (8) Lines 280-282: We recommend rewording to soften this statement. Actin and microtubule inhibitors affect the entire cytoskeletal network; more specific experiments would be required to assess whether the transport of vesicles is defective.

      We have reworded the statement to indicate that the accumulation of these vesicles at the mother centrioles is highly sensitive to disruption of dynein or microtubules, suggesting that efficient transport of these vesicles may depend on the integrity of the microtubule network. However, more experiments are required to confirm this conclusion. 

      (9) Lines: 428-433: Similarly, we recommend rewording this statement as it presents the authors' current model, which is in line with the presented data but would require more rigorous investigation.

      We have revised this section to describe the mechanism as a working model supported by our data, while acknowledging that further investigation will be required to fully establish the proposed hierarchy and molecular details.

      Questions and comments to consider:

      (1) 15-30% of cells can form cilia-like structures in the EXOC6A KO cells, although membrane transport should be reduced. It would be interesting to investigate whether these cilia are only formed intracellularly and fail to reach the cell surface.

      We thank the reviewer for this insightful question. Using both immunofluorescence and electron microscopy, we observed that a subset of ciliary membranes in EXOC6A KO cells do appear to fuse with the plasma membrane. However, due to the low frequency and heterogeneous morphology of these structures, we were unable to reliably quantify this population. 

      (2) In the Western blot shown in Figure 4, EXOC6A appears at multiple molecular weights when detected with the anti-EXOC6A antibody. Providing a possible explanation for this shift would be helpful.

      We clarify that the apparent molecular weight shift likely results from gel distortion during electrophoretic separation. Importantly, the specificity of the major EXOC6A band was rigorously validated by its complete absence in EXOC6A knockout lysates, confirming that the detected signal corresponds to EXOC6A.

      (3) The Western blot in Figure 5B is not fully convincing; including additional independent blots would be nice.

      We thank the reviewer for this suggestion. Figure 5B has been replaced with a blot from an independent experiment, improving clarity and reproducibility.

      (4) According to the materials and methods section, siRNA-mediated knockdown of targets was performed using a single siRNA per gene, which could result in off-target effects. It would be advisable to use several different siRNAs for a single target to exclude off-target effects or, in case this has been done, to cite references.

      We appreciate this concern. The siRNAs used in this study were previously validated in our earlier work (Wu et al., Nat Cell Biol 2018), where both specificity and efficiency were rigorously tested. We have now explicitly cited this reference in the Materials and Methods section to justify the selection of these reagents.

      (5) The abbreviation CFLEM is uncommon for correlative (fluorescence) light and electron microscopy; the authors should consider using the standard abbreviation CLEM.

      We have replaced “CFLEM” with the standard term CLEM (Correlative Light and Electron Microscopy) throughout the manuscript and figure legends.

      (6) The term "M-centriole" is uncommon and should at least be introduced. The use of the term "mother centriole" is recommended.

      We have replaced “M‑centriole” with the standard term “mother centriole” throughout the manuscript and figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lipid transfer proteins (LTPs) play a crucial role in intermembrane lipid exchange within cells. However, the molecular mechanisms that govern this activity remain largely unclear. Specifically, the way in which LTPs surmount the energy barrier to extract a single lipid molecule from a lipid bilayer is not yet fully understood. This manuscript investigates the influence of membrane properties on the binding of Ups1 to the membrane and the transfer of phosphatidic acid (PA) by the LTP. The findings reveal that Ups1 shows a preference for binding to membranes with positive curvature. Moreover, coarse-grained molecular dynamics simulations indicate that positive curvature decreases the energy barrier associated with PA extraction from the membrane. Additionally, lipid transfer assays conducted with purified proteins and liposomes in vitro demonstrate that the size of the donor membrane significantly impacts lipid transfer efficiency by Ups1-Mdm35 complexes, with smaller liposomes (characterized by high positive curvature) promoting rapid lipid transfer.

      This study offers significant new insights into the reaction cycle of phosphatidic acid (PA) transfer by Ups1 in mitochondria. Notably, the authors present compelling evidence that, alongside negatively charged phospholipids, positive membrane curvature enhances lipid transfer - an effect that is particularly relevant at the mitochondrial outer membrane. The experiments are technically robust, and my primary feedback pertains to the interpretation of specific results.

      (1) The authors conclude from the lipid transfer assays (Figure 5) that lipid extraction is the rate-limiting step in the transfer cycle. While this conclusion seems plausible, it should be noted that the authors employed high concentrations of Ups1-Mdm35 along with less negatively charged phospholipids in these reactions. This combination may lead to binding becoming the rate-limiting factor. The authors should take this point into consideration. In this type of assay, it is challenging to clearly distinguish between binding, lipid extraction, and membrane dissociation as separate processes.

      We have included a detailed consideration of this issue on page 11 of the revised manuscript.

      (2) The authors should discuss that variations in the size of liposomes will also affect the distance between them at a constant concentration, which may affect the rate of lipid transfer. Therefore, the authors should determine the average size and size distribution of liposomes after sonication (by DLS or nanoparticle analyzer, etc.)

      We have included DLS measurements for all liposome sizes (page 6) (SupFig. 2A). Because the intensity distribution in DLS measurements is disproportionately sensitive to larger particles, we also conducted cryo-EM analysis of vesicles of different sizes (page 6) (SupFig. 2B).

      We now also discuss how a fixed membrane-binding surface leads to differences in vesicle spacing when liposomes of different sizes are used, and how this may influence the interpretation of our results (pages 10-11).
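      As a rough illustration of the spacing issue raised in point 2, the sketch below estimates how many vesicles a fixed total lipid concentration yields for different diameters, and the resulting typical distance between vesicle centres. It is not the analysis used in the manuscript. It assumes unilamellar vesicles, a bilayer thickness of 4 nm, an area per lipid of 0.7 nm², and an illustrative 100 µM lipid concentration; all of these are assumed values, not values reported in the study.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole
NM3_PER_LITRE = 1e24      # 1 L = 1e-3 m^3 = 1e24 nm^3


def lipids_per_vesicle(diameter_nm, bilayer_nm=4.0, area_per_lipid_nm2=0.7):
    """Approximate lipid count of a unilamellar vesicle (outer + inner leaflet).

    bilayer_nm and area_per_lipid_nm2 are assumed typical values, not measured ones.
    """
    r_out = diameter_nm / 2.0
    r_in = r_out - bilayer_nm
    if r_in <= 0:
        raise ValueError("vesicle diameter must exceed twice the bilayer thickness")
    return 4.0 * math.pi * (r_out ** 2 + r_in ** 2) / area_per_lipid_nm2


def vesicles_per_litre(lipid_molar, diameter_nm, **geometry):
    """Number of vesicles per litre at a fixed total lipid concentration (mol/L)."""
    return lipid_molar * AVOGADRO / lipids_per_vesicle(diameter_nm, **geometry)


def typical_spacing_nm(lipid_molar, diameter_nm, **geometry):
    """Rough centre-to-centre spacing n**(-1/3) for vesicle number density n."""
    n_per_nm3 = vesicles_per_litre(lipid_molar, diameter_nm, **geometry) / NM3_PER_LITRE
    return n_per_nm3 ** (-1.0 / 3.0)


if __name__ == "__main__":
    lipid_molar = 100e-6  # hypothetical 100 uM total lipid, for illustration only
    for d in (30, 50, 400):
        print(f"{d:>4} nm: {vesicles_per_litre(lipid_molar, d):.2e} vesicles/L, "
              f"spacing ~{typical_spacing_nm(lipid_molar, d):.0f} nm")
```

      Because the lipid count per vesicle grows roughly with the square of the diameter, 400 nm vesicles are about 200-fold fewer than 30 nm vesicles at the same lipid concentration under these assumptions, and their typical spacing is roughly six-fold larger.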

      (3) The authors use NBD-PA in the lipid transfer assays. Does the size of the donor liposomes affect the transfer of NBD-PA and DOPA similarly? Since NBD-labeled lipids are somewhat unstable within lipid bilayers (as shown by spontaneous desorption in Figure 5B), monitoring the transfer of unlabeled PA in at least one setting would strengthen the conclusion of the swap experiments.

      To experimentally address this comment, we explored several different approaches. We first performed transfer experiments using unlabelled lipids, following the general procedures described in the manuscript. After the transfer reaction, we attempted to separate donor and acceptor vesicles by centrifugation and subsequently analyzed the samples by high-resolution mass spectrometry and thin-layer chromatography. Despite considerable effort, we were not able to reliably separate the differently sized liposomes. In particular, small liposomes proved difficult to handle during centrifugation, which is a well-known challenge (Kučerka et al. 1994, BBA; Boucrot et al. 2012, Cell). In addition, liposomes exhibited a tendency to cross-link in the presence of protein, further complicating the separation. Even if this separation step were straightforward, an important limitation of such an approach is that it is very difficult to monitor lipid transfer with sufficient time resolution. Much of the relevant activity occurs within the first 20–30 seconds, and precise interruption at defined time points would be essential.

      We therefore set out to establish a fluorescence-based assay that would allow us to follow lipid transfer in real time. For this, we adapted a dequenching-type assay based on a PE-coupled fluorescein dye, whose fluorescence is quenched in the proximity of negative charges (e.g., negatively charged lipid headgroups). In principle, this assay should allow us to monitor the movement of negatively charged PA lipids away from donor membranes. Although a fluorescein-based passive lipid-transfer assay has been described previously (Richens et al., 2017), it is used only rarely in the lipid-transfer field. While establishing this assay, we encountered several technical challenges. For example, immediately after protein addition, fluorescence intensity changed in unexpected ways that could not be attributed to lipid transfer. Such effects have been reported in the literature (Wall et al., 1995) and are most likely caused by changes in membrane charge density upon protein binding. After extensive fine-tuning of the experimental conditions and careful evaluation of the data, we were ultimately able to demonstrate that lipid-transfer rates are significantly higher with smaller than with larger liposomes. These results confirm our initial observations, and importantly, they were obtained using unlabelled PA.

      The revised manuscript now includes this independent lipid-transfer assay demonstrating the transfer of non-labelled PA (page 11) (SupFig. 4).

      (4) The present study suggests that membrane domains with positive curvature at the outer membrane may serve as starting points for lipid transport by Ups1-Mdm35. Is anything known about the mechanisms that form such structures? This should be discussed in the text.

      We have included a detailed consideration of this interesting point in the discussion section (pages 13-14).

      Reviewer #2 (Public review):

      Summary:

      Lipid transfer between membranes is essential for lipid biosynthesis across different organelle membranes. Ups1-Mdm35 is one of the best-characterized lipid transfer proteins, responsible for transferring phosphatidic acid (PA) between the mitochondrial outer membrane (OM) and inner membrane (IM), a process critical for cardiolipin (CL) synthesis in the IM. Upon dissociation from Mdm35, Ups1 binds to the intermembrane space (IMS) surface of the OM, extracts a PA molecule, re-associates with Mdm35, and moves through the aqueous IMS to deliver PA to the IM. Here, the authors analyzed the early steps of this PA transfer - membrane binding and PA extraction - using a combination of in vitro biochemical assays with lipid liposomes and purified Ups1-Mdm35 to measure liposome binding, lipid transfer between liposomes, and lipid extraction from liposomes. The authors found that membrane curvature, a previously overlooked property of the membrane, significantly affects PA extraction but not PA insertion into liposomes. These findings were further supported by MD simulations.

      Strengths:

      The experiments are well-designed, and the data are logically interpreted. The present study provides an important basis for understanding the mechanism of lipid transfer between membranes.

      Weaknesses:

      The physiological relevance of membrane curvature in lipid extraction and transfer still remains open.

      We thank the reviewer for the constructive feedback on our work. We agree that the physiological relevance of membrane curvature in lipid extraction and transfer remains an open question. Our data show that Ups1 binding to native-like OM membranes under physiological pH conditions is curvature-dependent, supporting the idea that this mechanism may optimize lipid transfer in vivo. While the intricate biophysical basis of this behaviour can only be dissected in vitro, these findings offer valuable insight into how curvature may functionally regulate Ups1 activity in the cellular context. To directly test this, it will be important in future studies to identify Ups1 mutants that lack curvature sensitivity and assess their performance in vivo, which will help clarify the physiological importance of this mechanism.

      Reviewer #3 (Public review):

      The manuscript by Sadeqi et al. studies the interactions between the mitochondrial protein Ups1 and reconstituted membranes. The authors apply synthetic liposomal vesicles to investigate the role of pH, curvature, and charge on the binding of Ups1 to membranes and its ability to extract PA from them. The manuscript is well written and structured. With minor exceptions, the authors provide all relevant information (see minor points below) and reference the appropriate literature in their introduction. The underlying question of how the energy barrier for lipid extraction from membranes is overcome by Ups1 is interesting, and the data presented by the authors could offer a valuable new perspective on this process. It is also certainly a challenging in vitro reconstitution experiment, as the authors aim to disentangle individual membrane properties (e.g., curvature, charge, and packing density) to study protein adsorption and lipid transfer. I have one major suggestion and a few minor ones that the authors might want to consider to improve their manuscript and data interpretation:

      Major Comments:

      The experiments are performed with reconstituted vesicles, which are incubated with recombinant protein variants and quantitatively assessed in flotation and pelleting assays. According to the Materials and Methods section, the lipid concentration in these assays is kept constant at 5 µM. However, the authors change the size of the vesicles to tune their curvature. Using the same lipid concentration but varying vesicle sizes results in different total vesicle concentrations. Moreover, larger vesicles (produced by freeze-thawing and extrusion) tend to form a higher proportion of multilamellar vesicles, thus also altering the total membrane area available for binding. Could these differences in the experimental system account for the variation in binding? To address this, the authors would need to perform the experiments either under saturated (excess protein) conditions or find an experimental approach to normalize for these differences.

      To address this comment experimentally, we conducted a detailed structural analysis of liposomes of different sizes using cryo-EM to determine the degree of multilamellarity and to estimate how much membrane surface is available for protein binding. As expected, for liposomes extruded through a 400 nm filter, only about 75% of the initially calculated membrane surface was available (SupFig. 3A). For 50 nm extruded liposomes, this value rose to about 93%, and for sonicated liposomes it was about 94%. Because Ups1 binding was about 70% for sonicated liposomes but dropped to about 40% for 50 nm and about 30% for 400 nm extruded liposomes, we can rule out that the effects we observe are due to differences in the available membrane binding area.
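      The argument in the preceding paragraph can be checked with simple arithmetic. The sketch below divides the approximate bound fractions quoted above by the approximate fraction of membrane surface available in each preparation. It assumes, as a simplification not stated in the response, that binding scales linearly with accessible surface; even under that assumption, binding to sonicated liposomes remains almost twice that to 400 nm extruded liposomes.

```python
# Approximate values quoted in the response (fractions of 1).
available_surface = {"sonicated": 0.94, "50 nm extruded": 0.93, "400 nm extruded": 0.75}
ups1_bound = {"sonicated": 0.70, "50 nm extruded": 0.40, "400 nm extruded": 0.30}


def binding_per_available_surface(bound, available):
    """Bound fraction divided by accessible-surface fraction (assumes linear scaling)."""
    return {prep: bound[prep] / available[prep] for prep in bound}


normalized = binding_per_available_surface(ups1_bound, available_surface)
for prep, value in normalized.items():
    print(f"{prep:>16}: {value:.2f}")
```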

      Additionally, we performed experiments with increasing amounts of lipid to analyse the impact of lipid concentration on Ups1 membrane binding, comparing 400 nm extruded liposomes with sonicated liposomes. Interestingly, while Ups1 binding to sonicated liposomes increased with lipid concentrations between 2.5 mM and 10 mM, no major increase in binding was observed with 400 nm extruded liposomes. Ups1 binding to sonicated liposomes greatly exceeded binding to 400 nm extruded liposomes under all tested conditions (page 7) (SupFig. 3B).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figures 1, 2, and 3 - In the flotation assays, the Ups1-containing fractions differ between experiments. The presence of liposomes in these fractions should be confirmed, for example, by fluorescence measurements. In relation to this, the broad low MW bands in Supplementary Figure 3 may reflect liposomes (mixed micelles of lipids and SDS?), as their fractionation patterns coincide with those of Ups1 at pH 5.5-6.7 but deviate at pH 7.0 and 7.5. Could the authors clarify this discrepancy?

      Flotation profiles vary with the experimental conditions. We have included a gel showing Coomassie staining and lipid fluorescence side by side, demonstrating that the protein bands co-migrate with liposomes (SupFig. 5).

      (2) Figures 2, 3, and 5 - The sizes of the liposomes (400 nm and 50 nm) should be experimentally confirmed, e.g., by dynamic light scattering (DLS).

      We have included DLS measurements confirming the differences of liposome sizes. Please see answer to point 2 of Reviewer 1.

      (3) Figure 4C - The free energy landscape for different phospholipids is interesting. What about other acidic phospholipids, such as PS?

      This is indeed an interesting point. Our molecular dynamics simulations show that PE has a free energy landscape similar to that of PA, whereas that of PC is significantly different. This may indicate that headgroup size plays a major role. For intra-mitochondrial PS transport, a specific protein complex consisting of Ups2/Mdm35 has been identified, and it will be interesting to determine in future studies whether PS transfer is regulated by similar factors.

      (4) Supplementary Figure 2 - The deformation of liposomes by Ups1 is interesting. Does this depend on the presence of PA or other acidic phospholipids?

      We asked ourselves the same question throughout the project. As pointed out in the manuscript, the membrane-deforming activity of Ups1 is relatively mild compared with that of proteins involved, for example, in endocytosis. This made a proper static analysis challenging. We were not able to show unambiguously whether other acidic phospholipids have effects comparable to those of PA.

      (5) It may not be easy to assess experimentally, but the OM in mitochondria should have scramblase activity. Then, such scramblase activity could influence the observed effects of membrane curvature on Ups1-mediated PA transfer.

      (6) It would be helpful to discuss this possibility in the manuscript.

      In the revised version of the manuscript, we now discuss the existence of scramblases, such as Sam50 and VDAC, in the outer mitochondrial membrane with regard to their likely effect on membrane packing (pages 13-14). We considered an in vitro co-reconstitution experiment analysing the impact of a scramblase in liposomes on lipid transfer to be beyond the scope of this study.

      (7) Figure 6 is not referenced in the main text.

      Thank you, this oversight was corrected.

      (8) The non-abbreviated forms of LUV and SUV should be defined in the text upon first use.

      We now include a definition in the manuscript.

      (9) The term "transfer velocity" would be better expressed as "transfer rate".

      We agree, and we changed the wording accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) As flotation assays are a central technique of the study, readers who are not familiar with this method could benefit from a few explanatory sentences and appropriate references in the introduction section.

      Figure 1B now contains an updated version of a cartoon outlining the flotation assay and a description in the manuscript (page 4) that should make it easier to understand the assay. We have also included a direct reference within the methods section to a paper describing this assay in more detail.

      (2) Related to the major point, but also to improve the manuscript overall, the authors could add DLS (for size distribution and zeta potential) and cryo-EM (for multilamellarity analysis) data. This would aid future efforts to reproduce their observations.

      In the revised version of the manuscript we include DLS and zeta potential measurements as well as a detailed analysis of liposome multilamellarity by cryo-EM (also see answer to point 2 by Reviewer 1) (SupFig. 2A & B; SupFig. 3E).

      (3) Could the authors state the specific zeta potentials of the negatively charged (under varying pH) and neutral liposomes and relate these to natural membranes?

      We have included zeta potential measurements of differently charged liposomes and changed the text accordingly (page 8) (SupFig. 3E).

      (4) Changes in pH affect several characteristics of membranes (including lipid dipoles, charge, packing density, fluidity, and phase separation), particularly charge density. This experimental system does not allow all of these factors to be disentangled and studied separately. Some of the observations presented in Figures 2 and 5 could also be explained by these effects.

      The effects of pH on various membrane properties, such as lipid headgroup dipoles, lipid packing, interfacial tension, and others, are well described in the literature. For example, it has been proposed that increasing pH causes phosphatidic acid (PA) to become more negatively charged when in proximity to phosphatidylethanolamine (PE). We already discuss this effect in the manuscript, because our observation that Ups1 binding to membranes depends on negatively charged lipids but nevertheless increases with decreasing pH is unexpected.

      As pointed out, many of the parameters mentioned above are beyond control in our assays, and a systematic analysis of each of these factors with respect to Ups1 membrane binding and lipid transfer would be well beyond the scope of this manuscript. We have therefore included a passage discussing this issue in more detail (page 4-5).

      (5) Is the curvature simulated in the theoretical models comparable to the curvature of the liposome systems (e.g., a sphere of 100 nm diameter)?

      The simulated curvature spans a defined range, with the highest curvature corresponding to vesicles with diameters of approximately 15 nm. This corresponds reasonably well to the vesicle size distribution as analyzed by cryo-EM.
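      For comparison with the 100 nm example in the question, each principal curvature of a spherical vesicle is 1/R = 2/d. The short sketch below evaluates this for the ~15 nm upper end of the simulated range and for the nominal liposome sizes discussed above. It ignores bilayer thickness and uses nominal diameters rather than the measured size distributions.

```python
def sphere_curvature_per_nm(diameter_nm):
    """Curvature 1/R (nm^-1) of a sphere of the given diameter; bilayer thickness ignored."""
    return 2.0 / diameter_nm


for d in (15, 50, 100, 400):
    print(f"{d:>4} nm vesicle: {sphere_curvature_per_nm(d):.4f} nm^-1")
```

      On this simple measure, a 15 nm vesicle (about 0.13 nm^-1) is roughly 6.7 times more curved than a 100 nm vesicle (0.02 nm^-1).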

      References

      Connerth, M., Tatsuta, T., Haag, M., Klecker, T., Westermann, B., & Langer, T. (2012). Intramitochondrial transport of phosphatidic acid in yeast by a lipid transfer protein. Science, 338(6108), 815-818. https://doi.org/10.1126/science.1225625

      Lu, J., Chan, C., Yu, L., Fan, J., Sun, F., & Zhai, Y. (2020). Molecular mechanism of mitochondrial phosphatidate transfer by Ups1. Commun Biol, 3(1), 468. https://doi.org/10.1038/s42003-020-01121-x

      Miliara, X., Garnett, J. A., Tatsuta, T., Abid Ali, F., Baldie, H., Perez-Dorado, I., Simpson, P., Yague, E., Langer, T., & Matthews, S. (2015). Structural insight into the TRIAP1/PRELI-like domain family of mitochondrial phospholipid transfer complexes. EMBO Rep, 16(7), 824-835. https://doi.org/10.15252/embr.201540229

      Miliara, X., Tatsuta, T., Berry, J. L., Rouse, S. L., Solak, K., Chorev, D. S., Wu, D., Robinson, C. V., Matthews, S., & Langer, T. (2019). Structural determinants of lipid specificity within Ups/PRELI lipid transfer proteins. Nat Commun, 10(1), 1130. https://doi.org/10.1038/s41467-019-09089-x

      Miliara, X., Tatsuta, T., Eiyama, A., Langer, T., Rouse, S. L., & Matthews, S. (2023). An intermolecular hydrogen-bonded network in the PRELID-TRIAP protein family plays a role in lipid sensing. Biochim Biophys Acta Proteins Proteom, 1871(1), 140867. https://doi.org/10.1016/j.bbapap.2022.140867

      Potting, C., Tatsuta, T., Konig, T., Haag, M., Wai, T., Aaltonen, M. J., & Langer, T. (2013). TRIAP1/PRELI complexes prevent apoptosis by mediating intramitochondrial transport of phosphatidic acid. Cell Metab, 18(2), 287-295. https://doi.org/10.1016/j.cmet.2013.07.008

      Richens, J. L., Tyler, A. I. I., Barriga, H. M. G., Bramble, J. P., Law, R. V., Brooks, N. J., Seddon, J. M., Ces, O., & O'Shea, P. (2017). Spontaneous charged lipid transfer between lipid vesicles. Sci Rep, 7(1), 12606. https://doi.org/10.1038/s41598-017-12611-0

      Wall, J., Golding, C. A., Van Veen, M., & O'Shea, P. (1995). The use of fluoresceinphosphatidylethanolamine (FPE) as a real-time probe for peptide-membrane interactions. Mol Membr Biol, 12(2), 183-192. https://doi.org/10.3109/09687689509027506

      Watanabe, Y., Tamura, Y., Kawano, S., & Endo, T. (2015). Structural and mechanistic insights into phospholipid transfer by Ups1-Mdm35 in mitochondria. Nat Commun, 6, 7922. https://doi.org/10.1038/ncomms8922

    1. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
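      For readers who want to reproduce the shape of a predicted curve like that in Figure 1B, the sketch below uses the Marko-Siggia interpolation formula for the worm-like chain (Marko and Siggia, 1995) with the 50 nm persistence length quoted above, and inverts it numerically to find the fractional extension x/L at a given force. The thermal energy (4.11 pN·nm, about 25 °C) is an assumed value, and the interpolation formula is itself an approximation that deviates from the exact worm-like chain by up to roughly 10%; the manuscript's curve may be based on a different or refined form.

```python
KT_PN_NM = 4.11        # thermal energy at ~298 K in pN*nm (assumed temperature)
PERSISTENCE_NM = 50.0  # dsDNA persistence length quoted in the response


def wlc_force(z, persistence_nm=PERSISTENCE_NM, kt=KT_PN_NM):
    """Marko-Siggia interpolation: force (pN) at fractional extension z = x/L."""
    if not 0.0 <= z < 1.0:
        raise ValueError("fractional extension must satisfy 0 <= z < 1")
    return (kt / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def fractional_extension(force_pn, tol=1e-12, **params):
    """Invert wlc_force by bisection; force increases monotonically with z."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, **params) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for f in (1.0, 3.0, 6.0):
        print(f"{f:.0f} pN -> x/L = {fractional_extension(f):.3f}")
```

      Under these assumptions, 6 pN corresponds to a fractional extension of about 0.94. Working with x/L means the contour length of the tensiometer DNA is not needed; multiplying by it gives the absolute extension.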

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al. measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we respectfully disagree on this point: this approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with dynein. I wonder why this was not done.

      As much as we would love to fully characterize dynein here, this paper focuses on kinesin, and that alone took substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not consistently mean a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
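      Under the exponential assumption, the miss rate can be estimated from the detected slips alone, because the excess of each detected slip over the cutoff is itself exponential with the same mean (memorylessness). A minimal sketch on synthetic data; the 270 nm mean slip length is hypothetical, chosen because a 20% miss rate with a 60 nm cutoff corresponds to a mean of about 60/ln(1/0.8) ≈ 270 nm.

```python
import math
import random

CUTOFF_NM = 60.0  # slip detection threshold stated in the response

def missed_fraction(detected_slips_nm, cutoff_nm=CUTOFF_NM):
    """Estimate the fraction of slips shorter than the cutoff, assuming
    exponentially distributed slip distances.

    By memorylessness, (d - cutoff) for detected slips is exponential with the
    same mean, so its sample mean estimates the mean slip length.
    """
    excess = [d - cutoff_nm for d in detected_slips_nm if d > cutoff_nm]
    mean_length = sum(excess) / len(excess)
    return 1.0 - math.exp(-cutoff_nm / mean_length), mean_length

# Synthetic check with a hypothetical mean slip length of 270 nm
rng = random.Random(0)
all_slips = [rng.expovariate(1 / 270.0) for _ in range(50_000)]
detected = [d for d in all_slips if d > CUTOFF_NM]
frac, length = missed_fraction(detected)
print(f"estimated mean slip = {length:.0f} nm, missed fraction = {frac:.2f}")
```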

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.
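      The history-independence argument can be checked numerically: for exponentially distributed durations (a constant off-rate), restarting the clock at any time and keeping only the runs still in progress leaves the mean remaining duration unchanged. The duration parameter below is hypothetical.

```python
import random
import statistics

def residual_durations(durations, start):
    """Durations re-measured from an arbitrary start time, keeping only runs
    that were still in progress at that time."""
    return [d - start for d in durations if d > start]

rng = random.Random(1)
tau = 3.0  # hypothetical duration parameter (s), i.e. 1 / off-rate
durations = [rng.expovariate(1 / tau) for _ in range(200_000)]

for start in (0.0, 1.0, 4.0):
    res = residual_durations(durations, start)
    print(f"start = {start:.0f} s: mean residual = {statistics.fmean(res):.2f} s (n = {len(res)})")
```

      For a history-dependent (non-exponential) process this check would fail, which is why the argument rests on the stated assumption.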

      In addition, in Fig. 3 we separate the ramps from the stalls and, using a statistical model, compute a separate duration parameter (the inverse of the off-rate) for the ramp and the stall. We find that the relationship between ramp, stall, and unloaded durations differs among the three motors, which is interesting in itself.

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The sentence on pulling cargo that the reviewer highlights is meant to set up our analysis of slips, which we define as rearward displacements that do not return to the baseline before force generation resumes. We agree that the force after slippage is much smaller than at stall, and we plan to clarify that section of the text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from it) and the detached state (and reattachment from it). This delineation is important because, as the reviewer points out, reattachment kinetics measured with our DNA tensiometer need not apply to a vesicle in a cell, where diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in their Fig. S18a, where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after a slip a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates, and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although these non-parametric methods make no assumptions about the distribution, they require the user to know exactly where the start time is.
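      For reference, the Kaplan-Meier (product-limit) estimator that the reviewer refers to can be sketched in a few lines for right-censored data. Each duration must be measured from a known start time, which is the limitation raised above for ramps whose onset is hidden in the baseline. The toy data are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate for right-censored durations.

    durations: run durations (s); observed: True if the run ended in detachment,
    False if it was censored (e.g. the motor reached the microtubule end).
    Returns (time, survival) pairs at each observed detachment time.
    """
    events = Counter(t for t, obs in zip(durations, observed) if obs)
    exits = Counter(durations)  # every run leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # censored runs count as at risk up to their own time
    return curve

# Toy data (hypothetical): the runs ending at 2.0 s and 4.0 s are censored
print(kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [True, True, False, True, False]))
```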

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, two points are relevant. First, the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s have little effect on the fit). Second, when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed very well with the single-exponential fits (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (95% CI 2.33–3.17 s) and the estimate from the Markov model was 2.76 s (95% CI 2.28–3.34 s). Thus, we chose not to make any corrections for finite microtubule lengths.
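      The response relies on a Markov model with Bayesian inference (described in the Supplementary section). As a simpler frequentist analogue under the same assumptions (constant off-rate, history independence), the maximum-likelihood duration parameter for right-censored exponential data is the total observed time divided by the number of observed detachments. The sketch below is that textbook estimator, not the authors' model; the true duration and censoring limits are hypothetical. With a Gamma(a, b) prior on the off-rate, the posterior would be Gamma(a + number of detachments, b + total observed time), one standard route to credible intervals.

```python
import random

def exponential_duration_mle(durations, observed):
    """MLE of the duration parameter tau (= 1 / off-rate) for exponentially
    distributed durations with right censoring.

    All runs contribute their observed time as exposure, but only runs that
    ended in detachment count as events: tau_hat = total_time / n_events.
    """
    n_events = sum(observed)
    return sum(durations) / n_events

rng = random.Random(2)
true_tau = 2.75  # hypothetical, similar in size to the kinesin-3 unloaded value
durations, observed = [], []
for _ in range(20_000):
    run = rng.expovariate(1 / true_tau)
    limit = rng.uniform(1.0, 8.0)  # hypothetical per-run cutoff, e.g. microtubule end
    durations.append(min(run, limit))
    observed.append(run <= limit)

tau_hat = exponential_duration_mle(durations, observed)
naive = sum(d for d, o in zip(durations, observed) if o) / sum(observed)
print(f"censoring-aware tau = {tau_hat:.2f} s, naive mean of completed runs = {naive:.2f} s")
```

      Averaging only the completed runs biases the estimate low, whereas counting censored time as exposure removes that bias.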

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, it is essentially overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and to use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and a weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
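      The exponential load dependence referred to above, and the slip-bond prediction stated in response to point (12), can be written as a Bell-type rate, k(F) = k0 exp(F δ / kBT). A minimal sketch; k0 and δ are hypothetical placeholders rather than the fitted model parameters, and kBT is taken as 4.1 pN·nm (approximately room temperature).

```python
import math

KBT_PN_NM = 4.1  # thermal energy in pN*nm at roughly room temperature

def load_dependent_rate(k0_per_s, force_pn, distance_nm):
    """Bell-type rate constant k(F) = k0 * exp(F * delta / kBT).

    A positive distance parameter gives slip-bond behavior (the rate rises with
    load); a negative one would give catch-bond behavior.
    """
    return k0_per_s * math.exp(force_pn * distance_nm / KBT_PN_NM)

# Hypothetical parameters, not the fitted values from the model in Fig. 5
k0, delta = 1.0, 1.0
for f in (0.0, 2.0, 4.0, 6.0):
    rate = load_dependent_rate(k0, f, delta)
    print(f"F = {f:.0f} pN: k = {rate:.2f} /s, mean duration = {1 / rate:.2f} s")
```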

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm, where the force is 6 pN, the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but it means that even at stall, our dsDNA tether has a compliance similar to that of the motor pulling on it. We will address this point in our revision.
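      The stiffness values above can be approximately reproduced by differentiating the Marko-Siggia worm-like-chain interpolation (Marko and Siggia, 1995) with respect to extension. This sketch assumes a persistence length of 50 nm and a contour length of 1020 nm (about 3 kb of B-form DNA); the contour length is an assumption chosen because it reproduces the quoted numbers, and the exact construct and formula behind Fig. 1B may differ.

```python
KBT_PN_NM = 4.1        # thermal energy (pN*nm), roughly room temperature
PERSISTENCE_NM = 50.0  # typical dsDNA persistence length (assumption)
CONTOUR_NM = 1020.0    # assumed contour length (~3 kb); not stated in this response

def wlc_force(x_nm, lp=PERSISTENCE_NM, lc=CONTOUR_NM):
    """Marko-Siggia interpolation for the worm-like-chain force (pN)."""
    z = x_nm / lc
    return (KBT_PN_NM / lp) * (1.0 / (4.0 * (1.0 - z) ** 2) - 0.25 + z)

def wlc_stiffness(x_nm, lp=PERSISTENCE_NM, lc=CONTOUR_NM):
    """Analytical slope dF/dx (pN/nm) of the Marko-Siggia formula."""
    z = x_nm / lc
    return (KBT_PN_NM / lp) * (1.0 / (2.0 * (1.0 - z) ** 3) + 1.0) / lc

for x in (650.0, 855.0, 960.0):
    print(f"x = {x:.0f} nm: F = {wlc_force(x):.2f} pN, k = {wlc_stiffness(x):.4f} pN/nm")

# Approximate force jump for one 8 nm step near stall
print(f"8 nm step at 960 nm: ~{8.0 * wlc_stiffness(960.0):.1f} pN")
```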

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.
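      To illustrate why a single detached state is insufficient, the sketch below compares a weighted sum of three exponentials with a single exponential of the same mean. The weights and time constants are hypothetical, not the fits in Fig. 4D-F. The single exponential underestimates both the fast early recoveries and the long tail.

```python
import math

def multi_exp_survival(t, weights, taus):
    """Survival function of a mixture of exponentials: S(t) = sum_i w_i * exp(-t / tau_i)."""
    return sum(w * math.exp(-t / tau) for w, tau in zip(weights, taus))

# Hypothetical amplitudes and time constants (s); not the fitted values in Fig. 4D-F
weights = (0.6, 0.3, 0.1)
taus = (0.05, 0.5, 5.0)
mean = sum(w * tau for w, tau in zip(weights, taus))  # single exponential gets the same mean

for t in (0.1, 1.0, 5.0):
    print(f"t = {t:.1f} s: triple = {multi_exp_survival(t, weights, taus):.3f}, "
          f"single = {math.exp(-t / mean):.3f}")
```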

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schaffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schaffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that significant additional feasibility studies are required. For comparison, the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003) achieved ~75% coverage of ~6,000 proteins directly from an established protocol, without any significant prior feasibility studies, at least to our knowledge. While the C. elegans genome encodes roughly three times as many proteins, we would argue that our tagging protocol may even be less labor intensive, as it involves no cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: “They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.”

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline compare quite favorably with established large-scale consortium projects: e.g., the ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; the Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; the Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor-intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab, or faster with distributed effort, at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.
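      As a rough consistency check on the timeline, the arithmetic can be sketched as follows. It assumes ~20,000 protein-coding genes and 250 working days per year (both assumptions, not figures from the response) and ignores pools that fail and must be re-injected.

```python
import math

TOTAL_GENES = 20_000         # approximate protein-coding gene count (assumption)
ALREADY_TAGGED = 0.08        # ~8% already tagged (WormTagDB), from the text
GENES_PER_INJECTION = 3      # triple pools
REAGENT_BUDGET_USD = 2.2e6   # from the text
WORKING_DAYS_PER_YEAR = 250  # assumption

remaining = round(TOTAL_GENES * (1 - ALREADY_TAGGED))
injections = math.ceil(remaining / GENES_PER_INJECTION)
for years in (5, 6):
    per_day = injections / (years * WORKING_DAYS_PER_YEAR)
    print(f"{years} years: ~{per_day:.1f} triple-pool injections per working day")
print(f"{remaining} genes, {injections} triple pools, "
      f"~${REAGENT_BUDGET_USD / remaining:.0f} reagents per gene if the budget covers only these")
```

      Under these assumptions, the 5-6 year timeline corresponds to roughly four to five triple-pool injections per working day.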

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and importantly test the following key parameters: (1) success rate of triple pools with prior untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets which we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798 and PMID: 24685391). For instance, whole-genome sequencing (WGS) of strains after CRISPR/Cas9 editing found negligible off-target effects (PMID: 25249454, PMID: 30420468, using a similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892, using other methods). Targeted sequencing studies using various CRISPR/Cas9 methods have reported similar findings, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, backcrossing should not be a routine part of the initial tagging.

      Lastly, if one wants to backcross at a later stage, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s suggestion that injecting three targets together may be no more helpful than tagging each gene separately. Why would one tag a single gene at a time, rather than three, if the whole point of the paper is to demonstrate the scalability of tagging? Joint tagging lets one cut the effort of tagging all genes by a factor of three. It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since no cloning is needed to generate the repair templates or guides, and all other reagents are commercially available and not sample specific, injection mixes can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput three-fold and, in our view, introduces no disadvantages, as individual tags can be isolated independently if desired.

      Beyond the numerous technical advantages pooling provides (including lower cost and higher throughput for preparing injection mixes and for imaging), our results show that it yields epistemic benefits as well: had we imaged these proteins separately, or even aligned them to a pan-mitochondrial landmark, we would never have noticed the subcellular pattern in Fig. 6B, C, in which different sets of mitochondria are marked by different mitochondrial proteins. As we mention in the Discussion, grouping proteins predicted to localize to the same compartment allows the screen to simultaneously test how uniform or differentiated such compartments are.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is very limiting. The theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which has been useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat lower, but we have shown that many proteins are expressed highly enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissecting scope become visible with long exposures on widefield microscope cameras or with modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much longer. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some of the literature review, particularly in the Introduction and Abstract, relies too much on recent examples from the authors' laboratory instead of presenting the state of the field. I would have liked a more thorough account of what exactly has been done with simultaneous injections targeting multiple loci, comparing what various laboratories have accomplished to date.

      We are not sure what the reviewer is referring to in stating that the Abstract and Introduction are too focused on our own work and do not present the state of the field. The Abstract does not refer to any literature. In the Introduction, we cite 28 papers, 6 of them from our lab (4 of which provide examples of protein tags). We do not believe this can fairly be called an unbalanced presentation of the state of the field.

      That being said, we will gladly expand our Introduction to provide more background on co-CRISPR approaches. Labs have routinely used co-conversion (“co-CRISPR”) markers to pick out their intended edits (e.g., point mutations or insertions), as multiple groups have shown that a CRISPR/Cas9 edit at one locus correlates with editing efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple fluorescent protein-sized insertions have not been routinely attempted. Only we and one other group have successfully attempted this, using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale with entirely untested reagents at novel sites; these are critical parameters to determine for a whole-genome approach. Here, we therefore test (1) the efficiency of triple insertions and (2) the chance of obtaining them with new and untested guides and reagents.

      In our view, since some injection or co-CRISPR marker is needed anyway for genes that are not expressed at levels visible under a dissecting scope (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes the approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can then screen with more sensitive methods for the dimmer targets, while co-isolating other intended targets. Because insertions are often heterozygous in the F1, they can be segregated independently if desired, or homozygosed together to facilitate maintenance and then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we will discuss some of these points in the first paragraph of the results section:

      “In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014).”

      “These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci.”

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains, each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work: who is this paper for?

      We consider this paper to have two purposes: (1) to motivate the community to come together to consider such a genome-wide tagging approach; (2) to provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel, interesting insights.

      As a technical method, the advance is minimal, since the first author had already demonstrated that three edits (two fluorophore insertions and a co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, given the numbers in our original paper, it was not clear that pooling three novel large edits would work, or that the approach would scale.

      The dpy-10 co-CRISPR marker used previously targets a single, highly efficient site, with a hit rate close to 100%. In the earlier study, we also already knew that the two pooled insertions worked quite efficiently and did not disrupt the function of the targeted proteins. Replacing these two insertions and dpy-10 with three novel tags was not guaranteed to succeed, for many potential reasons, both biological and technical. For instance, such a “marker-free” approach requires that a significant number of targets in the genome be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with the current best fluorophores. The chance of disrupting gene function by tagging had also not been explored in detail in C. elegans, nor had the question of whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole-genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A table with all the observations must be included. It can replace the descriptions of the observations for the different lines, which can be somewhat laborious for the reader and are often wrong. There are numerous mistaken expectations of protein expression here; two examples are:

      We are not convinced that our expectations are mistaken. Below we respond to the reviewer’s specific examples, and we are open to hearing from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA-seq datasets as enriched in intestine and epidermis (Packer et al. 2019: highest in intestine and hypodermis; Ghaddar et al. 2023: intestine, sheath, and body wall muscle, and even oocytes). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue-specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that “there are no published studies about this enzyme, so we really don't know for sure what it's doing” is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least, it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information on protein function could indicate that the involved pathway requires tissue-specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      We note that the Ghaddar et al. and CeNGEN/Taylor et al. datasets do not. The scRNA-seq paper cited by the referee (PMID: 38816550) also shows enrichment in neurons and pharynx, which we did not observe. In our view, these discrepancies in fact further support our goals: transcript datasets alone (frequently used to infer tissue function) often do not sufficiently predict protein expression. One can find, post hoc, an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we will add the following to the discussion section:

      “We also noted unexpected cell-type-dependent distributions of proteins involved in broadly important metabolic processes, such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell-type-specific differences at the protein level. Importantly, many genes encoding metabolic enzymes, including acdh-10 and hxk-1, have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus, such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues have unique metabolic needs and which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We will add this information to the table including annotated expression levels in young adults from various datasets (but not larval datasets as we did not image these). We note that each of these studies use different pipelines and report different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We are grateful for the referee’s appreciation that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated whether the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given that Electra is not much brighter than mTagBFP2 on paper, we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is considerably dimmer than mScarlet3 on paper; its advantage is higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image, and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow-up analyses to characterize the function of these genes. Furthermore, the authors take the extra steps of in vivo determination of function with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses by pulling in data from GO, mouse knockouts, human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      We thank you for your kind words!

      The main area that I would suggest for improvement is the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious: that genetic manipulation of gorillas is not feasible. Beyond reminding the reader that this was a rationale for the Drosophila work, it does not add much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      We understand these statements might seem trivial, but they are meant to fully acknowledge, particularly to non-evolutionary biologists, that we cannot do the genetics to “prove” that these putatively deleterious mutations really are deleterious (hence the statement about forward/reverse genetic experiments), nor can we establish causation (since this mating system evolved once in the history of gorillas, we cannot know its directionality in this lineage, although we could infer it if, for example, we had species in which different stages were extant).

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates, including the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, but the authors should at least note it as a caveat. Third, and potentially related, these protein-coding genes may be functioning in other ways, such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed, in the sense or antisense orientation, for regulatory purposes. I don't think there is a way to incorporate this into the authors' analyses, but it would be nice to see it acknowledged as a caveat or limitation.

      We thank you for the helpful suggestion and have added a discussion of these issues in the reworked Caveats and limitations section (lines 639 - 710).

      Reviewer #1 (Recommendations for The Authors):

      This is an excellent paper with thorough and creative approaches to address an interesting connection between genotype and phenotype. Stylistically the paper is very well written.

      We thank you for your kind words.

      Page 3: I suggest deleting the word "vaginal" so the sentence reads "... the evolution of female traits such as anatomical features that allow female control...". Most of the well-documented examples of cryptic female choice are in animals that do not have vaginas like insects, fish, and birds, including the reference given at the end of the sentence (Brennan et al. 2007 on waterfowl).

      We agree and have made this edit.

      Page 3: I would delete the words "multimale-multifemale" when discussing gorillas, to make the sentence read "Most gorillas, for example, live in groups with age-graded...". The use of "multimale-multifemale" here is not exactly wrong, but can be confusing to the reader since the authors essentially use "multimale-multifemale" as a synonym for "polygamous" in the previous paragraph.

      We agree and have made this edit.

      The writing in the Materials and Methods fluctuates between present and past tense. The authors should pick a consistent style, probably past tense by convention.

      We have edited the Materials and Methods to use only the past tense.

      "Drosophila" is sometimes italicized and sometimes not. Make consistent.

      To ensure consistency, italics were used only when genus and species were shown together (i.e., Drosophila melanogaster).

      In the main text, a few reference typos/confusions:

      Box 1, Figure 1B caption: I believe this "Dixson, n.d." reference should be Dixson (2009), if it refers to the book (Oxford Press).

      Yes, that is the case. Thank you for having spotted this. The reference has been corrected.

      Page 21: The authors use the term "false exons" and "fake exons" in the same paragraph. Are these the same thing? If so, just use "false exons" both times.

      These are the same, we have changed fake to false.

      Page 22-23, maybe elsewhere: The Smith et al. reference includes Martin's first name.

      Thank you for bringing this issue to our attention. The reference has been corrected.

      Page 25: in the parenthetical listing of scientific species names, the word "and" should not be italicized. In this same section, there's really no reason to include "gorilla" as the subspecies. It isn't given for the other species.

      Corrected.

      Page 27: Missing period in the second paragraph after "(Guyonnet et al. 2012)".

      Corrected.

      Page 29: Should read "... available in gnomAD that would allow us to exclude..." (or possibly "... available in gnomAD that would allow the exclusion of ...").

      Corrected.

      Page 33, figure legend of Appendix Figure 1A: "gray line" not "gray liner".

      Corrected.

      Box 1, Figure 1A: This is confusing in a few ways. First, the gorilla red dot is labeled "Gorilla", but the chimpanzee and bonobo dots are not labeled. Perhaps in the legend the colors could be indicated, such as "... percentage of body mass for gorilla (red), common chimpanzee (dark blue), and bonobo (light blue)"? Secondly, the bar chart shows the testes/body mass ratio but it is not clear what they are scaled to. Should there be a second y-axis on the right side of the plot?

      The bar chart showed the testis weight/body weight ratio (log), but it is not really necessary. We have removed the bar chart and labeled chimpanzees and gorillas.

      Figure 1D: I found myself confused by the vertical label of "Percent of genes with w>1 in Gorilla". Because all genes are in the stacked histogram, my first thought was that ~99% of the genes have w>1 (gray). Would be more clear if the label was the same as 1G ("Percent of genes").

      We agree and have made this change.

      The text in the figures is extremely small. I don't know what it will look like once it is fully formatted for publication, so I'll leave those concerns to the editor/publisher.

      We will wait until the proofs to determine if this figure needs to be split into multiple figures with larger text.

      References in the reference section need a LOT of cleaning up. It does not appear that any manual editing was done. Please check for consistency in capitalization, italicization, abbreviations, missing information, etc. The level of neglect to this section is frankly unprofessional.

      I (VJL) apologize for this; it is entirely my fault. To explain but not justify, I have dyslexia, and the shifting combination of text, numbers, punctuation, fonts, and font styles makes it difficult to see the inconsistencies. To mitigate this, I use a reference manager to format references (like everyone else) and almost always have someone proofread the reference section, but I didn’t do that with this manuscript. I apologize for the oversight. My dedicated co-authors have cleaned the reference section.

      Reviewer #2 (Public Review):

      As outlined in the public review, this is a nicely executed molecular evolutionary study. The analyses and overall patterns described in gorillas appear rigorous and convincing. The fundamental limitation here is a lack of comparative context to specifically establish the connection to mating system or the uniqueness of these overall patterns to gorillas.

      We thank the reviewer for the compliments. However, there is some confusion about the hypothesis we tested. We hypothesized that genes involved in male reproductive biology would have relaxed selective constraints in gorillas because of their mating system, not that polygynous mating systems would lead to relaxed selection. While that may be true, it is not the hypothesis we tested, nor do we state that the overall pattern we observe is unique to gorillas. Our data, however, support our claims: 1) We performed an unbiased selection scan in gorillas and identified genes with K<1, an evolutionary signature of reduced selection intensity; 2) We found that those genes were enriched for male reproductive functions; and 3) Some of those genes had effects on male reproduction in both Drosophila screens and in infertile men. These are the results one would expect if our hypothesis were true.

      To partly address the concern that our results do not have a connection to mating systems or may reflect an overall pattern rather than a gorilla-specific one, we ran RELAX on the same dataset but in the elephant seal, another species with a highly polygynous mating system. Although elephant seals are polygynous, they differ from gorillas in that their spermatogenesis does not undergo persistent deterioration but instead follows a seasonal pattern. According to the comprehensive study by Laws (The Elephant Seal (Mirounga leonina Linn.): III. The Physiology of Reproduction. Scientific Reports 15, Falkland Islands Dependencies Survey, 1956), male gamete production is upregulated during the mating season and is mostly inactive throughout the rest of the year. Of the 573 genes with K<1 in gorillas, only 14 also have K<1 in elephant seals, which had 350 genes with K<1. A GO analysis of the 350 elephant seal K<1 genes does not identify enrichment in spermatogenesis-related terms; in fact, the list of GO terms is quite broad. A potential, if admittedly speculative, interpretation of these findings is that, although elephant seals are polygynous, selective pressure on their spermatogenesis is not relaxed (unlike in gorillas) because of the seasonal nature of their mating period. In other words, with a temporally narrower window for reproductive success than gorillas, the selective constraint on male gametogenesis in seals is not weakened. Regardless, the low overlap in relaxed genes between the two tested polygynous species supports the view that this reproductive strategy is probably associated with different evolutionary signatures in the genome depending on the species, a likely reflection of the complex, nuanced, and multifactorial aspects of such strategies. We include this analysis in the Appendix (lines 1112 - 1132).

      While there is much that I like about the study and approach, this is a substantial shortcoming that really limits the significance of the work, especially given that lineage-specific patterns were also analyzed by Scally et al. (2012) over a decade ago.

      While Scally et al. (2012) reported the initial sequencing, assembly, and analyses of the gorilla genome, the method they used to characterize selective pressure on coding genes (the branch and branch-site models implemented in PAML) is misspecified for detecting relaxed selection (PMID: 25540451). Under relaxed selection, the dN/dS of sites under purifying selection will move towards 1, the dN/dS of sites under positive selection will also move towards 1, and some sites will not experience a change in dN/dS. The PAML test used by Scally et al. (2012) averages dN/dS across all sites, rather than having distinct rate categories for each of the three selection classes. Under the PAML model, a change in dN/dS toward 1 can arise because positive selection is weaker in the foreground lineage than in the background lineage, even if positive selection still acts on some sites. Averaging across all sites also leaves little power to detect relaxed selection, even when relaxation has occurred. Furthermore, the PAML test used by Scally et al. (2012) is underpowered to detect relaxed selection because it depends on selective regimes in background species. Scally et al. (2012) also used six species, which underpowers their test of relaxation: if one or more of those species experienced an increase in dN/dS, the background rate would increase, giving the appearance of a decrease in the gorilla lineage even if its dN/dS had not changed. We elaborate on this in the Appendix (lines 1036 - 1073). Finally, the method implemented in PAML allows neither for synonymous rate variation across sites nor for multinucleotide mutations per codon. Ignoring synonymous rate variation dramatically inflates false positive rates in selection tests (PMID: 32068869), as does ignoring multinucleotide mutations (PMID: 29967485 and PMID: 37395787); we have added a discussion of these issues in our Caveats and limitations section (lines 683 - 710).

      Reviewer #2 (Recommendations for The Authors):

      Specific comments

      Framing: Overall, the connection to mating system is stated with variable levels of certainty, some appropriate, others overstated. The paper title uses 'coincident', which is appropriate but at odds with the stronger conclusions emphasized throughout. Elsewhere (abstract, discussion) the phrasing is much stronger, implying a direct statistical association with mating system variation that has not been established. The term 'association' is used in this same manner, but also in instances where a statistical association is actually tested and demonstrated (tests of enrichment, etc.).

      We are unsure why the Reviewer considers our claims overstatements. The patterns of molecular evolution we found are ‘associated with’ and ‘coincident with’ a polygynous social system, and we believe our results are ‘compelling’. Our tests for relaxed and positive selection are statistically associated with a polygynous social system, as we hypothesized a priori. We have taken care to ensure a more consistent framing of this connection throughout the manuscript to avoid potential misinterpretations of causality.

      Page 7, elsewhere - It is essential to compare the reported patterns (percentage of relaxed genes in gorilla, patterns of enrichment, etc.) to other primate lineages to identify whether this number is enriched due to mating system or whether these patterns are unusual for sperm genes across mammals. The implication here and throughout is that the specific pattern reflects specific aspects of gorilla mating biology, but this is never established. Additionally, it would be interesting to know the relative number of genes under positive selection across species (or across great apes).

      We agree that if we were using a PAML-like approach these controls would be informative. With the RELAX method, however, the foreground K is compared to the background K, and K only becomes significantly less than one if there is a relaxation in the intensity of selection in the foreground. If these patterns were common to sperm genes across mammals, the background and foreground K would not be significantly different. Our a priori hypothesis was that genes related to male reproductive biology would show evidence of a decrease in the intensity of selection (both positive and purifying), which we tested and found to be true. In this regard, we can conclude that the gorilla mating system is associated with patterns of molecular evolution in the species’ genome.
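      The RELAX parameterization referred to above can be illustrated with a short sketch: each foreground ω (d<sub>N</sub>/d<sub>S</sub>) rate class is modeled as the corresponding background ω raised to the power K, so K<1 pulls the purifying and positively selected classes toward 1 together, whereas K>1 pushes them away from 1. The background ω values and the function name below are hypothetical and for illustration only; this is not the HyPhy implementation, which estimates K by maximum likelihood and tests it against K=1.

```python
def relax_foreground_omegas(background_omegas, k):
    """Apply a RELAX-style exponent K to background omega (dN/dS) classes.

    Each foreground class is omega_B ** K. K < 1 moves every class toward 1
    (relaxation); K > 1 moves every class away from 1 (intensification).
    """
    if k < 0:
        raise ValueError("K must be non-negative")
    return [w ** k for w in background_omegas]


# Hypothetical background classes: purifying, neutral, positive selection.
background = [0.1, 1.0, 3.0]
relaxed = relax_foreground_omegas(background, 0.5)
intensified = relax_foreground_omegas(background, 2.0)
print([round(w, 3) for w in relaxed])      # [0.316, 1.0, 1.732]
print([round(w, 3) for w in intensified])  # [0.01, 1.0, 9.0]
```

      Because ω = 1 is unchanged for any K, neutral sites carry no signal; evidence for K<1 comes from the purifying and positively selected classes converging on 1 simultaneously, which a single averaged ω cannot distinguish from weaker positive selection alone.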

      While we too would find it interesting to know the relative number of genes under positive selection across species (or across great apes), that is not the study we performed and is beyond the scope of this one (and we only identified 96 genes that were positively selected in gorilla suggesting that few genes are positively selected across species).

      Page 8, bottom, elsewhere - "13,491 background set": elsewhere this is 13,310 (abstract). The number of genes here is different, and the set seems to change across multiple parts of the paper without explanation. This could be a simple typo; however, it may affect the statistical analysis if the problem is widespread, especially when assessing enrichment of (presumably) small sets of genes.

      This is partly true and partly a typo. We generated 13,491 alignments, 13,310 of which had HUGO gene symbols. These 13,310 genes were used in all subsequent studies. We have re-written the text to clarify this point, and have added a statement: “We thus generated a dataset of 13,491 orthologous coding gene alignments from the genomes of 261 Eutherian mammals, corresponding to 62.7% of all protein-coding genes in the gorilla genome. Of the 13,491 alignments, 13,310 had an identifiable HUGO gene symbol and were used in all subsequent analyses (lines 158 - 162).”

      Related to this, it is difficult to determine how many genes these GO associations are based on. Even small numbers of genes can result in very significant results with these tests. How many genes are these associations based on? This connection is a key component of the overall narrative that changes in sperm competition have a large effect on genome-wide shifts.

      All analyses are based on the 13,310 genes with identifiable HUGO gene symbols, including over-representation analyses (ORA). Our dataset submitted with this manuscript includes these 13,310 genes (as well as the genes with K<1 and K>1). The foreground consists of the 578 genes with K<1; these genes are given in Figure 1 – source data 3. The minimum number of genes annotated to a GO or pathway term was 3. Although it is unlikely that statistically significant GO term enrichments result from only a few genes annotated to each term, such a scenario could produce small P-values while carrying a high false discovery rate; readers can decide what false discovery rate they are willing to accept.
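      A minimal sketch of the one-sided hypergeometric test commonly used for over-representation analysis is given below, written with only the Python standard library. It assumes the 13,310-gene universe and the 578-gene K<1 foreground described above; the term size and overlap in the usage line are hypothetical, and the sketch is not necessarily identical to the ORA software used for the analyses. It illustrates the point under discussion: a term with only three annotated genes, all of which fall in the foreground, already yields a P-value below 1e-4.

```python
from math import comb


def hypergeom_enrichment_p(universe, foreground, term_size, overlap):
    """One-sided hypergeometric P-value for over-representation.

    Probability of observing at least `overlap` term-annotated genes in a
    foreground of `foreground` genes drawn without replacement from
    `universe` genes, of which `term_size` are annotated to the term.
    """
    total = comb(universe, foreground)
    upper = min(term_size, foreground)
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, foreground - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total  # exact integer ratio, correctly rounded to float


# Hypothetical term with 3 annotated genes, all 3 found among the K<1 genes.
p = hypergeom_enrichment_p(13310, 578, 3, 3)
print(p)
```

      In practice each term's P-value would then be adjusted for the number of terms tested, which is where the false discovery rate mentioned above enters.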

      How many of these 578 genes are plausibly related to reproduction? Apologies if I missed this detail, but Figure 3 does not convey this. Could you speak to this directly in the text and include a table or supplemental table of the GO terms to show the differences in enrichment between classes of genes, and counts per term?

      These data are included in Figure 3 – source data 1.

      One of the key results is the relative frequency of relaxed constraint versus positive selection. This is expected on some level as the form of recurrent positive directional selection detected with these models is usually relatively rare. However, it is not at all clear that it is rarer in gorillas versus other mammals, as implied.

      Our comparison of relaxed constraint to positive selection was intended to explore whether more genes within gorillas experienced one pattern of molecular evolution or the other; we do not imply that positive selection is rarer in gorillas than in other mammals.

      Likewise, I was wondering how the dataset itself may be biased toward this result. If I understand correctly, you are requiring very high levels of conservation (251/261 species) for inclusion in the dataset, resulting in ~60% of all gorilla genes being included. Rapidly evolving genes that are targets of recurrent positive selection often also tend not to be highly conserved across such a deep phylogenetic sample. It would be good to acknowledge this potential bias when interpreting the differences in relative rates of the two forms of selection.

      Our results are unlikely to be subject to this bias. The RELAX test relies on accurately estimating K in background lineages, which requires that we include as many species as possible. The tradeoff is a reduction in the number of genes included in the dataset due to evolutionary dynamics across a wide range of species. However, it is not that 40% of the genes are excluded because they evolve so rapidly that we cannot identify or align them; rather, the exclusion mainly reflects the fact that we cannot identify the gene in at least 251 of the 261 species included in the dataset (due to gene loss, etc.).

      Page 9 - The results here (and in Figure 3D) shows that relaxed genes are enriched broadly across spermatogenesis cell types except for Sertoli cells. But the Sertoli cells and a few non-significant cell types are the only thing to compare to. Instead, it would be interesting to identify single cell expression patterns from other tissues- or even bulk RNA as sc-RNA may be limited in the species. This would show that these genes are enriched in testis compared to other tissues, as opposed to just being broadly expressed. Additionally, the authors could compare to the other primate testis sc-RNA available in Murat et al. Without such comparisons the interpretations here seem limited.

      We did not test whether K<1 genes were enriched in other cell types because: 1) we had an a priori hypothesis that genes with K<1 would be enriched in cells involved in male reproduction, rather than enriched in testis cell types compared to any other cell type; and 2) the number of genes with K<1 is relatively small and the number of known cell types is very large, with at least one estimate pointing to ~400 major cell types in a higher primate (PMID: 37722043). Using a significance threshold of 0.05 for a hypergeometric or Fisher's exact test and a Bonferroni correction to control for multiple hypothesis testing, the P-value for enrichment in any cell type would need to fall below 0.000125 (0.05/400), which we are unlikely to achieve.
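      The corrected threshold quoted above follows from dividing the family-wise α by the number of tests. A minimal sketch, assuming the approximate figure of 400 cell types cited above (the function name is illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be >= 1")
    return alpha / n_tests


# ~400 major cell types, family-wise alpha = 0.05
threshold = bonferroni_threshold(0.05, 400)
print(f"{threshold:.6f}")  # 0.000125
```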

      More comprehensive functional comparisons could provide evidence that even though relaxed constraint is present in all lineages, perhaps relaxed constraints in the gorilla lineages are more related to sperm formation and function.

      The RELAX test is a relative one; while relaxed constraint may be present in other lineages, to observe a statistically significant K<1 in gorillas the degree of relaxation would have to have a greater effect size in gorilla than in other lineages.

      I was also a little unclear what to make of the interpretation of K<1 versus K >1 enrichment by cell type. The enrichment of K<1 is called out as noteworthy because this is when the spermatogenesis specific genes begin to be expressed, but then the K > 1 result is dismissed as occurring during pachytene which is a transcriptional permissive state of testis. To be clear, pachytene is also a critical checkpoint for fertility and enhanced purifying selection at this step could be reasonably interpreted as being at odds with the entire erosion of reproduction argument. This seems to be a selective interpretation for the overall narrative. Also, permissive transcription is not only limited to the pachytene stage and the relaxation of constraint concomitant with increased specificity and permissive expression during the later stages of spermatogenesis is a well-known result in mammals, and not anything that can be ascribed gorillas and their change in mating system.

      We agree with the Reviewer’s comment and have removed the K<1 versus K>1 interpretation from the manuscript.

      Page 13 - The LOF enrichment identified from this random sampling is borderline significant. An improved approach would be to perform permutations of random samplings and identify the range of significance based on 1000+ permutations.

      We have redone the burden test with population-matched groups to confirm the reliability of this association (lines 435 - 446). In addition, we now acknowledge in the Caveats and limitations section that our observations could benefit from a permutation analysis (lines 695 - 697).
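      A permutation analysis of the kind suggested could take roughly the following form: repeatedly draw random gene sets of the same size as the tested set, recompute a burden statistic for each draw, and report the fraction of draws at least as extreme as the observed value. The per-gene score (a hypothetical count of loss-of-function variant carriers), the gene names and the data below are placeholders; this is a sketch under those assumptions, not the burden test used in the manuscript.

```python
import random


def permutation_p_value(observed, gene_scores, set_size, n_perm=1000, seed=0):
    """Empirical one-sided P-value for a gene-set burden statistic.

    gene_scores maps gene -> per-gene score (hypothetical here). The statistic
    is the sum of scores over a random gene set of size `set_size`. The
    (b + 1) / (n + 1) estimator keeps the P-value strictly above zero.
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    exceed = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, set_size)
        if sum(gene_scores[g] for g in sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: 200 genes, each with 0-2 carriers.
data_rng = random.Random(1)
scores = {f"gene{i}": data_rng.randint(0, 2) for i in range(200)}
# Equals 1/1001: 20 genes can sum to at most 40, so no draw reaches 60.
print(permutation_p_value(observed=60, gene_scores=scores, set_size=20))
```

      With 1,000 permutations the smallest attainable P-value is about 0.001, so the number of permutations bounds how strongly a borderline result can be supported.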

      Page 17, bottom- Statements like these are overstating the correlation as the comparative analyses were not shown.

      We agree and have edited the text to avoid potential overstatements.

      It is good to include the role of the female reproductive tract. Shouldn't the unbiased screen pull these out anyway? The authors did find some female GO terms enriched. What additional information or experiments would be needed to test the hypothesis of female compensation? The expectations for this should be made clearer.

      Given the nature of these putative female compensatory mechanisms (primarily acting on the oviduct and lower uterus, as speculated in lines 586 – 601), it is currently impossible to functionally test them in gorillas. The continued development of in vitro systems mimicking the female reproductive tract may allow such studies in the future.

      Page 18, middle - Pleiotropy is an important consideration and this paragraph discusses some valuable points. However, this is another section that could be improved by discussing the relaxed constraints in later spermatogenesis, which likely suggests that genes expressed in later stages are less pleiotropic and more testis-specific.

      We agree and have added a brief discussion of this in lines 619 - 622: “It is also possible that the negative consequences of deleterious pleiotropy become less pronounced at later stages of spermatogenesis as meiotic and post-meiotically expressed genes are enriched for testis-specific functions (PMID: 36544022).”

      Page 27, bottom - The criteria for selecting the genes targeted here are interesting but disconnected from the claimed interpretation of the results. If you are targeting genes with reliable expression in Drosophila, it is not surprising that a percentage of them will lead to fertility loss. Shouldn't the background be a random set of testis-expressed genes? This test would show whether relaxed constraint is a strong way to screen for fertility genes. Additionally, the authors previously showed that these genes were enriched in scRNA-seq data in gorilla, and likely in other species. Suggesting that you identified genes 'lacking evidence' of a role in spermatogenesis in previous studies is misleading, when many of these genes are present in testis RNA datasets and enriched for sperm GO terms. I would argue that genes found to be expressed in testis and in spermatogenesis-specific cell types certainly have evidence of being involved in spermatogenesis.

      We thank you for the helpful suggestion. We have generated a new background group composed of a random set of testis-expressed genes. More specifically, using previously published Drosophila testis expression data (PMID: 30249207), we randomly selected 156 genes with TPM>1 (transcripts per million) and determined the percentage of them with reported spermatogenic / male fertility defects in Drosophila. We observed that 18 (11.5%) had previously been demonstrated to be functionally required for male reproductive fitness. This percentage is slightly higher than what we had previously observed for a random selection of Drosophila genes (9.6%, an update using the latest available data to the 7.7% reported in the original version). Nevertheless, both figures are still well below the 27.6% hit rate we found for the Drosophila orthologs of the gorilla K<1 genes. We have added this new information to the manuscript (lines 380 - 386).

      Regarding the potential correlation between expression and function in spermatogenesis, we and others have shown that the majority of the protein-coding genome is expressed during spermatogenesis in both vertebrate and invertebrate species (PMID: 39388236). Although the reasons for such widespread transcription in the male germ line are not entirely clear, it calls for a cautious approach to equating expression with function. Indeed, our recent analysis of 920 genes reliably expressed in insect and mammalian spermatogenesis revealed that only 27.2% of them caused male reproductive impairment when individually silenced in the Drosophila testis (PMID: 39388236). Since genetic redundancy needs to be taken into consideration for a biological process as central to the survival of a species as this one, we take the more stringent approach of only considering a gene to be functionally involved in spermatogenesis if there is phenotypic evidence (from our RNAi assay or from previous publications) that its disruption is associated with spermatogenic impairment and/or abnormal fertility. We have added this clarification to the manuscript (lines 349 - 363).

      Page 17 "Our data ... suggests that gorillas may be at the lowest limit of male reproductive function that can be maintained by natural selection (at least in mammals or vertebrates)." I realize this is the speculation section, but this is a massive overstatement. There is absolutely nothing in your data or results that support this statement, nor is this supported by the extensive comparative reproductive data in mammals. For example, there are many mammalian systems that show lower metrics of reproductive function than gorillas. For example, the sperm abnormality indices in Box 1F are nowhere near as severe as found in many species that still somehow manage to reproduce.

      We agree and have edited the text to avoid potential overstatements (see above).

      Reviewer #3 (Recommendations for The Authors):

      (1) More discussion is needed as to whether their results could be explained by a reduction in effective population size in gorillas.

      Thank you for raising this important point. As you know, reduced effective population size can lead to an increased load of deleterious mutations/relaxed selection intensity. However, we do not believe that it substantially affects our observations. Indeed, relatively few genes have K<1 and those are enriched in sperm biology. Given that a reduced effective population size will plausibly increase the load of deleterious mutations and relaxed selection across many genes, it is unlikely that such a broad phenomenon would result in a specific enrichment in genes related to male reproductive biology. We have added this reasoning to the Caveats and limitations section (lines 675 - 682).

      (2) Properly controlled genetic association testing when performing a burden test is essential, and methods that allow for some variants to be associated with increased fertility should be considered. Rare variants are much more likely to show population-specific differences, and selecting humans from two potentially very different cohorts and sample sizes can easily lead to confounding. I suggest performing a principal component analysis to ascertain the degree of genetic differentiation between these cohorts, and use this to guide the selection of a subset of the control cohort as well.

      We agree and have replicated this analysis using only individuals of European descent; our conclusions have not changed but the P-values have become lower (lines 435 - 446).

      (3) Citations should also be included in Table 1, for each relevant phenotype. You may also want to consider a more general comparison of p-values and effect sizes of genome-wide association studies for human male infertility to test for an enrichment in/nearby genes showing relaxed selection along the gorilla lineage. In other words, do the relaxed genes in the gorilla lineage have an enrichment of small p-values for being associated with male infertility.

      Citations have been included in Table 1, as suggested, and the table has been updated to include the latest reported phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We appreciate your comments on assessing the chromatin accessibility of HSCs following β-glucan training, as epigenetic reprogramming is known to be one of the mechanisms underlying trained immunity, as suggested by many groups, including ours. To delineate the genome-wide epigenetic reprogramming induced by β-glucan (BG), we reanalyzed a publicly available chromatin profiling dataset in which ATAC-seq was performed on HSCs from control and β-glucan-trained mice (accession number: CRA014389). Comparative analysis revealed that HSCs from BG-trained mice showed pronounced enrichment at promoters and distal intergenic regions, key regulatory loci governing transcriptional activity (Fig. S7A). This divergent genomic targeting was further corroborated by distinct signal distribution profiles (Fig. S7B), supporting a pronounced, upregulation-driven remodeling of the epigenomic landscape by BG treatment. Functional annotation of these epigenetically primed promoters by GO term analysis revealed significant enrichment of immune-relevant processes, including leukocyte migration, cell-cell adhesion, and chemotaxis (Fig. S7C). Consistently, KEGG pathway analysis highlighted the enrichment of signaling cascades such as chemokine signaling and cell adhesion molecules (Fig. S7D), reinforcing the involvement of BG-induced trained immunity in inflammatory and mucosal homing pathways.

      Furthermore, promoter-centric enrichment of terms related to “defense response to bacterium” (Fig. S7E) underscored the role of BG in priming antibacterial transcriptional programs, a crucial axis for maintaining intestinal homeostasis. Locus-specific examination of chromatin states further validated BG-induced epigenetic modifications in the upstream regions of selected target genes, including Gbp5, Gbp2, S100a8 and Nos2 (Fig. S7F). Collectively, our integrative reanalysis demonstrates that BG reshapes the epigenomic architecture at regulatory elements, thereby orchestrating immune gene expression programs directly relevant to IBD pathophysiology and mucosal immunity. (Lines 201-211)

      Reviewer 1 (Recommendations for the authors):

      (1) It would be helpful to include a schematic summarizing the proposed mechanism for reader clarity.

      We appreciate your comment and have prepared a graphical abstract, shown as Author response image 1.

      Author response image 1.

      (2) Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target or side effects of β-glucan-induced trained immunity. As trained immunity is known to augment inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation-prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan-induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We have discussed this potential caveat in the Discussion (Lines 299-302).

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1<sup>+</sup> macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for your positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate your valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. Previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice, primarily in the context of inflammatory genes. We acknowledge, however, that these aspects are not directly addressed in the current manuscript, which aims to build on the foundation of β-glucan-induced trained immunity established by many groups, including ours, and to evaluate its potential as a therapeutic approach in colitis.

      That being said, we fully agree that the epigenetic profile of key pathways should be analyzed. In response to this comment and the similar question raised by Reviewer 1, we reanalyzed the relevant public datasets and summarize the findings in Supplementary Figure S7. This ATAC-seq analysis further supports an epigenetic basis, established at the HSC level, for the enhanced inflammatory and antibacterial capacity of monocytes.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation. Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). These results indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1<sup>+</sup> macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      We thank the reviewer for this important point. We acknowledge that direct in vivo tracking of the adoptively transferred monocytes to confirm their homing to the colon and differentiation into specific macrophage subsets would strengthen the mechanistic link. However, due to technical limitations in reliably tracing the fate of transferred cells in our experimental setting, we were unable to provide this direct evidence. Instead, we present a chain of correlative and functional evidence that supports the proposed model:

      (a) Following BG pretreatment, we observed a significant decrease in circulating Ly6C<sup>hi</sup> monocytes specifically at the peak of colitis (day 7, Fig. 5D), concurrent with a marked increase in monocytes/macrophages within the colonic lamina propria (Fig. 2D). This inverse relationship strongly suggests enhanced recruitment of monocytes from the blood into the inflamed colon upon BG training.

      (b) Using CX3CR1-GFP reporter mice, we found that BG pretreatment led to an increased proportion of colonic myeloid cells in an intermediate state (P5: Ly6C<sup>+</sup>MHCII<sup>+</sup>CX3CR1<sup>+</sup>, Fig. 5F). This population represents monocytes actively undergoing differentiation into intestinal macrophages, supporting the idea that BG accelerates the monocyte-to-macrophage transition in situ.

      (c) Our scRNA-seq analysis independently revealed an expansion of monocyte-derived macrophage clusters (e.g., Macro1, Macro2) in BG-treated mice, which express canonical tissue macrophage markers (including Cx3cr1) and genes associated with tissue repair (e.g., Vegfa, Fig. 4A, 5H, 5I).

      These data collectively indicate that BG-trained monocytes exhibit enhanced capacity for colonic recruitment and preferential differentiation toward reparative macrophage subsets, which aligns with the protective phenotype observed after adoptive transfer. We have explicitly noted the absence of direct fate-mapping data as a limitation in the revised Discussion and agree that future studies employing advanced tracing techniques would be valuable to definitively establish this cellular trajectory. (Line 378-380)

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3.), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived. Together, these results suggest at least that β-glucan pretreatment alters monocyte capacity in a way that contributes directly to the enhanced alleviation of colitis we observed. However, because we could not identify a cluster-specific marker (currently the main caveat of scRNA-seq-defined subclusters), we were unable to provide direct causal evidence by specifically depleting subcluster cells. Nonetheless, the results of the monocyte adoptive transfer experiment in Ccr2 KO mice strongly suggest that monocytes are crucial for this protective effect. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are mainly based on transcriptional profiling and association with protective phenotypes, rather than on direct causal evidence (Lines 400-404).

      (5) While Rag1<sup>-/-</sup> mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current study of the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis-alleviating phenotype, we believe that, at a minimum, the β-glucan-enhanced protection depends on trained monocytes. Nevertheless, we acknowledge this limitation: we cannot rule out a role for ILCs in this process, and we discuss this limitation in the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded (Lines 322-326).

      Reviewer 2 (Recommendations for the authors):

      (1) The authors do not provide direct mechanistic evidence of TI (e.g., epigenetic and metabolic reprogramming). The absence of such data weakens the mechanistic strength of the TI claim. The authors should soften the terminology to BG-induced myeloid reprogramming suggestive of trained immunity, and acknowledge and discuss this limitation.

      We appreciate your comment highlighting the lack of direct epigenetic and metabolic assessment in our current study. Previous work from our group (S.-C. Cheng) and others has extensively documented the epigenetic and metabolic profiles of monocytes from β-glucan–trained mice, focusing primarily on inflammatory-related genes. Based on this established foundation, our current manuscript focuses on exploring the translational potential of BG-induced trained immunity.

      That said, as mentioned in our response to the identified weakness, we reanalyzed public epigenetic datasets, focusing on pathways related to reparative and antibacterial functions, and integrated these results into the revised manuscript (Fig S7, Lines 201-211).

      (2) CX3CR1<sup>+</sup> macrophages' role is not functionally validated. The data relies solely on scRNA-seq and cluster annotations, which are insufficient to confirm functional roles in vivo. Depletion or in vitro studies would provide stronger causal evidence. The authors should acknowledge this limitation in the Discussion.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages has not been comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived. Together, these results suggest at least that β-glucan pretreatment alters monocyte capacity in a way that directly contributes to the enhanced alleviation of colitis we observed. However, because we could not identify a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined subclusters, we were not able to show direct causal evidence. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are based mainly on transcriptional profiling and association with protective phenotypes, rather than on direct causal evidence (Lines 395-404).

      (3) Rag1<sup>-/-</sup> mice retain innate lymphoid cells (ILCs), particularly ILC3, which are mucosal and produce IL-22, contributing to tissue repair (PMID: 21502992; PMID: 32187516). The potential for BG to activate ILCs remains unexplored in this study. This limits the interpretation of whether the observed protection arises from monocyte/macrophage reprogramming or is partially mediated by residual ILC activity. The authors should explicitly acknowledge this limitation and discuss the possible contribution of ILCs to the observed phenotype.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current study of the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis-alleviating phenotype, we believe that, at a minimum, the β-glucan-enhanced protection depends on trained monocytes. Nevertheless, we acknowledge this limitation: we cannot rule out a role for ILCs in this process, and we discuss this limitation in the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded (Lines 322-327).

      (4) Figure 1-It would help to clarify whether a BG-only control group (without DSS) was included in the design. This would be critical to determine if BG alone alters the colon. If omitted, the authors should clearly state this and consider adding such a group in future experiments. This would help define the baseline effects of BG and support the claim that its benefits are dependent on TI (upon second challenge - DSS).

      We appreciate this valuable suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a dedicated BG-only control group at baseline, before DSS treatment (Colitis_d0). These data indicate that BG preconditioning alone does not alter the baseline expression of colon mucosal repair genes.

      (5) Figure 3 - It would strengthen the conclusions to include a vehicle-treated PBS BMT donor control group, or to state its absence. It is unclear whether the protective effect observed in recipients of BG-treated BM is due to trained immunity or to non-specific effects of transplantation, irradiation, or batch variation.

      We fully agree that including a vehicle-treated (PBS) BMT control is critical to rule out non-specific effects of transplantation, irradiation, or batch variation. In fact, every time mice were irradiated, we included a blank PBS-transfer control to confirm that irradiation had successfully ablated the recipients' bone marrow. Mice that received PBS only died after 8 days, whereas mice that received bone marrow from either the PBS-control or the BG-treated group survived. We also performed flow cytometry to confirm successful bone marrow transplantation (Fig S5C). We have added a description of the vehicle-treated BMT control to the Materials and Methods section for clarification (Lines 456-466).

      (6) No gene expression or phenotypic data is provided for monocytes/macrophages in BMT recipients; therefore, it cannot be confidently stated that these cells were reprogrammed. Expression/phenotypic data should be added or discussed.

      We thank the reviewer for raising this important point. We acknowledge that a detailed transcriptomic or phenotypic analysis of donor-derived tissue-resident myeloid cells in the BMT recipients would provide the most direct evidence for their reprogrammed state.

      While our BMT study focused primarily on assessing the transferability of the protective phenotype via endpoint disease parameters and circulating immune cell composition, we present a coherent and compelling line of evidence supporting the conclusion that BG's training effect is maintained within the hematopoietic system of recipients and mediated by reprogrammed myeloid cells:

      (a) A key finding is the significant increase in the proportion of donor-derived Ly6C<sup>hi</sup> monocytes in the peripheral blood of recipients receiving BG-trained bone marrow (Fig. 3J). This is not a bystander effect but direct evidence that BG-induced reprogramming of donor hematopoietic stem/progenitor cells instructs a biased differentiation program toward a specific effector precursor population within the new host, demonstrating the functional persistence of the trained state after transplantation.

      (b) The core of reprogramming in trained immunity lies in persistent epigenetic and functional changes. Our new analysis of public datasets (Fig. S7) confirms that BG directly reshapes the chromatin accessibility landscape in hematopoietic stem cells (HSCs), particularly at loci regulating immune and antibacterial responses. This provides the fundamental mechanism explaining how the trained phenotype is both long-lasting and transplantable: the reprogramming occurs at the progenitor level.

      (c) The most causally compelling data in our study comes from the independent adoptive transfer experiment, where transfer of purified BG-trained monocytes alone was sufficient to ameliorate colitis in recipient mice (Fig. 3K, L). This definitively proves that the trained monocytes themselves carry the protective functional program. It strongly suggests that these reprogrammed monocytes/macrophages are the likely effectors mediating protection in the BMT model.

      (d) Our interpretation aligns with well-established paradigms in the field. Precedent studies confirm that the BG-trained phenotype (e.g., enhanced cytokine potential) can be transferred via BMT or monocyte adoption. For instance, Haacke et al. (PMID: 40020679) demonstrated that splenic monocytes from BG-trained donors, when transferred into arthritic recipient mice, led to elevated inflammatory cytokine (e.g., Tnf, Il6) expression in recipient joints, directly proving the maintained functional reprogramming of trained cells in a heterologous host environment. This provides a strong precedent supporting the functional activity of transferred trained cells in our model.

      (7) The study is consistent with emerging evidence that distinct TI programs may exist depending on the stimulus and context, including immunoregulatory and tissue-reparative responses (PMID: 35133977; PMID: 31732931; PMID: 32716363; PMID: 30555483). The authors should integrate this perspective into the Discussion to acknowledge that their findings may represent one example of such context-dependent, potentially reparative TI programs. This would place the study within the growing literature describing functional heterogeneity in innate immune training.

      We appreciate this suggestion and have incorporated it into the Discussion. In the revised manuscript, we discuss how our findings of BG-induced protective myeloid reprogramming align with the concept of tissue-reparative or immunoregulatory TI, which is distinct from the pro-inflammatory TI phenotypes described in other contexts. By highlighting the functional heterogeneity of innate immune training, we position our work as an example of a stimulus-specific, reparative TI program. (Lines 356-379)

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1<sup>-/-</sup> mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2<sup>-/-</sup> mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      Thank you for the positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We correct this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)
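      As a minimal sketch of why this distinction matters (an illustration with made-up counts, not the authors' pipeline and not DESeq2 itself), the Python/NumPy code below contrasts FPKM, which rescales each sample by its own library size and each gene by its length, with per-sample size factors computed in the style of DESeq2's median-of-ratios estimator on raw counts. The function names and toy data are hypothetical.

```python
import numpy as np


def fpkm(counts, gene_lengths_bp):
    """FPKM for a genes x samples count matrix.

    Each sample is scaled by its own total mapped reads (in millions) and
    each gene by its length (in kilobases), i.e. a within-sample normalization.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    millions_mapped = counts.sum(axis=0)[None, :] / 1e6
    return counts / lengths_kb / millions_mapped


def median_of_ratios_size_factors(counts):
    """Per-sample size factors in the style of DESeq2's median-of-ratios method.

    Genes with a zero count in any sample are left out of the reference; each
    sample's factor is the median ratio of its counts to the per-gene
    geometric mean across samples.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


# Toy genes x samples matrix: the fourth gene is induced tenfold in sample 2.
counts = [[100, 100], [100, 100], [100, 100], [100, 1000]]
lengths = [1000, 1000, 1000, 1000]

print(fpkm(counts, lengths)[0])               # unchanged first gene looks lower in sample 2
print(median_of_ratios_size_factors(counts))  # both factors equal 1
```

      In this toy example a single gene is strongly induced in the second sample; FPKM then makes the three unchanged genes look down-regulated in that sample, whereas the median-of-ratios size factors both stay at 1.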

      Reviewer 3 (Recommendations for the authors):

      (1) Current best practices recommend working with raw count data when using DESeq2 to ensure statistically robust differential expression analysis between samples. However, for visualization and clustering, like heatmaps, FPKMs can be used. Could the authors explain why they have used FPKM for differential gene expression analysis?

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We correct this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)

      Minor Comment

      (1) Line 92: remove extra word "that".

      We remove the extra word “that” from Line 92 in the revised manuscript.

      (2) Line 201: please state here what "GBP" stands for, as it appears first.

      We define “GBP” as “Guanylate-Binding Protein” at its first appearance in Line 201. (Line 213)

      (3) Line 235: consider rewriting "we analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid"; added spacing for "day 7", "which", and "the".

      We revise the sentence in Line 235 to read: “We analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid…” to improve readability. (Lines 246-247)

      (4) Line 290: consider rewriting " as seen in conditions such as rheumatoid arthritis and ...".

      We revise Line 290 to: “as observed in conditions such as rheumatoid arthritis and…” for clarity. (Lines 301-302)

      (5) Line 375-376: please check sentence starting lower case "with minor modifications, by assessing ".

      We correct the sentence to start with a capital letter: “With minor modifications, by assessing…” (Lines 422-423)

      (6) Line 399: kindly consider adding "was" after "cDNA".

      We revise Line 399 to include “was” as suggested: “cDNA was synthesized…” (Line 446)

      (7) Line 346-347: consider adding "which" after "monocytes": "We transferred BG-preconditioned monocytes which significantly alleviated clinical symptoms".

      We revise Line 346-347 to include “which” as suggested for grammatical clarity. (Lines 385-386)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
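      For readers who want to reproduce a curve like Figure 1B, a minimal Python sketch of the Marko-Siggia interpolation formula for the worm-like chain is given below: F = (kT/P)[1/(4(1 − x/L)<sup>2</sup>) − 1/4 + x/L], with extension x, contour length L and persistence length P. It assumes P = 50 nm and kT ≈ 4.11 pN·nm (about 25 °C); the 1000 nm contour length in the example is a hypothetical placeholder, not the length of the DNA used in the tensiometer.

```python
KT_PN_NM = 4.11        # thermal energy near 25 °C, pN·nm (assumed temperature)
PERSISTENCE_NM = 50.0  # consensus dsDNA persistence length


def wlc_force_pn(extension_nm, contour_nm, persistence_nm=PERSISTENCE_NM, kt=KT_PN_NM):
    """Marko-Siggia interpolation: force (pN) needed to hold a given extension."""
    if not 0.0 <= extension_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    rel = extension_nm / contour_nm
    return (kt / persistence_nm) * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)


def extension_at_force_nm(force_pn, contour_nm, tol_nm=1e-9, **wlc_kwargs):
    """Invert the force-extension curve by bisection (force rises monotonically)."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-12)
    while hi - lo > tol_nm:
        mid = 0.5 * (lo + hi)
        if wlc_force_pn(mid, contour_nm, **wlc_kwargs) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical 1000 nm tether: where would a motor stalling at ~6 pN sit?
x_stall = extension_at_force_nm(6.0, 1000.0)
print(f"{x_stall:.1f} nm")
```

      Near full extension the curve is steep, so under this model the extension at stall changes little for modest changes in the stall force.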

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this point. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not super-stall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We softened this term in our revision to “nearly parallel to the microtubule” (Line 464). In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.
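      The geometric argument can be stated compactly. Assuming the stretched DNA is a taut, straight tether (thermal bending neglected), its tension acts along the tether, so the ratio of the force component perpendicular to the microtubule surface to the component along it equals the ratio of the height offset Δz between the motor's attachment point and the streptavidin anchor to their separation Δx along the microtubule. These symbols are introduced here for illustration only; the authors' own calculation is in the Supplementary Methods.

```latex
\frac{\lvert F_{\perp} \rvert}{F_{\parallel}} = \frac{\lvert \Delta z \rvert}{\Delta x} = \lvert \tan\theta \rvert ,
\qquad
\frac{\lvert F_{\perp} \rvert}{F_{\parallel}} < 0.01 \iff \lvert \Delta z \rvert < 0.01\,\Delta x
```

      For a hypothetical separation of 500 nm along the microtubule, for example, the perpendicular component stays below 1% of the parallel one as long as the height offset is under 5 nm.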

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We acknowledge that our treatment of kinesin-3 was confusing. In response, we deleted any reference to kinesin-3 catch-bond in the Results section, and restricted it to the Discussion where it is interpretation. In Line 635 in the Discussion, we softened the statement of catch-bond activity to “…all three dominant kinesin transport families display catch-bond like behavior at stall…”. We acknowledge that, classically, the catch/slip bond nomenclature refers to simple protein-protein interactions and is easier to interpret there. However, the term ‘catch-bond’ has been used in the literature for myosin, dynein and kinesin, and thus we feel that it is sufficiently established to use it here.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution, and we calculated a corrected kinesin-3 stall duration due to these undetected slips. This data and analysis are included as a new Supplementary Figure S8. In the main text on Lines 283-293 we included the following text:

      “It was notable that the kinesin-3 stall durations at high load are longer than the ramp durations at low load, because this indicates that the kinesin-3 off-rate slows with increasing load. However, because kinesin-3 had the most slip events at stall, we were concerned that there may be undetected slip events below the 60 nm threshold of detection that led to an overestimation of the kinesin-3 stall duration. To test this hypothesis, we plotted the distribution of kinesin-3 slip distances at stall, fit an exponential, and calculated the fraction of missed slip events (Fig. S8). From this analysis, we calculated a correction factor of 1.42 that brought the kinesin-3 stall duration down to 1.33 s. Notably, this stall duration value is still well above the kinesin-3 ramp duration value of 0.75 s in Fig. 3C and thus does not qualitatively change our conclusions.”

      We thank the reviewer for this suggestion.
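      One way to carry out this kind of correction is sketched below in Python. It assumes slip distances are exponentially distributed; because the exponential is memoryless, the excess of each detected slip over the 60 nm threshold follows the same distribution, so its mean is the maximum-likelihood estimate of the decay length. The fraction of slips falling below the threshold, and the factor by which detected slip counts underestimate the total, then follow directly. This is a hypothetical illustration, not necessarily the exact procedure behind Fig. S8, and the example distances are made up.

```python
import math


def fit_decay_length_nm(slip_distances_nm, threshold_nm=60.0):
    """MLE of the exponential decay length from slips at or above the threshold.

    By memorylessness, (distance - threshold) for detected slips is exponential
    with the same decay length, so the MLE is simply its sample mean.
    """
    excess = [d - threshold_nm for d in slip_distances_nm if d >= threshold_nm]
    if not excess:
        raise ValueError("no slips at or above the detection threshold")
    decay = sum(excess) / len(excess)
    if decay <= 0:
        raise ValueError("decay length must be positive")
    return decay


def missed_fraction(threshold_nm, decay_nm):
    """Fraction of all slips shorter than the threshold (i.e. undetected)."""
    return 1.0 - math.exp(-threshold_nm / decay_nm)


def count_correction_factor(threshold_nm, decay_nm):
    """Ratio of total slips to detected slips: 1 / P(distance >= threshold)."""
    return math.exp(threshold_nm / decay_nm)


# Made-up detected slip distances (nm).
slips = [75.0, 90.0, 130.0, 210.0, 68.0, 160.0]
lam = fit_decay_length_nm(slips)
print(lam, missed_fraction(60.0, lam), count_correction_factor(60.0, lam))
```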

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring varies as the motor steps up to stall, so if we included the entire run duration it would be difficult to specify which force we were comparing to the unloaded case. Moreover, if we assume that stepping and detachment behavior is history-independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point. More importantly, in Fig. 3 we separate the ramps from the stalls and, using a statistical model, compute a separate duration parameter (the inverse of the off-rate) for the ramp and for the stall. We find that the relationship between ramp, stall, and unloaded durations differs among the three motors, which is interesting in itself.
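      The history-independence argument can be checked numerically. If detachment is a single-rate (Poisson) process, durations are exponential, the maximum-likelihood duration parameter is the sample mean (its inverse is the off-rate), and restarting the clock at any later time, such as stall onset, leaves that parameter unchanged. The Python sketch below uses simulated data with a hypothetical 1 s mean; it ignores censoring and is not the statistical model used in Fig. 3.

```python
import random


def duration_mle(durations_s):
    """Exponential MLE of the mean duration (s); the off-rate is its inverse."""
    return sum(durations_s) / len(durations_s)


def residual_durations(durations_s, start_s):
    """Restart the clock at start_s for events still bound at that time."""
    return [d - start_s for d in durations_s if d > start_s]


random.seed(1)
true_mean_s = 1.0  # hypothetical value for the simulation
runs = [random.expovariate(1.0 / true_mean_s) for _ in range(200_000)]

full = duration_mle(runs)
restarted = duration_mle(residual_durations(runs, start_s=1.0))
print(f"full: {full:.3f} s, restarted: {restarted:.3f} s, off-rate: {1 / full:.3f} /s")
```

      Applying `duration_mle` separately to ramp and stall segments would give one duration parameter per load regime; a motor whose off-rate depends on load is one for which these estimates differ.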

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We revised this sentence to the following: “In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to continue generating force after a small rearward displacement, rather than fully detaching and ‘resetting’ to zero load.” (Line 339-342)

      It should be noted that, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al (Toleikis et al., 2020) and Sudhakar et al (Sudhakar et al., 2021). In particular, Sudhakar used small, high-index Germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al. who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in their Fig. S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.
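      The drag argument can be made semi-quantitative with a simple overdamped model, offered here as an illustrative assumption rather than the analysis of Sudhakar et al. A bead held in a trap of stiffness κ relaxes with time constant τ = γ/κ, where γ is its drag coefficient (6πηr for a sphere of radius r far from surfaces). If a motor still engaged with the microtubule adds a friction γ<sub>m</sub> in parallel, the relaxation time becomes (γ + γ<sub>m</sub>)/κ, so a roughly 10-fold slowdown implies γ<sub>m</sub> ≈ 9γ, while the same motor friction produces a progressively smaller slowdown as bead drag grows. The viscosity and bead radii below are hypothetical.

```python
import math

WATER_VISCOSITY_PA_S = 8.9e-4  # assumed value near 25 °C


def stokes_drag(radius_m, viscosity_pa_s=WATER_VISCOSITY_PA_S):
    """Drag coefficient (N·s/m) of a sphere far from any surface."""
    return 6.0 * math.pi * viscosity_pa_s * radius_m


def implied_motor_friction(tau_slip_s, tau_trap_s, bead_drag):
    """Extra parallel friction needed to slow relaxation from tau_trap to tau_slip."""
    return (tau_slip_s / tau_trap_s - 1.0) * bead_drag


def slowdown_factor(bead_drag, motor_friction):
    """Relaxation time with the motor engaged, relative to the bead alone."""
    return (bead_drag + motor_friction) / bead_drag


small = stokes_drag(0.2e-6)                         # hypothetical small, low-drag bead
gamma_m = implied_motor_friction(10.0, 1.0, small)  # ~10-fold slower relaxation
large = stokes_drag(2.0e-6)                         # hypothetical bead with 10x the radius
print(gamma_m / small, slowdown_factor(large, gamma_m))
```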

      To complete the point: with the complex geometry of a vesicle, the motor remains associated with the microtubule during slip events and is hence primed for recovery, so the recovery rate is expected to be the same as in the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. To address this point, we added the following sentence to the Discussion (Lines 654-656):

      “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      To address the question of neutravidin acting as a roadblock, we did the following. Because of the sequence of injections used to assemble the tensiometer in the flow cell, there are often some residual GFP-kinesin motors that aren’t attached to DNA and thus serve as internal controls for unloaded motility on the neutravidin-functionalized microtubules. We quantified the run durations of these free kinesin-GFP motors and found that their run duration was 0.92 s (95% CI: 0.79 to 1.04 s, by MEMLET). This is slightly shorter than, but not statistically different from, the 1.04 s (95% CI: 0.78 to 1.31 s) on control microtubules in Fig. 2A. This result is included in Figure S6 in the revised manuscript.

      We don’t have a precise estimate for the amount of neutravidin on the microtubules. Based on Fig. 3C of Korten and Diez (Korten and Diez, 2008), the reduction in the unloaded run duration that we see corresponds to a ~2% biotinylation ratio. We polymerize microtubules with 10% biotinylated tubulin and add 8 nM neutravidin to the flow cell, so in principle the microtubules could be 10% biotin-streptavidin coated. However, a number of uncertainties push this estimate lower: a) the precise degree of biotinylation, b) whether the fraction of biotinylated tubulin in polymerized microtubules is lower than the mixing ratio due to unequal incorporation, and c) what fraction of the biotinylated tubulin is occupied by neutravidin when using this neutravidin flow-in method. Thus, our best estimate is ~2% biotin-streptavidin functionalization.
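      As a rough illustration of the density question, the arithmetic below converts an occupied-dimer fraction into neutravidin molecules per micrometre of microtubule. It is a back-of-the-envelope sketch under assumptions that are not taken from the manuscript: a 13-protofilament lattice, 8 nm per tubulin dimer along a protofilament, and at most one neutravidin per biotinylated dimer. The helper names are hypothetical.

```python
# Back-of-the-envelope neutravidin density on a microtubule.
# Assumptions (not from the manuscript): 13 protofilaments, 8 nm per
# tubulin dimer, at most one neutravidin per biotinylated dimer.
PROTOFILAMENTS = 13
DIMER_LENGTH_NM = 8.0

def dimers_per_um(protofilaments=PROTOFILAMENTS, dimer_nm=DIMER_LENGTH_NM):
    """Number of tubulin dimers in 1 um of microtubule lattice."""
    return protofilaments * 1000.0 / dimer_nm

def neutravidin_per_um(fraction_occupied):
    """Expected neutravidin count per um for a given occupied-dimer fraction."""
    return fraction_occupied * dimers_per_um()

for frac in (0.02, 0.10):
    n = neutravidin_per_um(frac)
    # mean spacing between occupied dimers along a single protofilament
    spacing_nm = DIMER_LENGTH_NM / frac
    print(f"{frac:.0%}: {n:.1f} per um, ~{spacing_nm:.0f} nm apart on one protofilament")
```

      Under these assumptions, the ~2% estimate corresponds to roughly 33 neutravidin molecules per µm, or an average spacing of about 400 nm along any single protofilament; the 10% upper bound would correspond to about 160 per µm.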

      The ramp durations in Fig. 3 provide another argument that biotinylated microtubules are not affecting the motors. Compared to unloaded durations for each motor, the kinesin-1 ramps were longer, the kinesin-2 ramps were the same, and the kinesin-3 ramps were shorter in duration. That argues against any systematic effect of biotinylation on motor run durations, with the caveat that family-dependent differences could in principle be masking an effect. The fact that ramp durations aren’t systematically longer or shorter than the unloaded run durations also argues that the stalls we see, which are at the expected extension length of the dsDNA, are not caused by neutravidin roadblocks.

      The final point the reviewer brings up is whether neutravidin may be contributing to the rescues from slip events that we observe. This is difficult to fully rule out. However, because the unloaded run durations aren’t significantly altered by the biotin-streptavidin on the microtubules, we don’t expect the rescue events following a slip to be significantly affected. In principle, we could systematically increase and decrease the biotinylation and see whether the slip rescues change, but we haven’t done this.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates, and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history-independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). We preferred this approach over Kaplan-Meier because, although non-parametric methods such as Kaplan-Meier make no assumptions about the distribution, they require the start time of each event to be known exactly.

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths: first, the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by a single exponential (the points above 6 s don’t affect the fit very much). Second, when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed very well with the single-exponential fits (Table S2). Specifically, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (95% CI: 2.33 to 3.17 s) and the estimate from the Markov model was 2.76 s (95% CI: 2.28 to 3.34 s). Thus, we chose not to correct the kinesin-3 unloaded run durations for finite microtubule lengths. To address this point in the revision, we added the following note to Table S2: “* Because the Markov-Bayesian model, which is unaffected by left and right censoring of data, gave the same unloaded run durations for kinesin-3 as the MEMLET fit, we did not correct the kinesin-3 unloaded run durations for any right censoring due to finite microtubule lengths.” We also added the following point to the legend of Fig. S1: “A fraction of kinesin-3 unloaded run durations were limited by the length of the microtubules, but fitting to a model that took into account missed events gave a similar mean duration as an exponential fit, and so no correction was made (Table S2).”
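      The reason a constant-rate (memoryless) model tolerates censoring can be illustrated with a simpler frequentist analogue. The sketch below is not the authors’ Markov-Bayesian model. It assumes a single exponential detachment process and simulates runs whose start is sometimes missed (left censoring) and whose end is sometimes cut off by a hypothetical observation window, such as reaching the microtubule end (right censoring). It then compares the naive mean with the standard censored-exponential maximum-likelihood estimate: total observed time divided by the number of observed detachments. All parameter values and helper names are illustrative.

```python
import random

def censored_exp_mle(durations, observed):
    """Maximum-likelihood mean of an exponential with right censoring.

    durations: observed time for each run (detachment time or censoring time)
    observed:  True if detachment was seen, False if the run was cut off
    """
    n_events = sum(observed)
    if n_events == 0:
        raise ValueError("need at least one observed detachment")
    return sum(durations) / n_events

def simulate_runs(tau, n, seed=0, max_missed_start=0.5, max_window=8.0):
    """Hypothetical simulation of left- and right-censored exponential runs."""
    rng = random.Random(seed)
    durations, observed = [], []
    for _ in range(n):
        t = rng.expovariate(1.0 / tau)               # true run duration (s)
        missed = rng.uniform(0.0, max_missed_start)  # undetected start of the run
        if t <= missed:
            continue                                 # run never detected at all
        t -= missed                                  # residual is still Exp(tau)
        window = rng.uniform(0.5, max_window)        # e.g. time to reach MT end
        durations.append(min(t, window))
        observed.append(t <= window)
    return durations, observed

durs, obs = simulate_runs(tau=2.75, n=20000)
naive = sum(durs) / len(durs)   # treats cut-off runs as detachments: biased low
mle = censored_exp_mle(durs, obs)
print(f"naive mean {naive:.2f} s, censoring-aware MLE {mle:.2f} s")
```

      Because the exponential is memoryless, runs whose starts are missed still have exponentially distributed residual durations with the same mean, so left censoring does not bias the estimate. Right-censored runs contribute their observed time to the numerator but no event to the denominator.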

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In Figure S6 kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. We clarified this in the revised Figure S6 legend. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      We addressed this point in lines 200-212 of the revised manuscript:

      “We carried out two additional control experiments. First, to confirm that the neutravidin used to link the DNA to the microtubule wasn’t affecting kinesin motility, we analyzed the run durations of kinesin-1 motors on neutravidin-coated microtubules and found no change compared to unlabeled microtubules (Fig. S6). Second, we measured the run duration of kinesin-1 linked to a DNA tether that was not bound to the microtubule and thus was being transported (Fig. S6). The kinesin-DNA run duration was 1.40 s, longer than the 1.04 s of motors alone (Fig. 2A). We interpret this longer duration to reflect the slower diffusion constant of the dsDNA relative to the motor alone, which enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event.”

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We changed this text (Lines 265-267) to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      In a sense, yes, it is overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
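      For readers less familiar with this class of model, a minimal three-rate sketch is shown below. It has a strong state, a weak state, and a single exit (slip) transition from the weak state, with load-dependent rates of the exponential (Bell) form k(F) = k0·exp(F·d/kT). This is an illustrative reduction, not the manuscript’s fitted model. The function names, the choice of which rates depend on load, the sign convention for d, and all numerical values are hypothetical. The closed-form mean exit time follows from the first-passage equations for the strong and weak states given in the docstring.

```python
import math

KT = 4.11  # thermal energy in pN*nm near room temperature (assumed)

def bell_rate(k0, force_pn, d_nm):
    """Exponential load dependence k(F) = k0 * exp(F * d / kT).
    d > 0: rate increases with load; d < 0: rate decreases with load."""
    return k0 * math.exp(force_pn * d_nm / KT)

def mean_time_to_exit(k_sw, k_ws, k_slip):
    """Mean first-passage time out of a strong<->weak cycle, starting in the
    strong state, when the only exit (slip) is from the weak state.
    Solves T_S = 1/k_sw + T_W and T_W = (1 + k_ws*T_S) / (k_ws + k_slip)."""
    return (k_sw + k_ws + k_slip) / (k_sw * k_slip)

# Hypothetical parameters (rates in 1/s, distances in nm), for illustration only
for force in (0.0, 6.0):
    k_sw = bell_rate(100.0, force, -0.5)   # strong-to-weak slowed by load
    k_slip = bell_rate(5.0, force, 0.5)    # slip from weak state sped up by load
    print(f"F = {force} pN: mean duration {mean_time_to_exit(k_sw, 100.0, k_slip):.2f} s")
```

      Setting k_ws = 0 recovers the intuitive sequential result 1/k_sw + 1/k_slip. Whether load lengthens or shortens the mean duration depends entirely on the assumed distances, which is why such parameters have to be constrained by data.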

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the Discussion of the revision, we added text to note that this behavior is indicative of an ideal bond (not a catch-bond) on Lines 480-483: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics and instead characteristic of an ideal-bond.” We also added a sentence in the Introduction highlighting this work, Lines 84-87: “Fourth, when kinesin-1 was connected to a bead through a micron-long segment of DNA and hydrodynamic forces were imposed on the bead, motor interaction times were insensitive to hindering loads up to 3 pN, indicative of an ideal-bond.”

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. We added the following paragraph in Lines 101-111 in the Geometry Consideration section of the Supplementary Methods.

      “Another consideration when comparing the DNA tensiometer to optical trap measurements is the relative stiffness of the trap and the dsDNA. Optical trap stiffnesses are generally in the range of 0.05 pN/nm [12,13]. To calculate the predicted stiffness of the dsDNA spring, we computed the slope of the theoretical force-extension curve in Fig. 1B. The stiffness is highly nonlinear and is <0.001 pN/nm below 650 nm extension. At the predicted stall force of 6 pN (960 nm extension), the dsDNA stiffness is ~0.2 pN/nm, which is stiffer than most optical traps but similar to the estimated 0.3 pN/nm stiffness of kinesin motors themselves [12,13]. An 8 nm step at this stiffness leads to a 1.6 pN jump in force, so it is reasonable to expect that motors are dynamically stepping at stall. Therefore, there is no reason to expect that stiffness differences between optical traps and the dsDNA spring are affecting the motor detachment kinetics.”
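      The stiffness values in this paragraph can be reproduced approximately from the worm-like-chain interpolation formula of Marko and Siggia (1995), F(x) = (kT/P)[1/(4(1 − x/L)²) − 1/4 + x/L]. Its slope is dF/dx = (kT/(PL))[1/(2(1 − x/L)³) + 1]. The sketch below assumes a persistence length P = 50 nm and kT = 4.11 pN·nm. It uses a hypothetical contour length L = 1020 nm, chosen so that 960 nm of extension gives ~6 pN; the actual construct length and the parameters behind Fig. 1B may differ.

```python
KT = 4.11        # pN*nm, thermal energy near 25 C (assumed)
P_NM = 50.0      # dsDNA persistence length in nm (typical value; assumed)
L_NM = 1020.0    # contour length in nm (hypothetical, ~3 kbp x 0.34 nm/bp)

def wlc_force(x_nm, L=L_NM, P=P_NM, kT=KT):
    """Marko-Siggia interpolation formula for worm-like-chain force (pN)."""
    r = x_nm / L
    return (kT / P) * (1.0 / (4.0 * (1.0 - r) ** 2) - 0.25 + r)

def wlc_stiffness(x_nm, L=L_NM, P=P_NM, kT=KT):
    """Analytic slope dF/dx of the interpolation formula (pN/nm)."""
    r = x_nm / L
    return (kT / (P * L)) * (1.0 / (2.0 * (1.0 - r) ** 3) + 1.0)

for x in (650.0, 960.0):
    print(f"x = {x:.0f} nm: F = {wlc_force(x):.2f} pN, k = {wlc_stiffness(x):.4f} pN/nm")

# Force change for one 8 nm step at the quoted ~0.2 pN/nm stiffness
print(f"8 nm step x 0.2 pN/nm = {8 * 0.2:.1f} pN")
```

      With these assumptions the slope is ~0.0009 pN/nm at 650 nm and ~0.2 pN/nm at 960 nm, consistent with the values quoted above.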

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. In response to the point from Reviewer #3, we added the following sentence on Lines 654-656: “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (e.g. ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      We agree that at first glance those jumps are puzzling. To investigate this question the first thing we did was to go back to our tensiometer dataset and look systematically at jumps for all three motors. We found roughly 4-6 large jumps like these for all three motors (kinesin-1: 250 +/- 99 nm (mean +/- SD; N=5); kinesin-2: 249 +/- 165 nm (N=6); kinesin-3: 490 +/- 231 nm (N=4)). Thus, although the apparent jumps may be more pronounced due to the specific rebinding kinetics of kinesin-2, this behavior is not unique to this motor. (Note that the motor binding position distribution in Fig. S2 is taken from initial binding positions that follow a clear period of detachment; thus, not all jumps are captured there.)

      Our interpretation is that these apparent jumps simply reflect the long length and high compliance of the dsDNA tether. For instance, below 650 nm extension the stiffness is k < 0.001 pN/nm (see Reviewer #3, point #1 above). Thus, we expect large fluctuations of the tethered motor when it is not bound to the microtubule. One reason that these events look like ‘jumps’ is that the sub-ms fluctuations during detached periods are not captured by the ~25 fps movies (40 ms frame acquisition time); instead, the fitted Qdot position represents the average position during the acquisition window. Indeed, due to these rapid fluctuations (and the limited depth of the TIRF illumination field), the position often can’t be determined during these periods (e.g. see gaps at ~2.5 s, 11 s and 24 s in Fig. 1F).
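      The size of these fluctuations can be estimated from equipartition. Treating the tether locally as a harmonic spring of stiffness k (an approximation, since the dsDNA spring is strongly nonlinear), the root-mean-square thermal displacement is √(kT/k). The sketch assumes kT = 4.11 pN·nm. With k = 0.001 pN/nm this gives ~64 nm, and smaller stiffnesses give larger excursions. For comparison, a 0.05 pN/nm optical trap gives ~9 nm.

```python
import math

KT = 4.11  # thermal energy in pN*nm near room temperature (assumed)

def rms_displacement_nm(k_pn_per_nm, kT=KT):
    """Equipartition for a harmonic spring: (1/2) k <x^2> = (1/2) kT."""
    return math.sqrt(kT / k_pn_per_nm)

for k in (0.001, 0.05):   # slack dsDNA upper bound vs. typical optical trap
    print(f"k = {k} pN/nm: rms ~ {rms_displacement_nm(k):.0f} nm")
```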

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.
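      The argument can be made concrete by comparing survival functions for recovery times, S(t) = Σ aᵢ·exp(−t/τᵢ). A single detached state with fast reattachment predicts one exponential, so log S(t) falls linearly. A distribution that needs three exponentials has a long tail that a single fast component cannot reproduce. The amplitudes and time constants below are hypothetical placeholders, not the fitted values from Fig. 4D-F.

```python
import math

def survival(t, amplitudes, taus):
    """P(recovery time > t) for a mixture of exponentials (amplitudes sum to 1)."""
    return sum(a * math.exp(-t / tau) for a, tau in zip(amplitudes, taus))

# Hypothetical parameters in seconds, for illustration only
TRI = ((0.6, 0.3, 0.1), (0.05, 0.5, 5.0))
SINGLE = ((1.0,), (0.05,))   # one detached state with only the fast time constant

for t in (0.0, 0.1, 1.0, 5.0):
    print(f"t = {t:>4} s: three-exponential {survival(t, *TRI):.4f}, "
          f"single-exponential {survival(t, *SINGLE):.2e}")
```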

      Recommendations for the authors: 

      Reviewing Editor Comments:

      The reviewers are in agreement with the motivation and approach of this study. The use of DNA tethers is an important advance in tethering motor proteins to gain insight into how motors respond to load. However, all 3 reviewers express reservations on how well the results support the claims. In particular, the use of the term catch bond was problematic, with Reviewer #2 suggesting some alternative nomenclature. Reviewer #1 expressed concern with experimental evidence for the predicted force-extension curve shown in Figure 1. I agree with the reviewers that additional experimental evidence would be required to conclude the catch-bond detachment kinetics of kinesin.


      Reviewer #2 (Recommendations for the authors):

      (1) By eye, the run lengths, e.g., of kin-1 look very long in Figure S1 ... certainly above the expected 1 µm. Please check and comment.

      We agree that the long runs do stick out by eye in this figure. To address this point, we analyzed the run lengths and run times from the kymograph shown in Fig. S1. Fitting the run duration distribution gave t = 1.31 s with a 95% CI of 0.96 to 1.67 s. This is slightly longer than the 1.04 s duration in Fig. 2A, but the 95% CI includes this population mean, and so the S1 data are not statistically significantly different. The run time distribution from the S1 kymograph is given in Author response image 1.

      Author response image 1.

      (2) The upper right kymograph in Figure 4A does not show a motor return to the baseline. Also, the scale bars, etc., are unreadable. Please modify.

      Our purpose for showing the kymographs in Fig. 4A was to show the specific features of slips and fast and slow reattachment. Because we blew up the kymographs to show those specific features, we could not show the entire return to baseline. As suggested, we enlarged the scale bars and kymograph labels to make them readable.

      Reviewer #3 (Recommendations for the authors):

      (1) The frequent references to 95% confidence intervals disrupt the flow of the text. Perhaps the confidence intervals could be listed in a table rather than in the body of the text.

      We deleted those from the text; they are shown in Fig. 2D and listed in Table S2.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Korten, T., and S. Diez. 2008. Setting up roadblocks for kinesin-1: mechanism for the selective speed control of cargo-carrying microtubules. Lab Chip. 8:1441-1447.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nat Commun. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds number. Am J Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schäffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nat Commun. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schäffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17: e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy (first reducing PAM bias with R1015H, then restoring binding through non-specific interactions such as N968R and E782K) is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) is a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

      We sincerely thank the reviewer for the comprehensive summary and the highly positive and encouraging comments on our manuscript.

      Weaknesses:

      (1) Title and Abstract Emphasis: The title and abstract are effective but could be slightly sharpened to emphasize the primary message. Consider a title like "Fully computational design of a PAM-relaxed SaCas9 variant with UniDesign demonstrates power to match directed evolution." The abstract could more explicitly state upfront that the design was achieved without any experimental iteration.

      We thank the reviewer for these valuable suggestions. We agree that the current title and abstract may be overly neutral in tone, and we will consider refining them during the formal revision.

      (2) Figure 1, Panel M: The data points in panel M are currently presented at a font size that makes them difficult to read, particularly the labels for the many triple-mutant variants. This density obscures the clear identification of the top-performing designs, such as the KRH variant selected for experimental validation. I recommend that the authors increase the font size of all text elements within this panel, including axis labels, tick marks, and data point labels, to improve legibility. If necessary, the panel dimensions can be adjusted or the layout reorganized to accommodate the larger text without compromising clarity. Ensuring this figure is readable is important, as it visually communicates the energetic convergence that led to the selection of KRH.

      We thank the reviewer for these valuable suggestions. We will refine Fig. 1M during the formal revision.

      (3) Generality of the Design Strategy for Other PAM Positions:

      The design strategy focused on relaxing specificity at the highly constrained third position of the PAM (the guanine in NNGRRT). How transferable is this specific strategy (i.e., disrupting a key specific contact and compensating with non-specific backbone binders) to relaxing other positions in the PAM or to other Cas enzymes with different PAM-interaction architectures? A short discussion on this point would help readers understand the broader applicability of the "fine-tuning the balance" principle.

      We thank the reviewer for this insightful question and suggestion. The current study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which eight Cas9 proteins and two Cas12 proteins (each with a different PAM) were investigated. Our computational results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs). For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform similar PAM relaxation designs for other Cas9 or Cas12 proteins, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We will include additional discussion to clarify this point and highlight the broader applicability of our design strategy.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) reducing positional bias at PAM position 3;

      (2) restoring DNA binding through nonspecific interactions;

      (3) combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

      Strengths:

      The design pipeline is entirely computational and does not rely on experimental data for pretraining or iterative optimization.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, which may reflect insufficient exploration of the available sequence space.

      We thank the reviewer for this insightful critique. In the present study, our strategy was not to allow UniDesign to freely explore all 27 mutable positions simultaneously, but rather to constrain the search to combinations of a few point mutations (e.g., double or triple mutants) within the full sequence space (approximately 20<sup>27</sup>). Even with this constraint, UniDesign samples a substantially larger design space than traditional protein engineering approaches.

      Through iterative design, we observed that only certain residue types became enriched at a subset of positions when identifying effective double mutants. These enriched residues were then systematically combined to generate performance-enhancing triple mutants in an automated manner. Although we ultimately selected the KRH mutant for experimental validation due to its high similarity to the known KKH variant, UniDesign also proposed additional multi-mutants that are distinct from KKH.

      Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The benchmarking of the UniDesign method is insufficient. It remains unclear how its performance compares to other protein design algorithms, whether the energy function parameters were systematically optimized, and whether the design strategy can be generalized to other Cas9 orthologs or genome engineering tasks.

      We thank the reviewer for this valuable critique. The present study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which many of these concerns were systematically addressed. In that study, UniDesign was benchmarked against Rosetta, a well-established protein design platform, across eight Cas9 proteins and two Cas12 proteins, each recognizing distinct PAM sequences.

      Our results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs) across these CRISPR–Cas systems. For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform analogous PAM relaxation designs for other Cas9 or Cas12 proteins in this work, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We will incorporate additional discussion in the revised manuscript to address these points and clarify the broader applicability of our approach.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      While the results show some loss of eyelid meibomian glands, there is significant gland retention in Hsd3b6 KO mice, as shown in Figure 2. This is supported by the absence of differentially expressed genes indicating downregulation of meibum lipid genes (AWAT2, Far2, Soat1, Plin2, SCD, etc.), and by the absence of any decrease in Pparg expression, which is known to be critical for meibomian gland lipid gene expression.

      Weaknesses:

      It should be noted that while the authors state that CD38 is significantly upregulated in the Hsd3b6 KO mouse, the increase did not reach significance after adjustment for multiple testing (adjusted P-value). Bulk RNA sequencing also shows no significant change in meibum lipid gene expression in aged mice treated with 78c, a CD38 inhibitor that, according to the authors, increases NAD levels and thereby increases meibomian gland size compared with vehicle-treated mice. Unfortunately, 78c did not increase meibum lipid gene expression as assessed by adjusted P-value. However, the supplemental file covering DEG expression was labeled as a microarray analysis, and it did not include the 78c+NMN-treated mice, which the authors contend show a more pronounced effect on the meibomian gland.

      We thank the reviewer for the careful evaluation and insightful comments regarding the interpretation of meibomian gland phenotypes and gene expression profiles.

      Regarding the point on the apparent retention of meibomian gland structure and the lack of downregulation of key lipid-related genes (e.g., Awat2, Far2, Soat1, Plin2, Scd, and Pparg), we agree that these observations are important for interpreting the extent of gland dysfunction. In the revised manuscript, we will more clearly present and discuss the RNA-seq data, including the expression profiles of representative meibomian gland lipid genes (and other DEGs), to better contextualize these findings.

      With respect to Cd38 expression, we acknowledge that the statistical significance based on adjusted P-values was limited in the current microarray dataset. To address this point, we will perform additional validation using targeted quantitative PCR with specific primers to more accurately assess Cd38 expression changes.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors demonstrate strong correlations between a pro-inflammatory state, the activity of an intracrine steroidogenic enzyme (3β-hydroxysteroid dehydrogenase, 3β-HSD), and the cofactor NAD. Specifically, in a 3β-HSD knockout mouse, pro-inflammatory cytokines were upregulated and CD38+ cells were increased (CD38 is an enzyme that depletes NAD, a necessary cofactor for 3β-HSD activity). Conversely, induction of inflammation in the eyelids reduced 3β-HSD activity. Supplementation with 5α-dihydrotestosterone (DHT) or the NAD precursor NMN, and inhibition of CD38 activity (78c), corrected the pathologies observed in both the 3β-HSD knockout mouse and the pro-inflammatory model (LPS injection into eyelids).

      Strengths:

      The experiments were performed with good rigor, assessing the impact of inflammation and 3β-HSD activity using multiple model systems. The endpoints represented a combination of transcriptional changes, protein quantification, enzymatic activity, and immunofluorescent microscopy. The authors use human tissue from both younger and older individuals to support their hypotheses that CD38+ cells are increased and 3β-HSD levels are reduced in older individuals. The data provide the foundation for assessing more global changes to the tear film and ocular surface.

      Weaknesses:

      The main weaknesses of the study include the following:

      (1) An absence of information on meibomian gland health, tear film, and ocular surface.

      (2) Too few human subjects to validate the hypotheses.

      Conclusion:

      Overall, this study demonstrates an important relationship that exists between intracrine signaling, inflammation, and cofactor signaling. It represents a novel approach in therapeutic design for patients with meibomian gland dysfunction.

      We thank the reviewer for the positive evaluation of our study and for recognizing the rigor of the experiments, the use of multiple model systems, and the potential of the data to provide a foundation for further investigation.

      Regarding the points raised under weaknesses, we agree that evaluation of meibomian gland function, tear film, and ocular surface phenotypes would provide important additional insight. In the present study, we focused primarily on the structural phenotype of the meibomian gland, particularly gland size, as a primary feature of MGD. We acknowledge that pathological assessments of gland function and ocular surface conditions have not been fully addressed. We will clearly state this limitation and expand the Discussion to position these aspects as important directions for future investigation.

      With respect to the limited number of human samples, we acknowledge that this is an important consideration for validating the translational relevance of our findings. We will revise the manuscript to more explicitly address this limitation and interpret the human data with appropriate caution.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate whether disruption of intracrine steroid hormone metabolism contributes to meibomian gland dysfunction and proposed a "vicious cycle" of gland dysfunction and inflammation, using a global Hsd3b6 knockout mouse model. The work addresses an important aspect of MGD, but its impact may be limited unless the intracrine mechanism can be more clearly distinguished from systemic hormonal effects.

      Strengths:

      This study addressed an important question. The hormonal regulation of the meibomian gland has long been recognized. If clarified, the concept of local steroid metabolism influencing gland homeostasis could have implications for understanding disease mechanisms and identifying therapeutic targets.

      Weaknesses:

      The use of a global knockout makes it difficult to separate local intracrine effects from systemic hormonal changes, and key controls and hormone measurements are lacking.

      LPS-induced inflammation may not reflect the chronic nature of MGD.

      We thank the reviewer for the thoughtful evaluation and for highlighting the importance of distinguishing intracrine mechanisms from systemic hormonal effects.

      We agree that, as currently presented, the use of a global Hsd3b6 knockout model makes it difficult to fully separate local intracrine effects from systemic hormonal changes. This point is also consistent with the major concern raised in the editorial assessment regarding the need to more clearly establish the proposed intracrine mechanism. To address this issue, we will strengthen the evidence for intracrine regulation by incorporating additional analyses. Specifically, we will assess systemic testosterone levels in Hsd3b6 knockout mice and include appropriate controls using orchidectomized (ORX) mice. These analyses will help to better distinguish local intracrine mechanisms from systemic hormonal influences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) As mentioned above, numerous studies have reported that the number of MuSCs declines with aging. The authors' claim is valid, as Pax7 and Vcam1 were widely used for these observations. However, age-related differences have also been reported even when using these markers (Porpiglia et al., Cell Stem Cell 2022; Liu et al., Cell Rep 2013). (a) When comparing geriatric Vcam1⁺ MuSCs with young MuSCs in this study, did the authors observe any of the previously reported differences? (b) Furthermore, would increasing the sample size in Figure 1 reveal a statistically significant difference? The lack of significance appears to result from variation within the young group. (c) In addition, this reviewer requests the presentation of data on MuSC frequency in geriatric control mice using CD200 and CD63 in the final figure.

      (a) When comparing geriatric Vcam1<sup>+</sup> MuSCs with middle-aged MuSCs, we found 1,428 DEGs, of which 701 were downregulated and 727 were upregulated (Fig. S3E). Some of the altered pathways, such as autophagy-lysosome-related genes and PI3K-Akt signaling, matched previously reported differences. However, these alterations did not affect the functional integrity of geriatric Vcam1<sup>+</sup> MuSCs (Fig. 3A-F). In contrast, greater alterations were observed in geriatric Vcam1<sup>-</sup> MuSCs, accompanied by functional impairment. We have elaborated on this point in the manuscript in response to the reviewer's comment (pg. 17, lines 369-379).

      (b) Thank you for this helpful comment. We understand the reviewer's concern that variability within the young group may contribute to the absence of statistical significance. We respectfully note that the variance observed in the young cohort is likely biological rather than technical noise. Multiple studies have shown that young adult MuSCs display considerable transcriptional and functional heterogeneity as they undergo postnatal myogenic maturation (e.g., Biressi et al., 2010; Tierney & Sacco, 2016; Motohashi & Asakura, 2014). This broader heterogeneity naturally increases the variance in marker distribution within young samples. We would also like to clarify that our main conclusions are not based solely on differences in the overall proportion of YFP⁺Lin⁻ cells among age groups. Instead, they also rely on the functional and phenotypic heterogeneity that specifically emerges in geriatric MuSCs.

      Although the young group shows greater biological variation, the mean values are similar among the groups. Multiple independent datasets in our study, including functional performance and molecular profiles, consistently show that total MuSC frequency does not markedly decline with aging. For these reasons, we do not expect a larger sample size to change the overall interpretation of this result. We have revised the Results section to acknowledge the variability observed in the young group and to emphasize that total MuSC frequency is not central to the conclusions of this study (pg. 6, lines 129-134).

      (c) Data on MuSC frequency in geriatric control mice, determined using CD200 and CD63, are provided in the legend of Fig. 5F (pg. 39, lines 825-828).

      (2) Can the authors identify any unique characteristics of Pax7-VCAM-1 GERI-MuSCs using only the data generated in this study, without relying on public databases? For example, reduced expression of Vcam1 and Pax7. The results of such analyses should be presented.

      In Fig. S2C, using the bulk RNA-sequencing data generated in this study, we observe reduced expression of both Pax7 and Vcam1 in the Pax7-VCAM-1 GERI-MuSC population. To better highlight this finding, we have added text to the Results section that explicitly describes reduced Pax7 expression and loss of Vcam1 as distinguishing features of Pax7-VCAM-1 GERI-MuSCs in our dataset (pg. 9, lines 199-200).

      (3) In the senolysis experiment, the authors state that GERI-MuSCs were depleted. However, no data are provided to support this conclusion. Quantitative cell count data would directly address this concern. In addition, the FACS profile corresponding to Figure 4D should be included.

      In Figure 4D, we quantified the frequency of VCAM1-low, YFP-positive, Lin-negative MuSCs after senolysis treatment. This analysis shows a trend toward a decrease in the GERI subpopulation, although the difference did not reach conventional statistical significance in this experiment (t-test, p = 0.0596). We have therefore revised the text to describe this as a reduction trend rather than complete depletion, and we now explicitly report the p-value in the Results section (pg. 12, lines 270-272). Furthermore, representative FACS profiles for Figure 4D are now included alongside the quantification (pg. 38, lines 811-814).

      (4) Figure S4: It remains unclear whether DHT enhances regenerative ability by restoring VCAM1 expression in GERI-MuSCs, as DHT also acts on non-MuSC populations. Analyses of the regenerative ability of senolysis+DHT mice may help to clarify this issue.

      We thank the reviewer for this important insight. We agree that DHT can act on non-stem cell populations in the muscle environment, and we therefore cannot conclusively attribute the improved regenerative performance solely to restoration of VCAM1 expression in GERI-MuSCs. To address this concern, we have revised the Discussion to explicitly state this limitation and to clarify that DHT may influence multiple cell types that contribute to muscle regeneration. We also note that combined senolysis plus DHT treatment would be an informative future approach, although additional animal experiments were not feasible within the scope of the current study (pg. 18, lines 382-390).

      (5) Why are there so many myonuclear transcripts detected in the single-cell RNA-seq data? Was this dataset actually generated using single-nucleus RNA-seq? This reviewer considers it inappropriate to directly compare scRNA-seq and snRNA-seq results.

      Regarding why many myonuclear transcripts were detected and whether this dataset was generated using single-nucleus RNA sequencing, we confirm that the experiments were performed using single-cell RNA sequencing. The presence of myonuclear transcripts likely reflects partial nuclear leakage or fragmentation during enzymatic dissociation of aged muscle tissue. This is a known technical issue when preparing single-cell suspensions from adult or geriatric skeletal muscle.

      To avoid inappropriate interpretation, we identified the myonuclear transcript-enriched cluster and excluded it from all downstream analyses involving MuSC comparisons. Therefore, our major conclusions do not rely on this cluster. We have revised the Results text to clearly state that the dataset was generated using single-cell RNA sequencing and to explain how cells positive for myonuclear transcripts were handled (pg. 8, lines 176-181).

      Reviewer #2 (Public review):

      In this study, Kim et al. explore the heterogeneity within the aged MuSC population using a mouse model that enables lineage tracing of MuSCs throughout life. The questions addressed in the manuscript are highly relevant to the fields of aging and stem cell biology, and the experimental approach overcomes limitations of earlier studies. However, some of the claims would benefit from additional data analysis, and the central claim of the identification of a "previously unrecognized subpopulation" of aged MuSCs should be evaluated in light of prior work that has also examined MuSC heterogeneity in aging.

      Specific points:

      (1) As a general comment that applies across multiple figures, several experiments should include a direct comparison to a young cohort. Previous studies have shown that the depletion of subpopulations with aging is observed early in the aging process; for example, the loss of Pax7-high MuSCs is already observed in 18‐month‐old mice (Li, 2019, doi: 10.15252/embj.2019102154). Using only mice at 12-14 months as the control group is therefore insufficient to claim that no changes occur with aging.

      We thank the reviewer for the suggestion to compare the aged mice to a young cohort, and we acknowledge that previous studies have observed depletion of subpopulations early in the aging process. However, this study is specifically designed to delineate the transition from the middle-aged to the geriatric stage, rather than to characterize differences that are already well established in young versus geriatric comparisons. Previous studies have extensively documented the decline in MuSC function between young and aged animals, whereas the process and timing by which these changes emerge remain unclear. Our results show that major alterations in MuSC phenotype and identity are detected predominantly in the geriatric stage rather than in the middle-aged stage. To avoid any misunderstanding, we have revised the text to clearly state that the primary objective of this work is to define the critical shift that occurs from middle-aged to geriatric muscle stem cells (pg. 3-4, lines 67-71).

      (2) One of the central claims of the manuscript is a challenge to the notion that MuSCs number declines with age. However, the data analysis associated with the quantification of YFP+ cells needs to be expanded to support this conclusion. The authors present YFP+ cells only as a proportion of Lin-neg cells. Since FAP numbers are known to decrease with aging, a stable proportion of YFP+ cells would simply indicate that MuSCs decline at the same rate as FAPs. To more accurately assess changes in MuSC abundance, the authors should report absolute numbers of YFP+ cells normalized to tissue mass (cells/ mg of muscle).

      We thank the reviewer for this helpful suggestion. We agree that a proportion-based analysis alone does not fully exclude the possibility that MuSCs and FAPs decrease at similar rates during aging. Muscle mass was not recorded at the time of isolation, so we are unable to report YFP<sup>+</sup> cell numbers normalized to tissue weight as requested. To partially address this limitation, we have clarified our gating strategy in the Methods and in Figure 1 to explicitly indicate Sca1<sup>+</sup> FAP exclusion (pg. 6, lines 121-122; pg. 22, lines 460-463). These analyses do not support a major selective loss of MuSCs relative to other mesenchymal populations with aging.

      (3) The authors emphasize that several studies use VCAM1 as a surface marker to identify MuSCs. However, many other groups rely on α7-integrin, and according to Figure 1D, the decline in ITGA7 expression within the YFP+ population is not significant. Therefore, the suggestion that MuSC numbers have been misquantified with aging would apply only to a subset of studies. If the authors can demonstrate that YFP+ cell numbers (normalized per milligram of tissue) remain unchanged in geriatric mice, the discussion should directly address the discrepancies with studies that quantify MuSCs using the Lin−/α7-integrin+ strategy.

      We thank the reviewer for this important comment. We agree that VCAM1 is only one of several commonly used surface markers for MuSC identification and that many studies quantify MuSCs using the Lin-negative, ITGA7-positive strategy. For this reason, in addition to VCAM1, we also examined ITGA7 expression within the YFP-positive population. Although the mean ITGA7 level did not significantly decline, the variance among geriatric MuSCs was significantly increased based on the F-test. This supports the idea that aging does not uniformly reduce marker expression but instead increases phenotypic instability, which could lead to under-detection of a subset of MuSCs even when ITGA7 is used as the primary marker. We have added this interpretation to the Discussion (pg. 16, lines 346-355).

      (4) The authors focus their attention on a population of VCAM-low/VCAM-neg subpopulation of MuSCs that is enriched in aging. However, the functional properties of this same population in middle-aged (or young) mice are not addressed. Thus, it remains unclear whether geriatric VCAM-low/VCAM-neg MuSCs lose regenerative potential or whether this subpopulation inherently possesses low regenerative capacity and simply expands during aging.

      We thank the reviewer for this comment. In young and middle aged mice, the VCAM low or VCAM negative population is extremely small, nearly absent in most samples. The emergence and expansion of this population is therefore a feature that becomes detectable only at the geriatric stage. Given that these cells are not present in appreciable numbers earlier in life, the reduced regenerative performance observed in geriatric VCAM1<sup>low</sup> MuSCs likely reflects a phenotype that arises during aging rather than an inherent property of a pre-existing subpopulation. We have added this clarification to the Results section (pg. 7, lines 142-146).

      (5) According to Figure 1F, the majority of MuSCs appear to fall within the category of VCAM-low or VCAM-neg (over 80% by visual estimate). It would be important to have an exact quantification of these data. As a result, the assays testing the proliferative and regenerative capacity of VCAM-low/negative cells are effectively assessing the performance of more than 80% of geriatric MuSCs, which unsurprisingly show reduced efficiency. Perhaps more interesting is the fact that a population of VCAM-high geriatric MuSCs retains full regenerative potential. However, the existence of MuSCs that preserve regenerative potential into old age has been reported in other studies (Garcia-Prat, 2020, doi: 10.1038/s41556-020-00593-7; Li, 2019, doi: 10.15252/embj.2019102154). At this point, the central question is whether the authors are describing the same aging-resistant subpopulations of MuSCs using a new marker (VCAM) or whether this study truly identifies a new subpopulation of MuSCs. The authors should directly compare the YFP+VCAM+ aged cells with other subpopulations that maintain regenerative potential in aging.

      We thank the reviewer for this comment. First, in response to the request for precise quantification, we now provide the proportions of VCAM1-high and VCAM1-low/negative MuSCs in each age group in the figure legend for Fig. 1F (pg. 34-35, lines 765-772). In geriatric mice, VCAM1-low/negative MuSCs represent approximately 44.6% ± 35.7%, whereas VCAM1-high MuSCs represent 3.9% ± 1.8%. The substantial variability reflects mouse-to-mouse heterogeneity at very advanced ages.

      Importantly, our conclusions do not rely solely on the observation that a large fraction of geriatric MuSCs exhibit reduced regenerative potential. Rather, the VCAM-low state represents a transcriptionally and functionally distinct subpopulation that emerges specifically in the geriatric stage and exhibits molecular signatures not present in young or middle-aged MuSCs. We have expanded the Results and Discussion to clarify this point.

      Regarding whether VCAM-high geriatric MuSCs correspond to previously reported “aging-resistant” MuSCs (e.g., Garcia-Prat 2020; Li 2019), we agree that there may be conceptual overlap, as both populations retain regenerative activity. However, those studies identified resilient MuSCs based on mitochondrial or Pax7-high properties, whereas our classification is based on surface VCAM1 intensity, and we currently lack direct evidence that these populations are equivalent. We have therefore added a statement acknowledging this possibility while clarifying that our work does not claim that VCAM1-high MuSCs represent a newly discovered resilient subset, but instead focuses on the emergence and characterization of the VCAM-low dysfunctional subpopulation (pg. 16, lines 346-355).

      (6) In Figure 3F, it is unclear from the data presentation and figure legend whether the authors are considering the average of fiber sizes in each mouse as a replicate (with three data points per condition), or applied statistical analysis directly to all individual fiber measurements. The very low p-values with n=3 are surprising. It is important to account for the fact that observations from the same mouse are correlated (shared microenvironment, mouse-specific effects) and therefore cannot be considered independent.

      We thank the reviewer for raising this important statistical point. We fully agree that individual myofibers from the same mouse are not independent biological replicates. In morphometric analyses of regenerated muscle, however, it is standard practice to analyze the full CSA distribution across all regenerated fibers, as the distribution itself (rather than a per-mouse mean) provides the biologically relevant measure of regeneration quality.

      The original analysis therefore treated each regenerated fiber as a component of the overall CSA distribution, not as an independent biological replicate, and the statistical comparison was performed at the level of distributions rather than per-mouse replication. We agree that per-mouse averaged CSA values would also be informative, but the raw data were not archived in a format that allows reconstruction of mouse-specific fiber subsets.

      Importantly, the group-level CSA distribution differences are robust and remain clearly detectable regardless of statistical approach. We have added a clarification to the figure legend that explicitly describes how CSA measurements were obtained and analyzed for each mouse (pg. 36, lines 796-800).

      (7) Regarding Figure 5, it is unclear why ITGA7, a classical surface marker for MuSCs that appears unchanged in aged YFP+ MuSCs (Fig. 1F), is considered inadequate for detecting and isolating GERI-MuSCs.

      We thank the reviewer for raising this point. As shown in Figure 1F, the mean ITGA7 expression level does not significantly decline in geriatric YFP-positive MuSCs. However, the variance of ITGA7 expression is significantly increased in geriatric MuSCs (F-test), indicating unstable surface marker expression. This suggests that, during aging, a fraction of MuSCs may fall below the conventional ITGA7 gating threshold. Therefore, ITGA7 remains effective for identifying a large portion of MuSCs but may under-detect the subset of geriatric MuSCs with reduced marker expression. We have revised the Discussion to clarify this point (pg. 16, lines 346-355).
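      A minimal numerical sketch of this argument, assuming ITGA7 intensities are approximately normally distributed on a log scale: with the mean held constant, a wider distribution places more cells below a fixed gate. The mean, standard deviations, and gate value below are hypothetical illustrations, not values taken from Figure 1F.

```python
from statistics import NormalDist

# Hypothetical values for illustration only; these are NOT measured data.
MEAN_LOG_ITGA7 = 3.0  # identical mean intensity in both age groups (log scale)
GATE = 2.5            # hypothetical ITGA7-positive gating threshold (log scale)


def fraction_below_gate(mean: float, sd: float, gate: float) -> float:
    """Fraction of a normal intensity distribution that falls below the gate."""
    return NormalDist(mu=mean, sigma=sd).cdf(gate)


# Same mean, different spread: only the variance differs between groups.
young = fraction_below_gate(MEAN_LOG_ITGA7, sd=0.2, gate=GATE)
geriatric = fraction_below_gate(MEAN_LOG_ITGA7, sd=0.5, gate=GATE)

print(f"young: {young:.3f}, geriatric: {geriatric:.3f}")  # young: 0.006, geriatric: 0.159
```

      Under these assumed numbers, the mean is unchanged, yet roughly one in six cells in the wider distribution would be missed by the gate, which is the kind of under-detection the response describes.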

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 3B: In the colony formation assay, the authors should specify the number of biological replicates and the number of cells analyzed per mouse.

      We have now added the number of biological replicates and the number of cells analyzed per mouse in the figure legend of Figure 3B (pg. 37, lines 790-791).

      (2) Figure 3F: The replication number is indicated as n = 3, which appears to refer to the number of transplanted mice. How many myofibers were analyzed in each transplanted mouse? The authors should provide a more detailed description of the methodology in the Figure legend or M&M.

      We thank the reviewer for the question and clarify that n = 3 refers to three independent transplanted mice per group. For each mouse, the entire TA muscle was cryosectioned and immunostained, and all regenerated fibers containing centrally located nuclei were included in the CSA quantification. We have added clarification in the Figure legend to indicate that quantification was performed on all regenerated fibers from each mouse (pg. 37, lines 796-800).

      (3) Figure 4: The RNA-seq results are presented as a single dataset per sample. If multiple experiments were performed, individual datasets should be shown. Replicated analyses are essential to ensure the reliability of the findings.

      In response to the reviewer’s comment, we confirm that the RNA sequencing in Figure 4 was performed with 3-4 independent biological replicates per condition. These replicates showed highly consistent sequencing quality and gene expression profiles and were therefore combined for the differential expression analysis. We have revised the Materials and Methods to clearly describe the number of biological replicates and the analysis workflow (pg. 25, line 543).

      (4) Line 148: If the authors examined MyoG expression, it should be described as committed myoblasts.

      We have now changed the term from myoblasts to committed myoblasts (pg. 8, line 168).

      (5) Typo and Referencing Errors:

      (a) Line 244: The term 'Antide' appears to be a typo.

      We thank the reviewer for noting this point. ‘Antide’ is not a typo but the correct name of a GnRH antagonist (Antide acetate). To avoid confusion, we have revised the text to specify ‘Antide, a GnRH antagonist’ at its first mention (pg. 13, line 289).

      (b) Lines 278, 280: Please correct Figure 5H to Figure 5F.

      We apologize for this error. We have fixed the figure notations accordingly (pg. 15, lines 326-330).

      (c) Some references are incomplete or inappropriate (ex. line 49, line 71, line 86, line 109).

      We apologize for this error. We have fixed the references accordingly (pg. 4, line 94; pg. 6, line 117).

      (d) Line 49: Skeletal muscle regeneration is orchestrated primarily by tissue resident stem cells, known as muscle stem cells (MuSCs) or satellite cells (Relaix et al., 2021). The following paper should be cited:

      Satellite cell of skeletal muscle fibers.

      MAURO A. J Biophys Biochem Cytol. 1961 Feb;9(2):493-5.

      The reference has been revised (pg. 3, line 49).

      (e) Line 109: Paired box protein 7 (Pax7) is a transcription factor widely recognized as a defining marker of MuSCs (Sambasivan et al., 2011). The following paper should be cited:

      Pax7 is required for the specification of myogenic satellite cells.

      Seale P, Sabourin LA, Girgis-Gabardo A, Mansouri A, Gruss P, Rudnicki MA. Cell. 2000 Sep 15;102(6):777-86.

      The reference has been revised (pg.6, line 117).

      (6) Lines 73-74: Many rejuvenation studies define 'aged' mice as 12 to 24 months old. This reviewer is not aware of any studies that have examined 12-month-old MuSCs as a model of aging.

      We apologize for this error. We have fixed the numbers to 18 months accordingly (pg. 4, line 94).

      Reviewer #3 (Recommendations for the authors):

      (1) Geriatric versus aged mice in the MuSC subpopulation analysis. The authors use geriatric mice (>28 months) to demonstrate the loss of VCam expression in MuSCs and propose that this accounts for previous reports of decreased MuSC numbers in aged contexts. However, as noted in their introduction, most reports use "aged" mice, which are typically around 24 months old, which is biologically distinct from the geriatric stage. This distinction makes it difficult to conclude that the reported decline in MuSC numbers in aged mice can be explained by the phenomenon observed only in geriatric mice (Line 289). The authors should test whether VCam expression is altered in aged (24-month-old) mice to strengthen this argument.

      We appreciate the reviewer’s thoughtful comment and agree that 24-month-old mice are commonly used as an aged reference in the literature. However, prior studies using 18- to 24-month-old animals have reported inconsistent results regarding whether, and to what extent, MuSCs decline during this period. To avoid ambiguity arising from intermediate aging stages, we purposefully selected geriatric mice older than 28 months, a condition under which MuSC depletion has been more consistently reported in previous studies. Notably, our data show that even at this stage MuSC abundance is not dramatically reduced, which makes it unlikely that a robust decline would already be present at 24 months. We have clarified this rationale in the revised text. Although determining precisely when these changes emerge at earlier time points is an important future direction, it is beyond the scope of the present study.

      (2) Variability and bimodal distributions.

      Figure 1b: The decline in VCAM+ MuSCs in geriatric mice shows high variability - 3 of 7 replicates align more closely with young/mid-aged levels. Please clarify this variability.

      We thank the reviewer for pointing out the variability. We agree that there is heterogeneity in the extent of VCAM1 reduction across geriatric mice. This variability likely reflects animal-to-animal differences in the onset and progression of aging-related phenotypes, which are known to vary at very advanced ages. Importantly, despite this variability, all geriatric samples contain a detectable VCAM1 low population that is not observed in young or middle-aged mice, and the overall trend is consistent across all replicates. We have clarified this in the revised manuscript (pg. 6, lines 125-127).

      Figure 1c: While the Mid and Geriatric groups are tightly clustered, the Young group appears bimodal, which challenges the claim (Line 118) that values are "comparable across ages." Since all males were used and it is not sex related, what is driving this bimodal distribution?

      We appreciate the reviewer’s observation regarding the variability in the young group. Muscle stem cells in young adult mice are known to encompass diverse transcriptional and functional substates, which contribute to greater biological heterogeneity at this stage (Biressi et al. 2010; Tierney & Sacco 2016; Motohashi & Asakura 2014). As aging progresses, these substates gradually converge toward a common functional phenotype, resulting in more uniform profiles in middle-aged and geriatric mice. Therefore, the bimodal appearance in the young group likely reflects the broader developmental heterogeneity of early adult MuSCs rather than a technical discrepancy. We have added this explanation to the Results section of the revised manuscript (pg. 6, lines 129-134).

      Figure 4D: Geriatric replicates also display a trimodal distribution. This should be addressed throughout - what is causing these types of distribution, and how does this impact significance tests and conclusions?

      We appreciate the reviewer’s observation regarding the multimodal distribution. We interpret this pattern as reflecting increased individual variability that becomes more pronounced at the geriatric stage. Although aging affects all mice, the extent and timing of age-related phenotypic changes can vary considerably across individuals at very advanced ages, leading to broader divergence in VCAM1 expression states among geriatric mice. Consistent with this, the VCAM1-high and VCAM1-low/- populations are significantly negatively correlated (Fig. S3F). We have clarified this interpretation in the text and note that the statistical analysis was performed using the mouse as the biological replicate, so this variability does not alter the overall conclusion (pg. 12-13, lines 270-278).

      (3) The fate of the Vcam-low/negative cells should be better assessed. For example, Line 180: Colony formation is low/absent in VCAM-low/- cells. Are these cells still viable? Cell death assays are needed. Is expansion capacity truly impaired, or are the cells simply non-viable? Using gene expression as the only means (Line 300) to suggest not dying is insufficient.

      We thank the reviewer for this important point. We agree that direct evidence of viability is lacking and that apoptosis or viability assays would further strengthen our study. However, several observations suggest that these cells are viable: they can be isolated by FACS and yield high-quality RNA sequencing libraries, which would not be possible if they were undergoing cell death. Moreover, the transcriptomic data indicate upregulation of stress-response and senescence-associated pathways rather than apoptotic or necrotic signatures. These findings suggest that VCAM-low or VCAM-negative cells are alive but exhibit reduced proliferative and regenerative capacity. We have revised the text to clarify that our data reflect impaired function rather than loss of viability, and that apoptosis assays represent a direction for future investigation (pg. 16, lines 360-366).

      (4) Transplant assays are suggestive, but could use additional characterization. Lines 191 & Figure 3E-F: While representative images match quantification, areas at the edge of VCAM-low/- TAs show signs of regeneration. Please include lower-magnification images. Additionally, assess early post-transplant engraftment efficiency - do certain populations experience a higher loss rate (cell death)? YFP-tracing would also help confirm the donor contribution to fibers.

      While we did not collect additional early time-point samples for new engraftment analyses, we carefully re-examined all available transplantation data, including the distribution and density of YFP⁺ donor-derived cells in early post-injury sections. We did not observe patterns suggestive of differential early cell loss between VCAM-high and VCAM-low groups. Thus, although we cannot formally quantify early engraftment efficiency, the existing evidence does not support a model in which differential donor-cell retention accounts for the observed regenerative differences.

      We also attempted direct YFP co-staining of regenerated myofibers, but, as reported by several groups, YFP signal within mature or regenerating myofibers is often diminished or inconsistent after fixation and permeabilization, making reliable fiber-level YFP detection technically challenging in our system. Instead, we confirmed donor contribution using PBS-injected control muscles, which lack donor MuSCs and never generated YFP⁺ fibers. This demonstrates that endogenous MuSCs do not contribute to YFP⁺ myofibers in our model, and therefore indirectly supports our inference that any YFP⁺ regenerated fiber necessarily originates from transplanted donor cells. We hope the reviewer understands these technical limitations.

      (5) Figure S3D: mRNA profiling suggests Mid-aged MuSCs are more distinct from Geriatric Vcam-hi than expected. This should be addressed or at least elaborated on in text.

      We appreciate this insightful comment. We agree that mid-aged VCAM-high MuSCs show detectable transcriptional differences from geriatric VCAM-high cells. This pattern likely reflects the gradual accumulation of some aging-related molecular changes during the middle-aged stage, even before overt functional decline or VCAM1 loss becomes evident. Importantly, however, these transcriptomic shifts do not lead to the emergence of the dysfunctional VCAM-low phenotype that is uniquely present in geriatric muscle. We have added a clarification noting that molecular alterations arise progressively, whereas the major phenotypic transition in VCAM1 expression and regenerative impairment occurs at the geriatric stage (pg. 11, lines 238-244).

      (6) The conclusion of senescence needs more support. Lines 218-226: p16 is elevated in VCAM-low/- cells, but drawing conclusions on senescence from 1-2 markers (mRNA) is insufficient. DQ Treatment: It's unclear how DQ alters cell composition in the absence of clear senescence markers (besides p16). Since DQ targets BCL-2/anti-apoptotic pathways, analyzing these signaling cascades is necessary. Line 255: The term "terminally senescent" is contradictory. These may be pre-senescent. It's also surprising DQ would target such cells, and further clarification is needed. Lines 307-313: Proposing a revised definition of senescence is premature. These cells may be pre-senescent, and multiple ways to senescence exist (replicative, stress-induced, etc.). Please clarify.

      We agree with the reviewer that the term 'terminally senescent' may be premature and potentially contradictory. Although p16 is elevated in this population, we acknowledge that one or two mRNA markers are insufficient to establish bona fide senescence, and that multiple senescence programs exist, including replicative, stress-induced, and mitochondrial-associated pathways. We have revised this to 'senescent-like' throughout the manuscript to better reflect the complexity of this state. Also, although beyond the scope of this study, we now emphasize that future studies incorporating additional senescence markers, functional assays, and lineage tracing will be required to determine the precise senescence status of VCAM-low MuSCs (pg.17-18, lines 381-392).

      Regarding DQ treatment, we agree that DQ is not selective for senescent cells, as it targets BCL-2–related survival pathways. The reduction of VCAM-low cells after DQ treatment therefore indicates increased dependence on survival signaling in this population rather than providing direct evidence of senescence. We have revised the text to clarify this interpretation (pg.12-13, lines 270-278).

      (7) Figure 5C: The Pax7+ cells appear interstitial rather than sublaminar. This raises questions about the specificity of staining. Providing lower-magnification images with these as insets may help.

      We thank the reviewer for this helpful comment. We agree that the high-magnification image in Figure 5C may give the impression that Pax7⁺ cells are interstitial because of the limited field of view. Unfortunately, low-magnification images of this sample are not available, as these images were acquired by confocal microscopy and only regions of interest were recorded. We are therefore unable to provide an additional panel at this time and hope the reviewer understands.

      (8) CD63 and CD200 expression on Pax7-YFP traced cells. Figure 5: YFP-traced geriatric MuSCs co-stained for CD63 and CD200 are essential. Current data only show expression in Young traced cells. It's crucial to confirm whether protein/surface expression persists in geriatric YFP+ (traced) cells. The current Figure 5 F does not appear to include YFP tracing for geriatrics.

      We thank the reviewer for highlighting the importance of confirming CD63 and CD200 expression specifically in Pax7-YFP traced MuSCs from geriatric muscle. The datasets shown in Figure 5F were generated from wild-type C57BL/6 mice using a standard MuSC gating strategy rather than Pax7-YFP animals. All geriatric Pax7-YFP mice available for this study were exhausted during earlier experiments, and additional tissue is not available for new co-staining or FACS analyses. We now state this technical limitation in the manuscript and clarify that the geriatric CD63/CD200 data were obtained from conventionally isolated MuSC populations rather than YFP-traced cells (pg.18-19, lines 407-416).

      Minor points:

      (1) Please show the outliers in addition to the concentric circles. Figures 1B, C, and F are examples, but this should be addressed throughout.

      Outliers have been added where applicable.

      (2) Figure 2C: Was a significance test performed between the 5 dpi and "geri" fractions?

      We thank the reviewer for this important point. We have now performed the requested statistical comparison between the 5 dpi fraction and the geriatric VCAM1-defined subpopulations using the same analysis framework applied in Figure 2 (Kruskal–Wallis test followed by Dunn’s multiple comparisons).

      While 5 dpi MuSCs differed significantly from young MuSCs (adjusted p = 0.0139), the comparisons between 5 dpi and each geriatric subgroup (VCAM-high, -mid, and -low) did not reach statistical significance after correction for multiple testing (adjusted p = 0.17, 0.15, and 0.17, respectively). These results have been added to the revised legend of Figure 2C (pg. 36, lines 777-780).

      Importantly, we now clarify in the text that although 5 dpi muscles display a prominent increase in VCAM1-high cells at the population level, this increase does not statistically exceed the variability observed within geriatric subpopulations under the conservative non-parametric testing framework used.

      (3) Line 155: The phrase "Surprisingly, all clusters mapped to quiescent clusters" is misleading; this is expected given the population type.

      We thank the reviewer for this helpful comment. We have revised the sentence to remove the misleading wording and now describe the observation more accurately (pg. 8, lines 180-181).

      (4) Line 211: The figure notation should be corrected from Figure S4E to Figure S3E.

      We apologize for this error. We have corrected the figure notation from Figure S4E to Figure S3E (pg. 11, line 247).

      (5) Line 216: "All of which" seems overstated. Many populations share similar profiles with minor differences.

      We appreciate the reviewer’s comment. We agree that the phrase “all of which” overstated the degree of divergence among clusters. We have revised the wording to more accurately reflect the data (pg. 11-12, lines 252-253).

      (6) Line 270: The notations for panels D, E, and F need to be updated to match the figure. Panel "H" is not indicated in Figure 5.

      We apologize for this error. We have fixed the figure notations accordingly (pg. 15, lines 326-336).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Xu et al. reported base-resolution mapping of RNA pseudouridylation in five bacterial species, utilizing recently developed BID-seq. They detected pseudouridine (Ψ) in bacterial rRNA, tRNA, and mRNA, and found growth phase-dependent Ψ changes in tRNA and mRNA. They then focused on mRNA and conducted a comparative analysis of Ψ profiles across different bacterial species. Finally, they developed a deep learning model to predict Ψ sites based on RNA sequence and structure.

      This is the first comprehensive Ψ map across multiple bacterial species, and systematically reveals Ψ profiles in rRNA, tRNA, and mRNA under exponential and stationary growth conditions. It provides a valuable resource for future functional studies of Ψ in bacteria.

      We thank Reviewer 1 for the supportive and positive comments, particularly for highlighting the novelty and value of our comprehensive pseudouridine landscapes across multiple bacterial species as a valuable resource for the scientific community.

      Ψ is highly abundant on non-coding RNA such as rRNA and tRNA, while its level on mRNA is very low. The manuscript focuses primarily on mRNA, which raises questions about the data quality and the rigor of the analysis. Many conclusions in the manuscript are speculative, based solely on the sequencing data but not supported by additional experiments.

      We appreciate the insightful comments of Reviewer 1. We fully agree that Ψ is highly abundant on rRNA and tRNA, while its fractions on mRNA are generally lower. Ψ is highly conserved at specific positions in rRNA and tRNA, such as Ψ at position 55 in the tRNA T-arm, where it plays essential roles in tRNA folding, tRNA stability, and mRNA translation across plants, mammals, and bacteria[1–3]. However, most Ψ sites in mRNA exhibit lower modification fractions than those in rRNA and tRNA. This is also widely observed in HeLa cell mRNA and plant mRNA, as shown by bisulfite-induced deletion sequencing and 2-bromoacrylamide-assisted cyclization sequencing[3–5]. In bacteria, mRNA modifications are harder to map and quantify, because mRNA makes up a small proportion of total RNA and bacterial rRNA is difficult to deplete. This highlights the significance of our study.

      To demonstrate data quality and analytical rigor, we first present well-established bacterial Ψ sites as benchmarks. Specifically, we detected 9 of 10 known conserved pseudouridine (Ψ) sites in E. coli across two biological replicates [6], with substantial modification fractions. The Ψ516 site in E. coli 16S rRNA, which serves as a benchmark, consistently exhibited a high modification fraction (~100%) under multiple growth conditions, underscoring the robustness of our method. We also observed conserved 16S rRNA Ψ sites in the other strains.

      To further demonstrate reproducibility and sensitivity, we selected three positive Ψ sites from two independent biological replicates for experimental validation, alongside one negative control site, using the pseU‑TRACE method[6]. Ct values were first normalized to the corresponding Ct value of the negative control site, and the treated samples were then further normalized to their corresponding input controls (new Supplementary Fig. 2e).

      Four sites were tested with pseU‑TRACE: a Ψ site at position 944 on 23S rRNA, a negative control site within the guaA gene, a Ψ site within the clpV1 gene, and an intergenic Ψ site between the guaA and guaB genes. We successfully validated these Ψ sites in P. aeruginosa. The detailed pseU‑TRACE experimental procedures and corresponding data figures have been added to the Results and Methods sections of the revised manuscript (Lines 171-175, 594–617).
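      Assuming the two-step normalization follows the conventional ΔΔCt scheme, it can be written as below for a test site s, where "neg" denotes the negative control site in guaA. The notation is ours, and whether the result is further reported as a fold change such as 2^(-ΔΔCt) is not stated in the response.

```latex
\begin{aligned}
\Delta \mathrm{Ct}^{(s)}_{\mathrm{treated}} &= \mathrm{Ct}^{(s)}_{\mathrm{treated}} - \mathrm{Ct}^{(\mathrm{neg})}_{\mathrm{treated}},\\
\Delta \mathrm{Ct}^{(s)}_{\mathrm{input}} &= \mathrm{Ct}^{(s)}_{\mathrm{input}} - \mathrm{Ct}^{(\mathrm{neg})}_{\mathrm{input}},\\
\Delta\Delta \mathrm{Ct}^{(s)} &= \Delta \mathrm{Ct}^{(s)}_{\mathrm{treated}} - \Delta \mathrm{Ct}^{(s)}_{\mathrm{input}}.
\end{aligned}
```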

      Previous transcriptome-wide Ψ mapping has primarily relied on CMC-based methods that induce RT truncation signatures at modified sites, and the sensitivity of these methods is limited by low labeling efficiency[5]. In contrast, the BID-seq method used in this study provides substantially higher sensitivity for Ψ detection, particularly for low-stoichiometry Ψ sites within mRNA. The reliability and quantitative performance of BID-seq have been extensively validated in prior work using mammalian cells and synthetic Ψ-containing oligonucleotides[4].

      To further ensure robustness and minimize false positives when identifying low-level mRNA Ψ sites, we applied stringent and uniform filtering criteria to all candidate mRNA sites (new Supplementary Table 1):

      (1) Total sequencing coverage >20 reads in both ‘Treated’ (BID-seq; Σdₜ > 20) and ‘Input’ libraries (Σdᵢ > 20);

      (2) An average deletion count >5 in ‘Treated’ libraries;

      (3) An average modification fraction >0.02 (2%) in ‘Treated’ libraries;

      (4) A deletion ratio in ‘Treated’ libraries at least two-fold higher than that in ‘Input’ libraries.

      Sites with a Ψ stoichiometry >0.5 (50%) were classified as highly modified. These filtering criteria are now explicitly described in the Methods section (Lines 739–745). We strictly applied these Ψ site identification standards in all subsequent analyses and functional studies.
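      The four criteria and the high-modification cutoff above can be sketched as a simple per-site filter. This is a minimal illustration, not the authors' pipeline: the field names are hypothetical, per-site values are assumed to be precomputed (averaged over replicates where the criteria say "average"), and "stoichiometry" is assumed to be the same quantity as the modification fraction.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Precomputed per-site summary; all field names are hypothetical."""
    treated_coverage: int          # Σd_t: total reads at the site, 'Treated' (BID-seq) libraries
    input_coverage: int            # Σd_i: total reads at the site, 'Input' libraries
    mean_treated_deletions: float  # average deletion count in 'Treated' libraries
    mean_fraction: float           # average modification fraction in 'Treated' libraries
    treated_deletion_ratio: float  # deletion ratio in 'Treated' libraries
    input_deletion_ratio: float    # deletion ratio in 'Input' libraries


def passes_filters(site: CandidateSite) -> bool:
    """Apply criteria (1)-(4) as stated in the response."""
    return (
        site.treated_coverage > 20                                     # (1) Treated coverage
        and site.input_coverage > 20                                   # (1) Input coverage
        and site.mean_treated_deletions > 5                            # (2) deletion count
        and site.mean_fraction > 0.02                                  # (3) fraction > 2%
        and site.treated_deletion_ratio >= 2 * site.input_deletion_ratio  # (4) >= two-fold
    )


def is_highly_modified(site: CandidateSite) -> bool:
    """A retained site with stoichiometry (assumed = fraction) above 50%."""
    return passes_filters(site) and site.mean_fraction > 0.5


# Hypothetical example sites (not real data).
sites = [
    CandidateSite(150, 140, 12.5, 0.08, 0.08, 0.01),  # retained, low stoichiometry
    CandidateSite(18, 200, 9.0, 0.10, 0.10, 0.01),    # fails Treated coverage
]
kept = [s for s in sites if passes_filters(s)]
```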

      Finally, to address concerns regarding reproducibility, we calculated the overlap of mRNA Ψ sites and the correlation of Ψ fractions between the two biological replicates (new Supplementary Fig. 2a,d).

      Overall, we have revised the manuscript to clarify these methodological strengths and to validate mRNA Ψ detection. We have also toned down speculative conclusions, linking them more clearly to the sequencing data and noting that they await future functional validation.

      Reviewer #2 (Public review):

      Summary:

      In this study, Xu et al. present a transcriptome-wide, single-base resolution map of RNA pseudouridine modifications across evolutionarily diverse bacterial species using an adapted form of BID-Seq. By optimizing the method for bacterial RNA, the authors successfully mapped modifications in rRNA, tRNA, and, importantly, mRNA across both exponential and stationary growth phases. They uncover evolutionarily conserved Ψ motifs, dynamic Ψ regulation tied to bacterial growth state, and propose functional links between pseudouridylation and bacterial transcript stability, translation, and RNA-protein interactions. To extend these findings, they develop a deep learning model that predicts pseudouridine sites from local sequence and structural features.

      Strengths:

      The authors provide a valuable resource: a comprehensive Ψ atlas for bacterial systems, spanning hundreds of mRNAs and multiple species. The work addresses a gap in the field - our limited understanding of bacterial epitranscriptomics, by establishing both the method and datasets for exploring post-transcriptional modifications.

      We thank Reviewer 2 for the supportive and positive comments. We appreciate the reviewer’s recognition of the novelty and value of our work in providing a comprehensive pseudouridine atlas across multiple bacterial species.

      Weaknesses:

      The main limitation of the study is that most functional claims (i.e., translation efficiency, mRNA stability, and RNA-binding protein interactions) are based on correlative evidence. While suggestive, these inferences would be significantly strengthened by targeted perturbation of specific Ψ synthases or direct biochemical validation of proposed RNA-protein interactions (e.g., with Hfq).

      We thank Reviewer 2 for the constructive feedback. We fully agree that our functional claims regarding translation efficiency, mRNA stability, and RNA-binding protein interactions rely primarily on correlative evidence from existing datasets rather than on direct experimental validation. We also agree that perturbation of specific pseudouridine synthases and direct biochemical validation of the proposed RNA-protein interactions (for instance, with Hfq) would substantially strengthen the conclusions on bacterial Ψ function. We have added a discussion of this limitation to the Discussion section (Line 517–523). Such validation experiments are beyond the scope of the current work, and we anticipate pursuing them in future research.

      Additionally, the GNN prediction model is a notable advance, but methodological details are insufficient to reproduce or assess its robustness.

      In response to methodological concerns regarding our pseU_GNN prediction model, we have undertaken substantial improvements to address these issues comprehensively. We have updated the complete codebase on GitHub (https://github.com/Dylan-LT/pseU_NN.git) with comprehensive documentation and a user-friendly prediction tool specifically designed for Ψ site prediction across the four bacterial species examined in this study.

      We further systematically evaluated multiple neural network architectures and implemented critical architectural refinements. Specifically, we incorporated bidirectional LSTM (bid-LSTM) layers upstream of the transformer block to more effectively capture sequential dependencies and contextual information in RNA sequences. This enhanced architecture demonstrates substantially improved predictive performance, achieving an AUC-ROC of 0.89 on independent test datasets using 41-nucleotide input sequences (new Figure 6).
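      The response specifies 41-nucleotide input sequences but not how they are constructed. One common preprocessing step, assumed here purely for illustration, is to take 20 nt on each side of a candidate uridine and one-hot encode the window before it reaches the bid-LSTM and transformer layers. The function names, the N padding at transcript ends, and the A/C/G/U channel order are all hypothetical; the actual preprocessing in the pseU_NN repository may differ.

```python
WINDOW = 41
FLANK = WINDOW // 2  # 20 nt on each side of the candidate uridine
ALPHABET = "ACGU"    # hypothetical channel order


def extract_window(seq: str, pos: int, pad: str = "N") -> str:
    """Return the 41-nt window centered on seq[pos], padding beyond sequence ends."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    if seq[pos] != "U":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not U")
    left = seq[max(0, pos - FLANK):pos]
    right = seq[pos + 1:pos + 1 + FLANK]
    return pad * (FLANK - len(left)) + left + "U" + right + pad * (FLANK - len(right))


def one_hot(window: str) -> list[list[int]]:
    """Encode each base as a 4-vector over A, C, G, U; padding (N) becomes all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in window]


# Usage: a short hypothetical transcript with the candidate U at index 3.
w = extract_window("GGAUCC", 3)
matrix = one_hot(w)  # 41 rows x 4 columns, ready to batch for a sequence model
```

      Zero vectors for padding keep short transcripts and positions near transcript ends from contributing spurious nucleotide signal, which is one reason this encoding is often chosen for fixed-width sequence models.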

      We have revised Figure 6 and Supplementary Fig. 7, along with the corresponding text and figure legends (Lines 428-430, 434–436, 440-447, 1065-1073), to reflect these architectural improvements and performance enhancements. We have also expanded the Methods section (Lines 679–708) to detail the model architecture, validation methods, and evaluation score calculation, to ensure reproducibility and transparency.

      Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate pseudouridylation across various RNA species in multiple bacterial strains using an optimized BID-seq approach. It examined both conserved and divergent modification patterns, the potential functional roles of pseudouridylation, and its dynamic regulation across different growth conditions.

      Strengths:

      The authors optimized the BID-seq method and applied this important technique to bacterial systems, identifying multiple pseudouridylation sites across different species. They investigated the distribution of these modifications, associated sequence motifs, their dynamics across growth phases, and potential functional roles. These data are of great interest to researchers focused on understanding the significance of RNA modifications, particularly mRNA modifications, in bacteria.

      We thank Reviewer 3 for the supportive and positive assessment. We are particularly grateful for the reviewer’s acknowledgment of the value of our analyses on modification distribution, sequence motifs, growth‑phase dynamics, and potential functional roles, which we hope will be of broad interest to researchers studying bacterial RNA modifications, particularly mRNA Ψ.

      Weaknesses:

      (1) The reliability of BID-seq data is questionable due to a lack of experimental validations.

      We thank Reviewer 3 for the constructive feedback. We have undertaken comprehensive revisions to address the concerns regarding manuscript structure and information organization. We have incorporated pseU‑TRACE experiments and data quality results to provide orthogonal validation of Ψ detection, strengthening the robustness of our work.

      We reproduce below our response to Reviewer 1:

      “To further demonstrate reproducibility and sensitivity, we selected three positive Ψ sites from two independent biological replicates for experimental validation, alongside one negative control site, using the pseU‑TRACE method[6]. Ct values were first normalized to the corresponding Ct value of the negative control site, and the treated samples were then further normalized to their corresponding input controls (new Supplementary Fig. 2e).

      Four sites were tested with pseU‑TRACE: the Ψ site at position 944 on 23S rRNA, a Ψ site within the clpV1 gene, an intergenic Ψ site located between the guaA and guaB genes, and a negative control site located within the guaA gene. We successfully validated these Ψ sites in P. aeruginosa. The detailed pseU‑TRACE experimental procedures and corresponding data figures have been added to the Results and Methods sections of the revised manuscript (Lines 171–175, 594–617).”
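      The two-step normalization described above is a ΔΔCt-style calculation. The sketch below assumes four Ct values per candidate site: the site and the negative-control site, each measured in the treated sample and in its input control. The function names are hypothetical. Converting ΔΔCt to a relative signal assumes Livak-style ideal doubling per cycle, which the text does not state. The sketch also does not encode which direction of change indicates modification in pseU‑TRACE.

```python
def delta_ct(ct_site: float, ct_negative_control: float) -> float:
    """Step 1: reference a site's Ct to the negative-control site in the same sample."""
    return ct_site - ct_negative_control


def delta_delta_ct(ct_site_treated: float, ct_neg_treated: float,
                   ct_site_input: float, ct_neg_input: float) -> float:
    """Step 2: normalize the treated sample's dCt to its matching input control."""
    return (delta_ct(ct_site_treated, ct_neg_treated)
            - delta_ct(ct_site_input, ct_neg_input))


def relative_signal(ddct: float, amplification_factor: float = 2.0) -> float:
    """Assumption: each PCR cycle multiplies product by `amplification_factor`."""
    return amplification_factor ** (-ddct)


# Illustrative numbers only, not measured data.
ddct = delta_delta_ct(24.0, 30.0, 26.0, 28.0)
print(ddct, relative_signal(ddct))  # -4.0 16.0
```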

      (2) The manuscript is not well-written, and the presented work shows a major lack of scientific rigor, as several key pieces of information are missing.

      We thank Reviewer 3 for the suggestion. We have restructured the main text to present a clearer logical flow. The key objectives (Lines 83–96, 171–175, 428–447, 517–523) are now explicitly stated in the Introduction and Conclusions sections and are supported by data figures that directly address these stated aims (Supplementary Figs. 1–7).

      (3) The manuscript's organization requires significant improvement, and numerous instances of missing or inconsistent information make it difficult to understand the key objectives and conclusions of the study.

      We thank Reviewer 3 for the constructive feedback. All supplementary figures have been updated with detailed figure legends, methodology descriptions and consistent formatting. We also systematically inspected the main text and supplementary materials and resolved instances of missing or inconsistent information (Supplementary Figs. 1–7; Supplementary Table 1). To enhance computational reproducibility, we have updated our GitHub repository with well-documented code and developed user-friendly prediction tools for Ψ identification across the four bacterial species examined in this study.

      (4) The rationale for selecting specific bacterial species is not clearly explained, and the manuscript lacks a systematic comparison of pseudouridylation among these species.

      We thank Reviewer 3 for the constructive feedback. The bacterial species analyzed in this study were selected based on both diversity and significance. K. pneumoniae, B. cereus, and P. aeruginosa are prominent model human pathogens responsible for a wide range of clinically significant infections, yet transcriptome-wide pseudouridylation has not been systematically explored in these organisms[7–9]. P. syringae, the most important model plant pathogen, was included to extend our analysis beyond human pathogens and to examine Ψ modification in a distinct ecological and evolutionary context, where epitranscriptomic regulation also remains poorly characterized[10]. Importantly, the selected species represent both Gram-positive (B. cereus) and Gram-negative (K. pneumoniae, P. aeruginosa, and P. syringae) bacteria, spanning substantial differences in genome size, GC content, lifestyle, and pathogenic strategies. This diversity provides a comparative framework for examining conserved and species-specific pseudouridylation patterns across bacterial lineages.

      To address the reviewer’s concern, we have revised the manuscript to articulate the rationale for species selection more clearly. We have also added a comparative analysis highlighting similarities and differences in Ψ site distribution and modification levels among these species (Lines 83–96). In addition, we systematically compared Ψ-carrying motifs by analyzing the sequence context of the 10 bases flanking Ψ sites in bacterial mRNA (new Supplementary Fig. 4).
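      One straightforward way to compare Ψ-carrying motifs across species is to extract a ±10-nt context around each site and tabulate the per-position base composition. The sketch below assumes transcript sequences are RNA strings (A/C/G/U) and Ψ positions are 0-based. Sites lacking a full 10-nt flank on either side are skipped. The function names are hypothetical and are not taken from the authors' repository.

```python
from collections import Counter
from typing import Dict, List, Sequence


def flanking_windows(seq: str, sites: Sequence[int], flank: int = 10) -> List[str]:
    """Return seq[pos-flank : pos+flank+1] for each 0-based site with full context."""
    windows = []
    for pos in sites:
        if pos - flank >= 0 and pos + flank < len(seq):
            windows.append(seq[pos - flank: pos + flank + 1])
    return windows


def position_frequencies(windows: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position A/C/G/U frequencies across equal-length windows."""
    if not windows:
        return []
    freqs = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        total = sum(counts.values())
        freqs.append({nt: counts.get(nt, 0) / total for nt in "ACGU"})
    return freqs


seq = "A" * 10 + "U" + "C" * 10  # toy 21-nt transcript, candidate at index 10
wins = flanking_windows(seq, [10, 3], flank=10)  # index 3 lacks a full flank
print(len(wins), position_frequencies(wins)[10]["U"])  # 1 1.0
```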

      Reference

      (1) Leppik, M., Liiv, A. & Remme, J. Random pseuoduridylation in vivo reveals critical region of Escherichia coli 23S rRNA for ribosome assembly. Nucleic Acids Res. 45, (2017).

      (2) Rajan, K. S. et al. A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei. Nat. Commun. 14, (2023).

      (3) Li, H. et al. Quantitative RNA pseudouridine maps reveal multilayered translation control through plant rRNA, tRNA and mRNA pseudouridylation. Nat. Plants 11, 234–247 (2025).

      (4) Dai, Q. et al. Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution. Nat. Biotechnol. 41, 344–354 (2023).

      (5) Xu, H. et al. Absolute quantitative and base-resolution sequencing reveals comprehensive landscape of pseudouridine across the human transcriptome. Nat. Methods 21, 2024–2033 (2024).

      (6) Fang, X. et al. A bisulfite-assisted and ligation-based qPCR amplification technology for locus-specific pseudouridine detection at base resolution. Nucleic Acids Res. 52, (2024).

      (7) Wyres, K. L., Lam, M. M. C. & Holt, K. E. Population genomics of Klebsiella pneumoniae. Nat. Rev. Microbiol. 18, (2020). https://doi.org/10.1038/s41579-019-0315-1

      (8) Kerr, K. G. & Snelling, A. M. Pseudomonas aeruginosa: a formidable and ever-present adversary. J. Hosp. Infect. 73, (2009). https://doi.org/10.1016/j.jhin.2009.04.020

      (9) Ehling-Schulz, M., Lereclus, D. & Koehler, T. M. The Bacillus cereus Group: Bacillus Species with Pathogenic Potential. Microbiol. Spectr. 7, (2019).

      (10) Xin, X. F., Kvitko, B. & He, S. Y. Pseudomonas syringae: What it takes to be a pathogen. Nat. Rev. Microbiol. 16, (2018). https://doi.org/10.1038/nrmicro.2018.17

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This important study functionally profiled ligands targeting the LXR nuclear receptors using biochemical assays in order to classify ligands according to pharmacological functions. Overall, the evidence is solid, but nuances in the reconstituted biochemical assays and cellular studies and terminology of ligand pharmacology limit the potential impact of the study. This work will be of interest to scientists interested in nuclear receptor pharmacology.

      Strengths:

      (1) The authors rigorously tested their ligand set in CRTs for several nuclear receptors that could display ligand-dependent cross-talk with LXR cellular signaling and found that all compounds display LXR selectivity when used at ~1 µM.

      (2) The authors tested the ligand set for selectivity against two LXR isoforms (alpha and beta). Most compounds were found to be LXRbeta-specific.

      The majority of ligands were found to be LXRβ-selective; however, examples of non-selective and LXRα-selective ligands were identified. It should be noted that this is a small compound set of literature ligands with reasonable structural diversity.

      (3) The authors performed extensive LXR CRTs, correlated the results with cellular transcription and gene expression, and carried out classification profiling using heatmap analysis. In doing so, they sought to use relatively easy-to-collect biochemical assays with purified ligand-binding domain (LBD) protein to explain the complex activity of full-length LXR-mediated transcription.

      Weaknesses:

      (1) The descriptions of some observations lack detail, which limits understanding of some key concepts.

      We hope that the changes to the submitted manuscript add clarity. Several observations reinforce aspects of the literature and follow from the observation that the majority of ligands with agonist activity more strongly stabilize/induce coactivator-bound complexes with LXRβ. This results in general LXRβ selectivity for agonists and also in more variability in the response of LXRα to different ligand chemotypes. The most significant observations were for partial agonists that stabilize corepressor binding, in particular the complex with LXRα.

      (2) The presence of endogenous NR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data.

      This is generally a confounding factor for ligands with apparent antagonist activity and is a source of ambiguity in designating inverse agonists across the nuclear receptor research field. Theoretically, this could also impact weak and partial agonists; however, this requires further study.

      (3) The normalization of biochemical assay data could confound the classification of graded activity ligands.

      Normalization to T0 (100%) and vehicle (0%) is applied to most data. It is not clear how this would confound data interpretation. T0 is a very reliable and reproducible agonist without significant bias towards either LXR isoform.
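      The two-point normalization described above maps the mean in-plate vehicle signal to 0% and the mean in-plate T0 signal to 100%. Values above 100% therefore indicate greater efficacy than T0, and negative values indicate a response below vehicle. The sketch below assumes the raw per-well readout is a TR-FRET emission ratio. The function name percent_of_t0 is hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_of_t0(signal: float,
                  vehicle_wells: Sequence[float],
                  t0_wells: Sequence[float]) -> float:
    """Normalize a raw ratio so the vehicle mean -> 0 % and the T0 mean -> 100 %."""
    vehicle = mean(vehicle_wells)  # in-plate vehicle (0 %) control
    t0 = mean(t0_wells)            # in-plate T0 (100 %) control
    window = t0 - vehicle
    if window == 0:
        raise ValueError("T0 and vehicle controls give the same signal")
    return 100.0 * (signal - vehicle) / window


# Toy ratios, not measured data.
print(percent_of_t0(2.0, [1.0, 1.0], [3.0, 3.0]))  # 50.0
```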

      (4) The presence of >1 coregulator peptide in the biplex (n=2 peptides) CRT (pCRT) format will bias the LBD conformation towards the peptide-bound form with the highest binding affinity, which will impact potency and interpretation of TR-FRET data.

      Multiplex assays must be optimized to balance the binding affinities of the coregulator peptides. These are somewhat artificial small peptide constructs that are intended to reflect binding of the much larger coregulator proteins. Since the dominant theory of NR tissue selectivity is based on the cellular availability (i.e., concentration) of coregulators, such a balance also exists in a cellular context.

      (5) Correlation graphical plots lack sufficient statistical testing.

      Correlations are now supported by statistical data and we have added hierarchical clustering analysis.

      (6) Some of the proposed ligand pharmacology nomenclature is not clear and deviates from classifications used currently in the field (e.g., hard and soft antagonist; weak vs. partial agonist, definition of an inverse agonist that is not the opposite function to an agonist).

      Classifications currently used in the field vary from one NR to another, and the terms partial agonist and inverse agonist, in particular, are usually used qualitatively, unclearly, and often misleadingly. We expand on these classifications with respect to the labels we use to classify pCRT responses to LXR ligands. In agreement with the reviewer, we have replaced IA (inverse agonist) with RA (reverse agonist) as a label specifically associated with pCRT analysis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript by Laham and co-workers, the authors profiled structurally diverse LXR ligands via a coregulator TR-FRET (CRT) assay for their ability to recruit coactivators and kick off corepressors, while identifying coregulator preference and LXR isoform selectivity.

      The relative ligand potencies measured via CRT for the two LXR isoforms were correlated with ABCA1 induction or lipogenic activation of SRE, depending on cellular context (i.e., astrocytoma or hepatocarcinoma cells). While these correlations are interesting, there is room to improve their quantitative presentation. Finally, the CRT signatures were correlated with the structural stabilization of the LXR:coregulator complexes. In aggregate, this study curated a set of LXR ligands with disparate agonism signatures that may guide the design of future nonlipogenic LXR agonists. Such agonists could have therapeutic applications for cardiovascular disease, Alzheimer's, and type 2 diabetes without inducing mechanisms that promote fat/lipid production.

      Strengths:

      This study has many strengths, from curating an excellent LXR compound set to the thoughtful design of the CRT and cellular assays. The design of a multiplexed precision CRT (pCRT) assay that detects corepressor displacement as a function of ligand-induced coactivator recruitment is quite impressive, as it allows measurement of ligand potencies to displace corepressors in the presence of coactivators, which cannot be achieved in a regular CRT assay that looks at coactivator recruitment and corepressor dissociation in separate experiments.

      Weaknesses:

      I did not identify any major weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Page 2. "The endogenous ligands ... activate LXR via canonical or alternate mechanisms." What is an alternate mechanism?

      We have modified the Fig. 1 caption to identify a mechanism alternative to the canonical mechanism. LXR transcriptional complexes are RXR heterodimers that can be activated either by the canonical mechanism of coregulator recruitment or by an alternative de-repression mechanism.

      (2) Page 5: "Notably, the 25 amino acid SRC-1 peptide is the only coactivator tested for LXR binding that has the fluorophore remote from the coactivator peptide." What does this mean, and could it influence the results?

      The sentence has been expanded to clarify the meaning. Notably, the 25 amino acid SRC-1 peptide is the only coactivator, amongst those tested for LXR binding, which has the fluorophore remote from the coactivator peptide: i.e., the only coactivator tested that uses a fluorophore labeled anti-tag antibody to bind the tagged coactivator rather than a fluorophore-labeled coactivator. In methods based on fluorescent tags (CRT, TR-FRET, fluorescence polarization, etc.), a fluorophore that interacts directly with the receptor can generate a maximal signal that differs depending on this interaction: i.e. the identity of the coregulator used in CRT can influence the response. As seen in Figures 6 and S6, maximal response is dependent on ligand and coregulator.

      (3) Page 5: "The [CRT] assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." The dose-dependent activity in the CRT assays is more classically defined as a functional "potency", not "affinity".

      The text has been changed to remove “measure of affinity”: The assay measures the ligand-dependent EC50 for ligand-induced coactivator recruitment to LXR; the affinity of the ligand for the LXR:coregulator complex contributes to this potency.

      (4) Page 5: "Perhaps surprisingly, considering the description of multiple LXR ligands as partial agonists, most agonists studied gave maximal response at the same level as T0, behaving as full agonists." Can the authors speculate as to why partial agonist activity is not observed in their CRT assays when it has been observed in CRT assays for other nuclear receptors?

      This section has been reworded. Please note the apparent partial agonist activity observed in CRT assays for multiple coactivators, as shown in Figures 6 and S6 (see also (2) above). Although many LXR ligands have been reported to display partial agonist activity, most agonists studied in this specific biotin-SRC-1 CRT assay gave a maximal response at the same level as T0, behaving as full agonists.

      (5) Page 5: "Conformational cooperativity of LBD residues beyond these two amino acids leads to different conformations of Leu274 and Ala275 that generally favor ligand binding to LXRβ." Where are these residues located? Why are they important?

      We have simplified this paragraph, which introduces the interesting observations and interpretation of Ding et al. to illustrate potential contributions to isoform selectivity: The ligand binding pockets of the two LXR isoforms differ by only one amino acid, located in helix 3 (H3: LXRα-Val263 and LXRβ-Ile277). Interestingly, correcting this difference by mutating these residues to alanine (V263A and I277A) lowered, but did not ablate, isoform selectivity in reporter assays.[108] Supported by modeling studies, this observation led Ding et al. to suggest that conformational cooperativity of LBD residues beyond these two amino acids generally favors ligand binding to LXRβ. Therefore, most reported ligands, including those examined in the current work, are LXRβ-selective or non-selective.

      (6) Some correlation plots are described to show "poor" correlations without showing the underlying statistical fits. All correlation plots should show Pearson and Spearman correlation coefficients and p-values within the figures.

      This section of the manuscript has been completely reworked with full correlation analysis and statistics. There is no substantive change in data interpretation.
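      The reviewer's request pairs two coefficients. Pearson's r measures linear association. Spearman's ρ measures rank association and is Pearson's r computed on ranks, with ties given their average rank. The sketch below computes both using only the standard library. It assumes paired potency values per ligand; log-transformed EC50 values are a common choice but are an assumption here. P-values are omitted. In practice, scipy.stats.pearsonr and scipy.stats.spearmanr return both the coefficient and a p-value.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson's r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy monotonic but non-linear data: rho is exactly 1, r is slightly below 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```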

      (7) The normalization of TR-FRET data could introduce undesired bias when comparing activities. The methods section should provide more details about normalization of CRT data, including stating whether the control compounds' activity data were collected on the same CRT 384-well plate on the same day, or different plates, or different days, etc.

      This is now clarified in SI materials and methods section. In-plate controls are always used.

      (8) The authors describe their pCRT assay as "multiplex", whereas "biplex" might be more accurate, as they only used two peptides.

      “Biplex” is commonly used in reference to qPCR, Bio-Plex is a commercial antibody assay, and “duplex” is a term used in nucleic acid research. We therefore feel that “multiplex” is a simpler, more generic term that is suitable here and can be extended if a third coregulator is added.

      (9) The pCRT assays use the same peptide concentrations (200 nM). However, the peptides will have different affinities for the LBD, which may bias ligand-dependent pCRT profiles. The peptide that binds with higher affinity in the absence of ligand will bias the LBD conformation and impact ligand affinity. Can the authors comment on any limitations of the pCRT approach vs. a normal CRT? Did the authors perform any optimization to see if increasing peptide concentrations (>200 nM) or having different concentrations (e.g., 400 nM SRC1 and 200 nM NCorR2) influences the pCRT data, extracted parameters, correlations, etc.?

      As we write in the Limitations section, our assays are focused on ligand-dependence, whereas other excellent studies focus more on coregulator-dependence. The length and affinity of peptide constructs varies and therefore it is important to “balance” corepressor and coactivator concentrations. The most important conclusions from our pCRT assays concern the ability of some ligands to stabilize corepressor binding in the monoplex CRT and the universal ability of coactivator complex stabilization to eject the corepressor in the multiplex assay. Furthermore, without measurements and correlations in “natural” cellular contexts, the CRT data obtained in cell-free conditions is somewhat artificial. We evaluated a range of peptide concentrations to assess signal-to-background and overall assay performance. Each new receptor added to the panel underwent rigorous optimization to establish robust and reliable assay conditions. This included identifying a suitable positive control for each receptor, determining the optimal coregulator selection and concentration, and refining other key parameters such as buffer composition and total well volume. The concentrations reported represent the optimized balance—producing a strong, reproducible signal without oversaturation or disproportionate contribution from any individual assay component.

      (10) Page 11. The authors introduce a few ligand classification terms that are not standard in the field and unclear: "soft" vs. "hard" antagonist, "weak" vs. "partial" agonist, and their definition of an inverse agonist that, in classical pharmacologic terms, should have an opposite (inverse) function to an agonist. Furthermore, the presence of endogenous LXR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data. See the following paper for an example of ligand-dependent classification and activation mechanisms when there are endogenous cellular ligands at play: https://elifesciences.org/articles/47172

      The paragraph discussing nomenclature went through many iterations of terminology. A further paragraph that discussed problems with ligand classification in the broader field of NR pharmacology had been removed; this has now been added back. We apologise for not citing the excellent Strutzenberg et al. paper on RORɣ pharmacology, which is now included. In this paper, Griffin and co-workers also use terms that are not standard in the field, such as “silent agonist”, which covers, in part, ligands that we describe as “weak agonists”. A standard, definitive lexicon of terms across NRs is unfortunately problematic. We have added 2 paragraphs:

      The nomenclature for NR ligands often lacks precision and differs across NR classes. SERM (a subset of selective NR modulators) is used to describe varied families of ER ligands that show tissue-selective agonist and/or antagonist actions. Unfortunately, “partial agonist” is also widely used to describe SERMs, even though its use is usually pharmacologically incorrect and “biased agonist” may be a more accurate label.[124] The majority of reported ER ligands, even some that cause ER degradation, are SERMs because they are transcriptionally active. Consequently, the term “pure antagonist” (PA) has been used to differentiate transcriptionally null ligands,[125] although “pure antagonist/antiestrogen” was originally introduced to describe antagonism of both AF1 and AF2 functions.[90]

      Elegant work by Griffin’s team on RAR-related orphan receptor C (RORɣ) is instructive because it used a combination of HDX-MS and CRT to define categories of RORɣ ligands.[126] In addition to full agonist, “silent agonist” was introduced to include endogenous and synthetic partial agonists, although, by definition, partial agonists should antagonize full agonists. On the antagonist side of the spectrum, “active antagonist” was used to describe ligands that reduce cellular activity to baseline. “Inverse agonist” was used for ligands that reduce cellular transcription below baseline and induce recruitment of corepressors. Curiously, “inverse agonist” has almost never been used to describe ER ligands, yet it is used frequently for other NR ligands. In those cases it mostly describes ligands that reduce transcription below baseline, without any evidence for corepressor recruitment. GSK2033 and SR9238 show inverse agonist activity in cells (Figs 3, 5); however, neither is capable of recruiting SMRT2 or NCOR2 to LXR (Fig. 7).

      (11) Figure 9A and Figure S8. Could hierarchical clustering analysis be used to more rigorously compare the activities of the ligands?

      We have now added hierarchical clustering analysis (Figs 4 and S4). It should be noted that the value of such an analysis increases with the number of ligands.

      (12) How does cellular potency correlate to pCRT vs. CRT potencies? Does pCRT better explain cellular potency?

      We have added this specific correlation (multiplex CRT vs. monoplex CRT).

      (13) The authors should provide an SI table of parameters (potency values) used for correlation and heatmap analyses.

      Tables have been added to SI accordingly.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has many strengths, but can still be improved by addressing the following critiques:

      (1) I am surprised the team did not find a ligand with a higher efficacy than T0. Please would you explain why T0 seems to have maxed out ligand efficacy for both LXRalpha and LXRbeta?

      Several ligands showed greater efficacy than T0 in cell-based reporter assays and in the CRT assays shown in Figures 6 and S6: AZ876, BE1218, and MK9 gave maximal responses higher than that of T0.

      (2) In the subsection, "Activity and isoform selectivity of LXR ligands", you mentioned that "The assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." This is incorrect. EC50 is a measure of ligand potency, not affinity.

      See Reviewer-1 (3)

      (3) In Figure 3 it is unclear what was used to normalize the antagonist responses in Panel F. Also, I recommend changing the y-axis of Panel F to -100 to 50 to get a better view of the response.

      This has been clarified: zero is vehicle control. Change to y-axis is made.

      (4) In Figure 4, the correlation R-squared values should be presented as a Table to have a better qualitative assessment of the correlations. It is challenging to judge which correlations are better by relying only on visual inspection. I also recommend moving the two panels from Figure S3 to Figure 4 as panels E and F.

      Extensive changes to Figure 4 have been made in response to this comment and that of Reviewer 1, who wanted these values in the figures: Reviewer-1 points (6) and (12).

      (5) In Figure 5, the fold changes in panels G, H, and I would be better presented as a bar graph. Also, the cytotoxicity of the ligands needs to be assessed. For instance, for BE1218 there is a sharp decrease in fold change from ~1 µM to ~10 µM. This assessment will also confirm whether the downward trends for SR9238 and GSK2033 are "real" and not the result of cells dying off at higher ligand concentrations.

      Across our many studies on potent NR ligands, cell growth inhibition is observed at concentrations above 3 µM. This is true for ER ligands such as tamoxifen, with explanations in the literature including membrane disruption and low-affinity cytoplasmic binding proteins. As a specific response to the reviewer’s query, we include cell viability measurements in the Supplementary Information. There is no loss of cell viability in HepG2 cells.

      (6) Several ligands induce recruitment of coactivators but with minimal ability to displace corepressors. Physiologically, what would be the expected effect of these ligands on LXR activity?

      On the basis of pCRT analysis, we have defined such ligands as weak agonists (WA); however, pCRT shows that WA ligands induce corepressor loss in the presence of coactivator. WA ligands might be expected to differ from SA (strong agonist) ligands, depending on coregulator balance, isoform expression, and the importance of the de-repression mechanism in a specific cell context.

      (7) In the subsection, "synchronous coregulator recruitment by multiplex, precision CRT" you mentioned that "For LXRbeta, the correlation between SRC1 recruitment in monoplex and multiplexed CRT is good," but the data is not shown. I think it would be better to show this data for transparency.

      See query (4) and Reviewer-1. Done.

      (8) In Figure 9, Panel A, the heat map is quantitated as 0-150. Is this fold change? If so, add this label to the figure legend.

      It is Normalized Response as %, which is now added.

      (9) In Figure 9, Panel B, please explain why in all cases, CoA-bound LXR resides at a higher energy level than the CoR-bound, and the apo LXR is at a lower energy level than the CoA-bound protein. A coregulator-bound (holo) protein structure is generally a lower energy (more stable) structure than the unbound (apo) protein. The binding of a coregulator stabilizes the protein's conformation and shifts the equilibrium towards a more thermodynamically favorable state. Using the same argument, it does not make sense to me that the CoR-bound LXR is on the same energy level as the apo LXR.

      This schema reflects our observations in pCRT. No signal was observed for coactivator-bound (holo) protein in the absence of ligand, whereas a signal was observed for corepressor-bound (holo) protein in the absence of ligand. Therefore, the CoA-bound LXR is higher in energy than apo-LXR (+ unbound CoA). Conversely, the signal for CoR-bound LXR can be reduced or increased by ligands, which requires the CoR-bound LXR to be of similar energy to apo-LXR (+ unbound CoR).

      (10) In the Figure 9b caption, does "measured at 1 µM" pertain to the concentration of ligand or of coregulator? This is unclear. You should report the concentrations of both ligand and coregulator.

      Clarified in caption.

      (11) In Figure S4, the signal for SR9238 shoots up to ~300 units for ligand concentrations >3 µM. Please explain what could have contributed to this anomalous activation and why these data were moved to the Supplementary File rather than shown in the main figure (Figure 5).

      The HepG2-SRE assay is a nano-luc reporter assay, unlike the CCF-ABCA1 assay, which is a firefly luciferase assay. There is substantial anecdotal evidence that furimazine/nano-luc is susceptible to stabilization enhancement. The RT-PCR data presented in Fig. 5 confirm that this is an artifact for some biphenyl sulfones.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete. It could be strengthened by the use of sensitive RNA in situ hybridization approaches.

      Thank you for your valuable assessment. RNA in situ hybridization evidence has been added to the revised manuscript (Figure 5A-D) to support that GSC tumors produce BMP ligands.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that block in differentiation in SGC is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells.

      (3) Appropriate use of quantification and statistics.

      We greatly appreciate your valuable comments.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc, or in a few germaria?

      This is a good question. Because the SGC phenotype depends on the presence of both germline tumor clones and out-of-niche wild-type germ cells, our quantification was restricted to germaria containing both. In 14-day-old fly ovaries, 70% of germaria (432/618) met this criterion (Line 103). Each of them contained an average of 1.5 SGCs (Figure 1K).

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      Our attempts to induce ovarian hs-FLP germline clones by heat-shocking adult flies were unsuccessful, with very few clones being observed. Therefore, we shifted our approach to an earlier developmental stage. Successful induction was achieved by subjecting late-L3/early-pupal animals to a twice-daily heat shock at 37°C for 6 consecutive days (2 hours per session with a 6-hour interval, see Lines 331-335) (Zhao et al., 2018).

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      These 20-25% of SGCs are bamP-GFP<sup>+</sup> dad-lacZ<sup>-</sup>, not bam<sup>+</sup> dad-lacZ<sup>+</sup> (see Figure 2C and 3D). They would be cystoblast-like cells that may have initiated a differentiation program toward forming germline cysts (see Lines 122-130). The 70-75% of SGCs that have low BMP signaling exhibit GSC-like properties, including: 1) dot-like spectrosomes; 2) dad-lacZ positivity; 3) absence of bamP-GFP expression. While additional markers would be beneficial, we think that this combination of properties is sufficient to classify these cells as GSC-like.

      (4) All experiments except Figure 1I (where a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      Yes, we initially identified the SGC phenotype through hs-FLP-mediated mosaic analysis of bam or bgcn mutant ovaries. However, as noted in our response to Weakness (2), this approach was very labor-intensive. Therefore, we switched to the more convenient nos>FLP system for subsequent experiments. In our observation, there was no difference between these two approaches in inducing the SGC phenotype.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at a young female (like 2-day-old). I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      These are very good questions. The SGC phenotype was consistent over the 14-day analysis period (Figure 1J) and was specifically dependent on the presence of germline tumor clones. In 14-day-old fly ovaries, these clones were both larger and more frequent than in younger flies. This age-dependent enhancement in clone size and frequency significantly improved our quantification efficiency (see Lines 101-112).

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      Thank you for this valuable comment. It was also difficult for us to distinguish 1 and 2 copies of GFP in the Drosophila ovary. In Figure 4A-F, to resolve this problem, we used a triple-color system, in which red germ cells (RFP<sup>+/+</sup> GFP<sup>-/-</sup>) are bam mutant, yellow germ cells (RFP<sup>+/-</sup> GFP<sup>+/-</sup>) are wild-type, and green germ cells (RFP<sup>-/-</sup> GFP<sup>+/+</sup>) are punt or med mutant. In Figure 4G-J, we quantified the SGC phenotype only in black germ cells (GFP<sup>-/-</sup>), which are wild-type (control) or mad mutant. In Figure 6, we quantified the SGC phenotype only in green germ cells (both GFP<sup>+/+</sup> and GFP<sup>+/-</sup>), all of which are wild-type.

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      Thank you for this critical comment. The settings of immunofluorescent staining and confocal parameters in the original Figure 5A were the same as those in 5B. In our observation, the levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results from the revised Figure 5. Instead, HCR-FISH data have been added (Figure 5A-D) to support that bam mutant germline tumors secrete BMP ligands.

      (8) In Figure 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      No. Given that bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in inducing the SGC phenotype (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would be redundant and would not alter the key conclusion of our study. Thank you for your understanding!

      Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      We apologize for any confusion. In our manuscript, the term "SGC" is reserved specifically for wild-type germ cells that maintain a GSC-like morphology outside the niche. bam or bgcn mutant germ cells are referred to as GSC-like tumor cells (Lines 89-90), not SGCs.

      (a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wild-type cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      The SGCs analyzed in Figure 2A-C are wild-type, GSC-like cells located outside the niche. They were generated using the same genetic strategy depicted in Figures 1C and 1E (with the schematic in Figure 1B). The complete genotypes for all experiments are available in Source data 1.

      (b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note, this is completely different from their earlier definition shown in line 68.

      The strategy to generate SGCs in Figure 4B-F (with the schematic in Figure 4A) is different from that in Figure 1C-F, H, and I (with the schematic in Figure 1B). In Figure 4B-F, we needed to distinguish punt<sup>-/-</sup> (or med<sup>-/-</sup>) from punt<sup>+/-</sup> (or med<sup>+/-</sup>) germ cells. As noted in our response to Reviewer #1’s Weakness (6), it was difficult for us to distinguish 1 and 2 copies of GFP in the Drosophila ovary. Therefore, we used the triple-color system to distinguish these germ cells in Figure 4B-F (see genotypes in Source data 1).

      (c) Additionally, bam<sup>+/-</sup> GSCs (the first bar in Figure 4E) should appear GFP<sup>+</sup> and Red<sup>+</sup> (i.e., yellow). It would be helpful if the authors could indicate these bam<sup>+/-</sup> germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam<sup>+/-</sup> cells alone. Figure 4F has the same issue, and in this graph, the color does not match Figure 4A.

      The color-to-genotype relationships for the schematics in Figures 2A and 4E are provided in Figures 1B and 4A, respectively. Due to the high density of germ cells, it is impractical to label each genotype directly in the images. In contrast to Figure 4E, the colors in Figure 4F do not represent genotypes; instead, blue denotes the percentage of SGCs, and red denotes the percentage of germline cysts, as indicated below the bar chart.

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      Thank you for this constructive suggestion. These quantification data have been added to the revised Figure 1 (Figure 1J, K).

      (3) Without the information mentioned in point 2, it causes problems when reading through the section regarding [wild-type] SGCs induced by impairment of differentiation or dedifferentiation. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      Unlike wild-type cystocytes, which undergo incomplete cytokinesis and lack midbodies, those with USP8 loss undergo complete cell division, with the presence of midbodies (white arrow, Figure 1F’ from Mathieu et al., 2022) as a marker of the late cytokinesis stage (Mathieu et al., 2022).

      (a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      (b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      We appreciate your critical comment. bam mutant mosaic germaria indeed contained wild-type germline cysts, as evidenced by an SGC frequency of ~70%, rather than 100% (see Figures 2H, 4F, 4J, 6F, 6I, and Figure 6-figure supplement 3C). Since the SGC phenotype depends on the presence of bam or bgcn mutant germline tumors, we quantified it as “the percentage of SGCs relative to the total number of SGCs and germline cysts that are surrounded by germline tumors” (see Lines 103-108). Quantifying the SGC phenotype as "the percentage of germaria with SGCs" would be imprecise. This is because the presence and number of SGCs were variable among germaria with bam or bgcn mutant germline clones, and a small number of germaria entirely lacked these clones. The data of "SGCs per germarium with both germline clones and out-of-niche wild-type germ cells" have been added to the revised Figure 1 (Figure 1K).

      (c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      Such germaria could be found in Figure 2G, 3B, 3C, 6D, 6E, and 6H. The percentage of germline cysts can be calculated by “100% - SGC%”.

      (d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      As noted in our response to points a) and b) above, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for analyzing the phenotype. For this experiment, we examined >50 such germline cysts via confocal microscopy. As the analysis was performed on a defined cellular population, this sample size should be sufficient to support our conclusion.

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      We assayed for the presence or absence of midbodies specifically within the wild-type germline cysts surrounded by bam or bgcn mutant tumors, not within the tumors themselves (Lines 96-97). As detailed in Lines 90-100, the absence of midbodies was used as a key criterion to exclude the possibility of dedifferentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      Thank you for your constructive comment. RNA in situ hybridization data have been added to support that bam or bgcn mutant germline tumors secrete BMP ligands (Figure 5A-D).

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] SGCs and [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      Schematic cartoons in Figure 6A and 6B demonstrate that these analyses were performed in a dpp<sup>+/-</sup> background. Figure 6-figure supplement 1 indicates that dpp<sup>+/-</sup> or gbb<sup>+/-</sup> does not affect GSC maintenance, germ cell differentiation, or female fly fertility. Figure 6C is the control for 6D and 6E, and 6G is the control for 6H, with quantification in 6F and 6I. We used nos>FLP, not the heat shock method, to induce germline clones in these experiments (see genotypes in Source data 1).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      Thank you for teaching us! We have included the introduction of these two papers in the revised manuscript (Lines 197-199).

      (8) Since fusome morphology is an important readout of SGCs vs differentiation. All the clonal analysis should have fusome staining.

      SGCs are readily distinguishable from multicellular germline cysts based on morphology. In some clonal-analysis experiments, fusome staining was not feasible due to technical limitations such as channel saturation or antibody incompatibility. Thank you for your understanding!

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      The figure panels were arranged to optimize space while ensuring that related panels are grouped in close proximity for logical comparison. We would be happy to consider any specific suggestions for an alternative layout that could improve clarity.

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript-ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

      The detailed quantification information is labeled directly in figures or described in figure legends, and all raw quantification data are provided in Source data 2.

      Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      We greatly appreciate your valuable comments.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions leads to arrest of germline cell differentiation in a state with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here. Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche." Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be resolved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones containing only a few tumor cells can suppress the differentiation of neighboring WT GSCs. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster or probably even a single tumor cell generated at an early stage might prevent the differentiation of their neighboring germ cells.

      Thank you for your critical comment. The revised manuscript now includes a time-course analysis of the SGC phenotype (Figure 1J). Our data in Figure 6 demonstrate that BMP ligands from germline tumors are required to inhibit SGC differentiation. Furthermore, we have incorporated into the manuscript the possibility that disruption of the differentiation niche may also contribute to the SGC phenotype (Lines 197-199).

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution. In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ hybridization) would be essential.

      We greatly appreciate your critical comment. In our data, the expression levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results from the revised Figure 5. RNA in situ hybridization data have been added to visualize the expression of BMP ligands within bam mutant germline tumor cells (Figure 5A-D).

      Most tumor data present in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutant leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located at the right side of the white arrowhead). In Figure 6D, fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      Although a structure resembling a "branched fusome" is visible in Figure 6E (right of the white arrowhead), it is an artifact resulting from the cytoplasm of GFP-positive follicle cells, which also stain for α-Spectrin, projecting between germ cells of different clones (see the merged image). In both our previous (Zhang et al., 2023) and current studies, bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in its ability to block GSC differentiation and induce the SGC phenotype (Figure 1J). Given this, we believe that repeating the extensive experiments in Figure 6 with the bam<sup>Δ86</sup> allele would be scientifically redundant and would not change the key conclusion of our study.

      It is well established that the stem niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function - some promoting stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired the niche-like property. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      Dpp and Gbb are the key niche signals from cap cells for maintaining GSC stemness. Our work demonstrates that germline tumors can specifically mimic this signaling function, not the full suite of cap cell properties, to create a non-cell-autonomous differentiation block. The current title “Tumors mimic the niche to inhibit neighboring stem cell differentiation” reflects this precise concept: a partial, functional mimicry of the niche's most relevant activity in this context. We feel it is an appropriate and compelling summary of our main conclusion.

      In the Method section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

      Because of the highly variable expression levels in terminal filament and cap cells, we have omitted the dpp-lacZ results in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Not all readers may be familiar with the nos>FLP/FRT or hs-FLP/FRT systems. It would be helpful if the authors could briefly introduce these genetic mosaic systems and explain how they were used in this study before presenting the results.

      Thank you for this constructive suggestion. Such brief introduction has been added to the revised manuscript (Lines 64-70).

      (2) Line 68-70: "Surprisingly, ...outside the niche retained a GSC-like single-germ-cell (SGC) morphology, even when encapsulated within egg chambers (Figure 1C, D, Figure 1-figure supplement 1)."

      (3) The figure citation is not appropriate, as Figures 1C and 1D do not show "single germ cells (SGCs) encapsulated within egg chambers." To improve clarity, the authors could revise the sentence as follows: "Surprisingly, wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology (Figures 1C and D), even when encapsulated within egg chambers (Figure 1-figure supplement 1)." This modification would make the description consistent with the figure content and easier for readers to follow.

      Thank you for teaching us! The manuscript has been revised following this suggestion (Lines 70-73).

      (4) Line 106-110. The description is confusing. The authors state, "Under normal conditions... Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C)." However, Figure 2B shows the bam mutant mosaic germaria, and Figure 2C does not specify the genotypes of the germaria used for the analysis of GSCs, CBs, and SGCs. The authors should clarify the experimental conditions and genotypes corresponding to each panel. In addition, it would be more informative to indicate how many germaria these quantified GSCs, CBs, and SGCs were derived from.

      (5) Throughout the manuscript, the authors report the number of SGCs analyzed (e.g., Lines 149-151). However, it would be more informative to also indicate how many germaria these quantified SGCs were derived from. Providing this information would help readers assess the sampling size and variability across biological replicates.

      Thank you for your suggestion. As shown in Figure 2B, these wild-type (RFP-positive) GSCs and CBs were also derived from bam mutant mosaic germaria. The phrase "under normal conditions" has been deleted from the revised manuscript to prevent any potential ambiguity. Given the specificity of the SGC phenotype, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for its quantification (Lines 103-108). The data of “SGCs per germarium with both germline clones and out-of-niche wild-type germ cells” have been added to the revised Figure 1K.

      Reviewer #3 (Recommendations for the authors):

      (1) Additionally, the authors should clarify what the "red dot" signal in the GFP-positive cap cell in Figure 3 F (left panel) represents.

      The “red dot” is an asterisk that is used to mark a cap cell (Line 620).

      (2) Finally, on line 266, "bamP-GFP-positive" should be corrected to "bamP-GFP-negative."

      It should be “bamP-GFP-positive”, not “bamP-GFP-negative” (see Figure 2B).

      Reference:

      Mathieu, J., Michel-Hissier, P., Boucherit, V., and Huynh, J.R. (2022). The deubiquitinase USP8 targets ESCRT-III to promote incomplete cell division. Science 376, 818-823.

      Zhang, Q., Zhang, Y., Zhang, Q., Li, L., and Zhao, S. (2023). Division promotes adult stem cells to perform active niche competition. Genetics 224.

      Zhao, S., Fortier, T.M., and Baehrecke, E.H. (2018). Autophagy Promotes Tumor-like Stem Cell Niche Occupancy. Curr Biol 28, 3056-3064.e3.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work provides evidence that slender T. brucei can initiate and complete cyclical development in Glossina morsitans without GlcNAc supplementation, in both sexes, and importantly in non-teneral flies, including salivary-gland infections.

      Comparative transcriptomics show early divergence between slender- and stumpy-initiated differentiation (distinct GO enrichments), with convergence by ~72 h, supporting an alternative pathway into the procyclic differentiation program.

      The work addresses key methodological criticisms of earlier studies and supports the hypothesis that slender forms may contribute to transmission at low parasitaemia.

      Strengths:

      (1) Directly tackles prior concerns (no GlcNAc, both sexes, non-teneral flies) with positive infections through to the salivary glands.

      (2) Transcriptomic time course adds some mechanistic depth.

      (3) Clear relevance to the "transmission paradox"; advances an important debate in the field.

      Weaknesses:

      (1) Discrepancy with Ngoune et al. (2025) remains unresolved; no head-to-head control for colony/blood source or microbiome differences that could influence vector competence.

      We acknowledge that a direct head-to-head comparison was not performed and that microbiome composition can affect vector competence. However, both the tsetse flies used in Ngoune et al. (2025) and those in our study originated from the same colony and were maintained under comparable standard laboratory conditions. In both cases, flies were fed on sheep blood through identical silicon membrane systems, minimizing potential differences.

      (2) Lacks in vivo feeding validation (e.g., infecting flies directly on parasitaemic mice) to strengthen ecological relevance.

      Our study deliberately focused on controlling experimental variables through the use of an artificial feeding system, which allows for standardization of parasite dose and exposure conditions. This approach facilitates reproducibility and direct comparison with previous studies. We also question whether feeding flies on infected laboratory mice would add meaningful ecological relevance.

      (3) Mechanistic inferences are largely correlative (although not requested, there is no functional validation of genes or pathways emerging from the transcriptomics).

      Functional validation of individual genes or pathways was not undertaken in this study. Instead, the aim was to identify and compare transcriptional signatures associated with slender-to-procyclic versus stumpy-to-procyclic differentiation, and to directly address previous criticism of our original finding that slender bloodstream forms are capable of infecting the tsetse fly.

      (4) Reliance on a single parasite clone (AnTat 1.1) and one vector species limits external validity.

      Incorporating additional pleomorphic T. brucei clones and alternative tsetse species would undoubtedly broaden our understanding of parasite-vector interactions, and studies using fresh field isolates and wild-caught tsetse flies would be even more informative. However, in order to directly address the specific concerns raised against our original study (Schuster et al., 2021), it was essential to employ the same parasite clone and vector species.

      We further emphasize that the pleomorphic clone used here is a well-characterized and widely employed T. brucei strain that closely reflects parasites encountered under natural conditions. Likewise, Glossina morsitans represents the standard vector species used in the majority of tsetse laboratories, thereby ensuring reproducibility and facilitating comparison with existing work in the field.

      Reviewer #2 (Public review):

      Summary:

      This paper is an exciting follow-up to two recent publications in eLife: one from the same lab, reporting that slender forms can successfully infect tsetse flies (Schuster, S et al., 2021), and another independent study claiming the opposite (Ngoune, TMJ et al., 2025). Here, the authors address four criticisms raised against their original work: the influence of N-acetyl-glucosamine (NAG), the use of teneral and male flies, and whether slender forms bypass the stumpy stage before becoming procyclic forms.

      Strengths:

      We applaud the authors' efforts in undertaking these experiments and contributing to a better understanding of the T. brucei life cycle. The paper is well-written and the figures are clear.

      Weaknesses:

      We identified several major points that deserve attention.

      (1) What is a slender form? Slender-to-stumpy differentiation is a multi-step process, and most of these steps unfortunately lack molecular markers (Larcombe et al, 2023). In this paper, it is essential that the authors explicitly define slender forms. Which parameters were used? It is implicit that slender forms are replicative and GFP::PAD1-negative. Isn't it possible that some GFP::PAD1-negative cells were already transitioning toward stumpy forms, but not yet expressing the reporter? Transcriptomically, these would be early transitional cells that, upon exposure to "tsetse conditions" (in vitro or in vivo), could differentiate into PCF through an alternative pathway, potentially bypassing the stumpy stage (as suggested in Figure 4). Given the limited knowledge of early molecular signatures of differentiation, we cannot exclude the possibility that the slender forms used here included early differentiating cells. We suggest:

      (1.1) Testing the commitment of slender forms (e.g., using the plating assay in Larcombe et al., 2023), assessing cell-cycle profile, and other parameters that define slender forms.

      (1.2) In the Discussion, acknowledging the uncertainty of "what is a slender?" and being explicit about the parameters and assumptions.

      We appreciate the critical evaluation concerning the identity of slender forms and potential presence of intermediate forms displaying slender morphology yet exhibiting cell-cycle arrest, as proposed in Larcombe et al. (2023). Indeed, our original paper is entitled “Unexpected plasticity in the life cycle of Trypanosoma brucei.” It is precisely this phenotypic plasticity that enables slender parasites to transition directly into the procyclic insect stage. Notably, we have shown that even monomorphic trypanosome strains are capable of undergoing this transition in the fly, and such strains are not considered to represent “intermediate” or “half-stumpy” forms. Consequently, while the question “what constitutes a slender parasite?” may be of conceptual interest, it currently is, in our view, not central to the biological conclusions of this study.

      Nevertheless, we have now included an additional section in our Discussion that compares the slender cells used in our study with the commitment classification introduced by Larcombe et al. Our infection experiments were conducted using cells that meet the Larcombe criteria of “true slender cells”, characterized by the absence of PAD1 expression and the maintenance of a slender morphology (Supplementary Figure 3A, B, following FACS sorting). Moreover, these cells are not cell-cycle arrested but continue to proliferate (Supplementary Figure 3C). Accordingly, our experimental assumptions and parameters align with those of previous studies, in which continuous cell division, lack of cell-cycle arrest, lack of PAD1 expression, and slender morphology remain established markers defining the slender bloodstream form.

      (1.3) Clarifying in the Materials and Methods how cultures were maintained in the 3-4 days prior to tsetse infections, including daily cell densities. Ideally, provide information on GFP expression, cell cycle, and morphology. While this will not fully resolve the concern, it will allow future reinterpretation of the data when early molecular events are better understood.

      We thank the reviewer for this helpful suggestion. Details on the maintenance of T. brucei cultures and culture conditions, including cell density, are provided in our previous publication (Schuster et al., 2021). In the present study, cultures were routinely monitored prior to infection to ensure that the cells used were GFP-negative and exhibited the characteristic slender morphology.

      For infections performed with higher cell numbers, fluorescence-activated cell sorting (FACS) was used to obtain a 100% GFP-negative population, thereby avoiding the need for daily monitoring of GFP fluorescence. This approach ensured that all infection experiments were initiated with a homogeneous population of slender bloodstream forms.

      (2) Figure 1: This analysis lacks a positive control to confirm that NAG is working as expected. It would strengthen the paper if the authors showed that NAG improves stumpy infection. Once confirmed, the authors could discuss possible differences in the tsetse immune response to slender vs. stumpy forms to explain the absence of an effect on slender infections.

      The enhancing effect of N-acetylglucosamine (NAG) on stumpy-form infections of T. brucei is well established and widely accepted in the field (e.g. Peacock et al., 2006, 2012). In the present Research Advance, our objective was to directly address the specific concerns raised in response to our previous publication (Schuster et al., 2021), in which NAG supplementation during stumpy infections was already included and shown to function as expected. Accordingly, the aim here was not to reiterate the established role of NAG in promoting stumpy infections, but rather to directly examine infections initiated by slender bloodstream forms in the absence of NAG, thereby approximating more natural conditions.

      (3) Figure 2. To conclude that teneral flies are less infected than non-teneral flies, data from Figures 1 and 2 must be directly comparable. Were these experiments performed simultaneously? Please clarify in the figure legends. Moreover, the non-teneral flies here are still relatively young (6-7 days old), limiting comparisons with Ngoune, TMJ et al. 2025, where flies were 2-3 weeks old.

      The experiments presented in Figures 1 and 2 were not performed simultaneously. Importantly, the comparison between teneral and non-teneral flies was not intended as a direct quantitative comparison across experiments, but rather to assess infection outcomes under distinct physiological states of the vector. It is well established that teneral flies are generally more susceptible to T. brucei infection than non-teneral flies, a phenomenon commonly referred to as the “teneral phenomenon.”

      Our objective was to demonstrate that slender bloodstream forms are capable of establishing infections also in non-teneral flies, thereby directly addressing concerns raised in the comment on our original study (Schuster et al., 2021) that the experimental set-up may have created an unnaturally permissive environment. The data presented here in fact support the conclusion that slender forms can contribute to disease transmission under more natural conditions.

      A key determinant of the increased susceptibility of teneral flies is the incomplete maturation of the peritrophic matrix (PM) (Walshe et al., 2011; Haines, 2013). In Glossina morsitans morsitans, the PM reaches its full length along the midgut approximately 84 hours post-eclosion (Lehane and Msangi, 1991). In addition, teneral flies have not yet taken a bloodmeal prior to the infective one, a factor known to further increase susceptibility (Haines, 2013).

      In the present paper, non-teneral flies were selected that had received two non-infectious bloodmeals prior to the infective challenge. At 6-7 days post-eclosion, these flies possessed a fully established PM, which is known to increase refractoriness to infection (Walshe et al., 2011), while still being sufficiently young to survive the time required for T. brucei to complete its developmental cycle. This is an important point, as our timing allowed robust interpretation of infection outcomes, without the substantial loss of flies (approximately 40%) that has been reported to occur prior to dissection in Ngoune et al., 2025.

      (4) Figure 3. The PCA plot (A) appears to suggest the opposite of the authors' interpretation: slender differentiation seems to proceed through a transcriptome closer to stumpy profiles. Plotting DEG numbers (panel C) is informative, but how were paired conditions selected? Besides, plotting of the number of DEGs between consecutive time points within and between parasite types is also necessary. There may also be better computational tools to assess temporal relationships. Finally, how does PAD1 transcript abundance change over time in both populations? It would also be important to depict the upregulation of procyclic-specific genes.

      Regarding the PCA plot (Figure 3A), we agree that slender form differentiation transiently exhibits transcriptomic similarities to stumpy form profiles. However, as discussed in the paper, this overlap specifically reflects shared early differentiation responses rather than the adoption of a full stumpy-like transcriptome. The overall trajectory and clustering pattern indicate that slender-derived parasites follow a distinct differentiation path that, as expected, ultimately converges with the procyclic stage, consistent with our interpretation.

      For the DEG analysis (Figure 3C), paired conditions were selected based on biologically meaningful time points corresponding to key stages in the differentiation process, allowing for direct comparisons between slender- and stumpy-derived populations either for the same timepoints following addition of cis-aconitate (Supplementary Figure 5) or timepoints plotting close on the PCA (Supplementary Figure 6).

      We also appreciate the recommendation to consider alternative computational approaches for assessing temporal relationships. While our current analysis provides robust insights into transcriptomic transitions, we agree that future studies employing different tools could further refine our observations.

      Finally, we have included the expression dynamics of PAD1 and PAD2 in the Supplementary Data (Supplementary Figure 8). The expression profile for procyclic-specific genes can now be found in Supplementary Figure 9.

      (5) Could methylcellulose in the medium sensitize parasites to QS-signal, leading to more frequent and/or earlier differentiation, despite low densities? If so, cultures with vs. without methylcellulose might yield different proportions of early-differentiating (yet GFP-negative) parasites. This could explain discrepancies between the Engstler and Rotureau labs despite using the same strain. The field would benefit from reciprocal testing of culture conditions. Alternatively, the authors could compare infectivity and transcriptomes of their slender forms under three conditions: (i) in vitro with methylcellulose, (ii) in vitro without methylcellulose, and (iii) directly from mouse blood.

      The original description of stumpy induction factor (SIF)-mediated quorum sensing in Trypanosoma brucei was performed by the Boshart laboratory using (a) the same cell line employed in the present study and (b) an identical HMI-9 medium supplemented with the same amount of methylcellulose (Reuner et al., 1997; Vassella et al., 1997). All relevant controls were comprehensively reported in those studies in the late 1990s. There is therefore no experimental or historical basis to suggest that methylcellulose sensitises parasites to stumpy differentiation. Moreover, the viscosity of HMI-9-methylcellulose remains well below the threshold required to impose a diffusion barrier for small molecules such as peptides. Consequently, accumulation of SIF as a result of increased medium viscosity can be excluded on physical grounds.

      The present Research Advance was conducted with a focused objective, namely, to directly address the specific concerns raised in response to our original publication (Schuster et al., 2021). Expanding the study to include additional experimental conditions, such as systematic comparisons of cultures grown with and without methylcellulose, or analyses of parasites freshly isolated from mouse blood, would have extended the scope well beyond what is useful for a Research Advance and would have diluted the central purpose of this contribution.

      Recommendations for authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your perseverance in filling the gaps flagged by others - these data strengthen the story.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: The use of teneral flies is not mentioned in the text or the legend

      Thank you: we added this to the main text and figure legend (lines 103 and 140).

      (2) Figure 1 legend (line 2): Typo - "with or 60 nm" should read "with or without 60 nm."

      Thank you: this has been corrected (line 141).

      (3) Figure 2. Please provide the FACS gating strategy and cell numbers before and after sorting

      The cell number before gating is 1 × 10⁷ cells, and 1 × 10⁶ cells were collected via FACS for infection experiments. This is stated in the Materials & Methods section (lines 473 and 478).

      (4) Figure 3. RNAseq data presentation could be improved:

      (a) Clarify which type of differentially expressed genes are shown in panels B and C (presumably those upregulated in slender forms and those upregulated in stumpy forms).

      Thank you: the information has now been added to the figure legend (lines 279 and 282).

      (b) The color code in panel A is inverted relative to panels B and C.

      Thank you: this has been corrected (figure 3B and C).

      (c) The GO-term analysis represents an important conclusion and should be moved to the main figure.

      As a Research Advance, this paper is restricted in the number of figures and therefore the decision had to be made to move the GO-term analysis to the Supplements.

      (d) Provide dataset quality control in the supplement (genes detected per sample, sample consistency, replicate correlations, etc.).

      Sequencing analysis is now explained in detail in the Materials & Methods section (lines 515 - 528).

      (5) Figure legends: Indicate how many times each experiment was performed and the number of independent biological replicates.

      The number of replicates (and flies per replicate) is stated for both infection experiments in the respective figure legends (lines 143 and 203/04). For the RNA sequencing, it is stated in the main text, and we now have also added the information to the figure legend (lines 219 and 276/77).

      (6) Discussion: Despite the ongoing debate about midgut pH, could the authors also comment on other evidence suggesting that stumpy forms are better adapted to the fly?

      The pH of the midgut has been determined by the Acosta-Serrano laboratory. We have cited the paper (Liniger et al. 2003) in lines 328-330 of the Discussion. Furthermore, we have discussed the developing mitochondria of stumpy forms, their expression of Krebs cycle enzymes, and their proposed higher resistance to proteolytic stress (Vickerman, 1965; Brown et al., 1973; Hamm et al., 1990; Reuner et al., 1997, Nolan et al., 2000).

    1. Author response:

      Reviewer #1 (Public review):

      (1) While the manuscript convincingly documents distinct expression patterns, the functional consequences of these differences remain unexplored. The conclusions regarding non-redundant roles would benefit from functional perturbation experiments. Relatedly, the authors propose that tnfa and tnfb may play different immunological roles, but the mechanistic basis underlying these differences is not addressed. For example, do the two cytokines engage different receptors or signaling pathways? Do they trigger distinct downstream transcriptional programs?

      We agree that a functional analysis of Tnfb is relevant. However, the focus of the current manuscript (Tools and Resources article type) was to report the generation and validation of the new tnfb reporter line, and we feel that functional data are better suited for a separate manuscript. In fact, these analyses will be part of a follow-up manuscript, which will be forthcoming soon.

      (2) Some imaging-based observations appear largely qualitative. Additional quantitative analyses, such as statistical comparisons of expression levels across time points or cell populations, would strengthen the robustness of the conclusions. For instance, in Figure 4, the expression levels of tnfa and tnfb reporter transgenes in immune cells should be quantitatively compared between control and amputated conditions.

      In figure 4, we focus on which cells express either cytokine, not on when they express it or whether one cell expresses more or less eGFP/mCherry than another. Also, the tnfb:mCh-F and tnfa:eGFP-F reporter proteins are membrane-bound because they are farnesylated, whereas il1b:eGFP is not and has a cytoplasmic distribution. Because of possible biases due to the different distribution or abundance of cytoplasmic versus farnesylated proteins within a cell, we never compared maximum eGFP to maximum mCherry within a treatment group.

      (3) It would also be important to clarify whether the distinct maturation kinetics of the fluorescent reporters were taken into account when interpreting expression timing. Since GFP typically matures more rapidly than mCherry in vivo, the authors should comment on whether this difference could influence the apparent expression kinetics of tnfa versus tnfb.

      In figure 5, we count the cells expressing either cytokine and use the eGFP/mCherry signal to infer how early these cells express it. We do not, however, directly compare maximum eGFP or mCherry fluorescence intensity per cell, which, especially at early time points, could be biased by differences in protein maturation; we only score the presence of eGFP or mCherry in a cell. We could not directly compare or account for differences in protein maturation, as we do not possess Il1b and tnfa transgenic lines driving mCherry expression for comparison (and, to our knowledge, these are not available in other laboratories). Based on the obtained results, however, the faster maturation of eGFP compared with mCherry does not appear to influence the outcome of the analysis: no tnfa:eGFP-F single-positive cells were observed at any time point, and il1b:eGFP single-positive cells were observed only 6 h after amputation, whereas eGFP/mCherry double-positive cells could be observed as early as 2 h after amputation. Any such bias would affect only the period between 1 h and 2 h, and we did not examine intervals shorter than 1 h.

      Reviewer #2 (Public review):

      (1) Lack of functional analysis; these lines are a potentially valuable tool, but so far provide no clue regarding the role of tnfb. Is it a pro-inflammatory cytokine acting in synergy with tnfa, or is it an antagonist? What are its receptor(s)? What signalling pathways and downstream genes does it induce? Addressing at least some of these questions should greatly increase the impact of the paper.

      Please refer to response to Reviewer #1 point 1.

      We will address the other recommendations to the authors, as they will improve the manuscript.

    1. Author Response:

      eLife assessment:

      The study provides an important advance towards understanding how spatial and temporal transcriptional programs are integrated to regulate lineage-specific chromatin and enhancer activation. The functional evidence is currently incomplete, but the current data provide a solid correlative and conceptual foundation. Functional experiments directly linking Gsb occupancy to chromatin state and regulation of some lineage-specific targets would further strengthen the causal interpretation of the model. Clarifying the scope of conclusions and explicitly acknowledging the technical limitations of current chromatin assays would provide a more balanced interpretation of the manuscript.

      We thank the reviewers and editors for their comments on our manuscript. We address here the concerns raised by them.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      It has long been known that Drosophila embryonic ventral nerve cord neuroblasts incorporate both spatial and temporal transcription factor expression to generate 30 distinct neuroblasts and lineages per hemisegment. This manuscript aims to elucidate the mechanism by which this integration of spatial and temporal transcription factors occurs through "direct regulation" or "epigenetic regulation". Direct regulation is defined as both spatial and temporal factors binding to open chromatin and working together to dictate specific lineages. Epigenetic regulation is defined as a spatial factor priming the chromatin in a neuroblast-specific manner to allow for the integration of temporal factors to generate specific lineages. The authors conclude that there is a two-step model in which a spatial transcription factor code "primes" the chromatin in terms of accessibility and then recruits temporal factors to ensure lineage-specific enhancer activation.

      We thank the reviewer for this clear and succinct summary and for accurately capturing the central idea of the model we propose. In particular, we appreciate that the reviewer highlights the distinction between the previously proposed “direct regulation” and “epigenetic regulation” models, which our work suggests may operate together within neuroblast lineages through a combinatorial spatial transcription factor code.

      Strengths:

      The authors tested two models, "direct regulation" vs "epigenetic regulation" in a well-defined pool of neural stem cells during normal development.

      We thank the reviewer for recognizing this aspect of the study.

      Weaknesses:

      The data in this study cannot clearly substantiate these two models.

      Overall, there are a number of issues that are inconsistent and not supportive of the model proposed in this manuscript. Firstly, there is no evidence of pioneer factor activity in any of the NB lineages described - i.e., any changes in chromatin accessibility being shown over time. The authors must show chromatin conformation changes during the window of spatial transcription factor expression in order to convince the readers of this phenomenon.

      Thank you for raising this point. In most studies, pioneer or chromatin-priming activity is inferred from a transcription factor’s ability to bind regions of relatively low accessibility and to remodel chromatin upon perturbation, rather than from direct developmental time-course measurements of chromatin accessibility.

      In our study we provide two lines of evidence consistent with such activity. First, TaDa profiling shows that Gsb occupies both accessible loci and regions that are relatively less accessible in NB5-6. Second, ectopic expression of Gsb in the non-cognate NB7-4 lineage results in clear chromatin remodelling, with loci both gaining and losing accessibility (Fig. 6). These perturbation experiments demonstrate that Gsb is sufficient to alter chromatin accessibility in vivo and therefore support a chromatin-priming role for it.

      We agree that a developmental time-course would be very informative. The difficulty is that, in this system, the relevant sequence unfolds extremely rapidly and across two different cellular contexts. Spatial transcription factors such as Gsb are expressed in the neuroectoderm, neuroblasts are then specified and delaminate, and Hb expression begins almost immediately after NB formation, on the order of minutes to tens of minutes. Before delamination there is no neuroblast to target with NB-specific drivers, and once the NB forms the temporal program is already underway. More generally, resolving chromatin accessibility changes across this transition would require temporally precise profiling at very high resolution in vivo, likely with live or near-live methods, and is not feasible with the Dam-based lineage-restricted approaches currently available.

      Secondly, the phenotypic data do not align with the sequencing data; the story would be more cohesive if the sequencing data and phenotypic data were in the same NB subtypes. On one hand, we are shown that Gsb misexpression induces loss of chromatin accessibility in NB7-4; however, in the widespread loss model, we are not shown a phenotype in NB7-4, which suggests that the chromatin accessibility at these sites (sites that have already been distinguished as SoIs for that NB subtype) does not play an important role in distinguishing NB7-4 identity. On the other hand, the authors report loss of NB3-5 identity but have no evidence as to how the chromatin has changed (or if it has at all) in that subtype, leaving the reader to wonder how the loss of identity occurred.

      Thank you for raising this point regarding the alignment between the chromatin and phenotypic analyses. The reviewer’s comment made us realise that the rationale for these experiments may not have been sufficiently clear in the original manuscript and could therefore be perceived as misaligned. We therefore explain the logic of the experimental design here and will edit the manuscript in the revision to clarify this point for readers.

      The chromatin experiments were designed to test whether Gsb is capable of remodelling chromatin when introduced into a non-cognate lineage. For this purpose, NB7-4 provided a suitable lineage with clean genetic access for TaDa/CATaDa experiments, allowing us to assess whether ectopic Gsb expression can alter chromatin accessibility in vivo.

      The functional role of Gsb, however, was examined within the spatial domain in which it is normally expressed. We knocked down Gsb broadly and early in development and assayed its effects on NB5-6. Consistent with its established role in row-5/6 patterning, reduction of Gsb disrupted the specification of NB5-6 identity. In the converse experiment, broad misexpression of Gsb led to a partial expansion of NB5-6 markers. Because spatial patterning in the ventral nerve cord is organized into mutually exclusive row identities, changes in NB5-6 specification can be accompanied by reciprocal effects in neighbouring lineages. In our experiments, this is reflected in changes in markers of adjacent identities, particularly NB3-5. For this reason, NB3-5 markers provide a sensitive and informative readout of altered NB5-6 specification in the phenotypic analyses.

      We recognize that this point may not have been clear in the original manuscript. To avoid similar confusion for readers, we will make this reasoning explicitly clear in the revision.

      Reviewer #2 (Public review):

      Summary:

      This article by Bhattacharya et al. investigates how neural stem cells (NSCs, NBs) in Drosophila integrate spatial and temporal cues to activate neuron-specific terminal selector (TS) genes. Prior to this work, it was understood that NSCs utilize spatial transcription factors (STFs) and temporal transcription factors (TTFs) to determine lineage identity and birth order, but the mechanisms of integration were not fully elucidated. The authors employed chromatin profiling techniques to analyze the binding of STFs and TTFs in two specific neuroblast lineages, NB5-6 and NB7-4. They found that Gsb (an STF) binds both accessible and less-accessible chromatin in NB5-6, while En (another STF) binds only to pre-accessible chromatin in NB7-4. The findings support an "STF code" where the combination of pioneer and non-pioneer spatial factors, along with temporal factors, triggers neuroblast-specific enhancer activation and determines lineage identity.

      We appreciate the reviewer’s careful summary of our findings and their clear articulation of the STF-code framework that emerges from the work.

      Strengths:

      The experiments are well-executed, the interpretations are generally sound, and the figures are clear and elegant. However, some conclusions are drawn too broadly without essential functional data. Therefore, additional work is needed to more effectively convey the central message.

      We thank the reviewer for their positive assessment of the experiments, interpretation, and figures, and we respond to their specific concerns below.

      Weaknesses:

      (1) Integration of TaDa and functional data on Gsb for the STF model

      The authors demonstrate that TaDa profiling maps Gsb binding across the genome and identifies candidate chromatin-priming sites in NB5-6. Gsb LOF/GOF experiments reveal effects on NB identity. Combining TaDa data with LOF and GOF analyses indicates that Gsb influences NB5-6 specification by binding to both open and relatively closed chromatin, helping maintain NB5-6 identity while limiting NB3-5 fate.

      However, the study does not establish a direct link between specific LOF/GOF phenotypes and particular genomic targets. For instance, analyzing Gsb occupancy at lineage-specific identity factors or terminal selector genes (such as Lbe, Ap, or Eya for NB5-6; and Ems, etc., for NB3-5) in wild-type and manipulated conditions (Gsb misexpression) would directly connect chromatin binding to the regulation of fate determinants. These investigations would strengthen the mechanistic connection between the correlative TaDa profiles and the observed identity changes, supporting the idea that Gsb functions as a context-dependent chromatin-priming factor within the STF code, rather than as a generic transcription factor.

      We thank the reviewer for this very helpful suggestion. We agree that illustrating how the TaDa binding profiles relate to known lineage determinants will help connect the genome-wide chromatin data to the developmental phenotypes. In the revision therefore, we will examine Gsb occupancy at several genes associated with NB5-6 and NB3-5 identity (including Lbe, Ap, Eya, and Ems).

      (2) Gsb misexpression reveals bidirectional chromatin remodelling

      Experiments with ectopic Gsb expression demonstrate bidirectional chromatin remodeling in NB7-4, showing decreases in accessibility at some binding sites and increases at others. While the authors show that Gsb can disrupt chromatin upon misexpression, interpreting its "pioneer-like" or chromatin-priming activity is complex due to several factors: the misexpression occurs in a non-native lineage, the direct versus indirect effects rely on whole-embryo Dam-Gsb peaks instead of NB7-4-specific binding, and heat-shock-induced chromatin changes are not fully accounted for. These issues make it challenging to definitively determine Gsb's role in chromatin priming.

      A complementary approach would be to perform Gsb knockdown/loss-of-function in its native NB5-6 lineage and profile chromatin accessibility (TaDa or CATaDa). This would allow a cleaner, more physiologically relevant assessment of Gsb's contribution to priming, SoI establishment, and Hb recruitment. Such an experiment would strengthen the causal link between Gsb occupancy and chromatin state and clarify whether Gsb truly acts as a context-dependent pioneer in vivo, rather than producing indirect effects due to ectopic misexpression.

      We thank the reviewer for this thoughtful comment. We agree that the ectopic Gsb misexpression experiment in NB7-4 should be interpreted as a test of chromatin-remodelling capacity rather than as a fully physiological assay of Gsb function in its native NB5-6 context. At the same time, we note that ectopic expression in a non-native lineage is a standard approach used to assess pioneering or chromatin-remodelling capacity, precisely because it tests whether a factor can alter chromatin outside its endogenous setting. In the revision, we will explicitly discuss this distinction.

      We also agree that NB7-4-specific Gsb occupancy under misexpression would provide a cleaner distinction between direct and indirect effects. In the current manuscript, we infer likely direct effects from overlap with whole-embryo Gsb Dam profiles: loci that lose accessibility upon Gsb misexpression overlap whole-embryo Gsb binding, whereas loci that gain accessibility generally do not. We interpret this as support for the idea that decreased accessibility is more likely to reflect direct Gsb action, whereas increased accessibility is more likely to be indirect. We will clarify this logic in the revision.
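
      The overlap logic above can be made concrete with a short Python sketch that computes the fraction of differentially accessible regions overlapping at least one binding peak. It assumes half-open intervals compared within the same chromosome. The function names and all coordinates are hypothetical toy values, not data from the study.

```python
from typing import List, Tuple

# A genomic interval as (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(regions: List[Interval], peaks: List[Interval]) -> float:
    """Fraction of `regions` overlapping at least one interval in `peaks`.

    A simple O(n*m) scan; genome-scale inputs would normally use an
    interval tree or a sorted sweep (e.g. bedtools intersect).
    """
    if not regions:
        return 0.0
    hits = sum(1 for r in regions if any(overlaps(r, p) for p in peaks))
    return hits / len(regions)

# Toy, made-up coordinates purely for illustration.
gsb_peaks = [("2L", 100, 200), ("2L", 500, 650), ("3R", 1000, 1100)]
lost_access = [("2L", 150, 180), ("2L", 600, 700), ("3R", 1050, 1060)]
gained_access = [("2L", 300, 400), ("3R", 2000, 2100), ("X", 100, 200)]

print(fraction_overlapping(lost_access, gsb_peaks))
print(fraction_overlapping(gained_access, gsb_peaks))
```

      In practice, such a fraction would be compared against a background (for example, shuffled regions of matched size) before being read as evidence of direct binding.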

      Regarding the reviewer’s suggestion of profiling chromatin accessibility after Gsb loss in native NB5-6, we completely agree that this would be an important complementary experiment. However, this experiment is not currently possible in our system. Gsb is required before NB specification/delamination, whereas available NB5-6 Gal4 drivers turn on only after this stage, precluding the use of RNAi. Early mutant analysis is also technically difficult because homozygous mutant embryos cannot be readily identified at the required stage, and the TaDa/CATaDa approach in this system requires large amounts of input material collected during the very short Hb window. We also tested an early CRISPR-based strategy using maternally contributed Cas9, but in this context the NB5-6 driver is lost, preventing TaDa/CATaDa profiling. We will therefore revise the manuscript to acknowledge that the current misexpression data support chromatin-remodelling capacity and are consistent with context-dependent priming, while not definitively establishing endogenous priming activity in NB5-6.

      (3) En is not a pioneer factor

      The authors conclude that Engrailed (En) is not a pioneer factor, based on the observation that En binding correlates with accessible chromatin and that En is not enriched at NB5-6-specific SOIs. However, this conclusion is not sufficiently supported by the functional data.

      We thank the reviewer for raising this point. We agree that, in several places, our wording was stronger than warranted by the data. For example, we stated that this pattern “argues against a pioneer role for En” and that the results “indicate that En does not act as a pioneer factor.” We agree that these statements are too definitive given the current evidence. Below, we address each of the reviewer’s specific concerns and explain the reasoning behind our original interpretation.

      First, the absence of En binding at NB5-6-specific SOIs does not necessarily indicate an inability to engage closed chromatin. These regions were not selected for the presence of En consensus motifs, so their lack of occupancy may simply reflect the absence of En binding motifs rather than a lack of pioneering capacity. A systematic motif analysis at NB5-6-specific SOIs is needed to determine whether En binding sites are present but unoccupied.

      We agree that the absence of En binding at NB5-6-specific SOIs alone would not be sufficient to infer a lack of pioneering activity, particularly if these loci do not contain En consensus motifs. That observation was only the starting point for our interpretation. Our reasoning was based on several additional lines of evidence from the genome-wide analysis:

      (1) When we examined En binding genome-wide, we consistently found that En occupancy in NB7-4 is restricted to regions of accessible chromatin.

      (2) Loci that are less accessible in NB7-4 show no detectable En occupancy.

      (3) Accessibility is strongly predictive of En binding: chromatin accessibility is markedly higher at En-bound loci than at En-unbound loci.

      Taken together, these patterns suggested to us that En binding in this lineage occurs primarily at pre-accessible chromatin rather than at less accessible regions that would require priming.

      Our interpretation was also guided by the broader literature. To our knowledge, neither Drosophila Engrailed nor its vertebrate homologues (EN1/EN2) have been reported to bind nucleosome-occluded DNA or initiate chromatin opening, which further informed our original interpretation.

      That said, we agree with the reviewer that these observations are suggestive rather than definitive. We will therefore temper the language throughout the manuscript so that we do not make categorical claims about En lacking pioneer activity. We will also perform the suggested motif analysis at NB5-6-specific SOIs to determine whether En binding motifs are present at these loci, which should help clarify whether the lack of En occupancy reflects motif availability or chromatin state.
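
      In its simplest form, the planned motif analysis asks whether each SOI contains a candidate En site on either strand. The Python sketch below assumes an exact-match core (`TAAT`, the generic homeodomain core, used here only as a placeholder). A real analysis would score a position weight matrix for En, for example with FIMO from the MEME suite. The function names and sequences are hypothetical.

```python
import re
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def motif_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of `motif` on either strand of `seq`.

    A lookahead regex reports overlapping matches. Reverse-strand hits are
    found by searching for the motif's reverse complement, so they are
    reported in forward-strand coordinates.
    """
    seq, motif = seq.upper(), motif.upper()
    fwd = [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]
    rc_motif = reverse_complement(motif)
    rev: List[int] = []
    if rc_motif != motif:  # palindromic motifs would otherwise be counted twice
        rev = [m.start() for m in re.finditer(f"(?={re.escape(rc_motif)})", seq)]
    return sorted(set(fwd + rev))

def summarize(sois: Dict[str, str], motif: str) -> Dict[str, int]:
    """Number of motif hits per SOI; zero means no candidate site."""
    return {name: len(motif_positions(s, motif)) for name, s in sois.items()}

# Hypothetical sequences, for illustration only.
print(summarize({"soi_1": "GGTAATTACC", "soi_2": "GGGCCCGGG"}, "TAAT"))
```

      SOIs that carry candidate sites but show no En occupancy would be the informative class for distinguishing motif availability from chromatin state.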

      Second, the claim that En lacks pioneer activity relies solely on a single steady-state TaDa/DamID occupancy assay at one developmental stage. Because pioneer factor interactions can be transient, low-affinity, and stage-specific, such binding may not be detected by TaDa, which also depends on local GATC density and methylation kinetics and may yield false negatives. Given these technical limitations, the absence of En binding at less accessible regions does not definitively rule out a priming role.

      We take the reviewer’s point that our data cannot definitively rule out En as a pioneer. At the same time, it may be useful to clarify that TaDa is not a snapshot assay. Because Dam-mediated methylation accumulates over time while the fusion protein is expressed, even weak or transient interactions can leave a detectable signal when averaged across many cells and across the duration of the expression window.

      This cumulative nature of the assay is why our consistent observation of strong enrichment of En at accessible loci, and no detectable enrichment at less accessible regions across the genome, led us to infer that En binding in NB7-4 is strongly conditioned on chromatin accessibility. We nevertheless agree that this does not definitively exclude rare or transient interactions below the detection threshold of the assay, and we will temper the language in the manuscript accordingly.
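
      The cumulative-signal argument can be illustrated with a deliberately simplified model. This model is our assumption and not a description of TaDa chemistry. Suppose a GATC site is methylated at rate k only while the Dam fusion is bound, and the fusion is bound a fraction f of the time. Then the probability that the site is methylated after an expression window T is 1 - exp(-k*f*T). The parameter values below are arbitrary.

```python
import math

def methylated_fraction(k: float, f_bound: float, window: float) -> float:
    """Probability that a GATC site is methylated after `window` time units.

    Toy assumption: methylation is a Poisson process with rate `k` while the
    Dam fusion is bound, and the fusion is bound a fraction `f_bound` of the
    time (transient binding averaged over the window and over many cells).
    """
    return 1.0 - math.exp(-k * f_bound * window)

k = 0.5  # arbitrary methylation rate per hour while bound (assumed)
for f in (0.01, 0.05, 0.5):
    short_window = methylated_fraction(k, f, window=1.0)
    long_window = methylated_fraction(k, f, window=24.0)
    print(f"f_bound={f:.2f}: 1 h -> {short_window:.3f}, 24 h -> {long_window:.3f}")
```

      Under this toy model, a site bound only a small fraction of the time still accumulates signal over a long window, whereas a never-bound site stays unmethylated. Real detection also depends on GATC density and Dam-only background, as the reviewer notes.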

      In the absence of direct functional assays (En LOF/GOF), the authors should explicitly acknowledge these technical and conceptual limitations and tone down the claim that "En lacks pioneer activity".

      Yes, we will do that!

      (4) Clarity of STF-code Model and Central Message

      The manuscript begins by presenting two models, direct and epigenetic, but the central takeaway of the paper is not clear. Specifically, the nuanced roles of the spatial factors Gsb and En as chromatin-priming versus stabilizing/effector factors within an STF code, and the resulting division of labor, are not clearly illustrated. The distinction between Gsb as a chromatin-priming factor and En as a cofactor-dependent activator/stabilizer should be explicitly presented in a stepwise model for better clarity. The authors could strengthen this by providing a schematic with two sequential stages illustrating how neuroblast identity factors (STF code) change chromatin states to drive lineage-specific enhancer activation. The schematic can be shown from the neuroectoderm to individual NB lineages to make it more panoramic.

      We thank the reviewer for this suggestion and for clearly articulating the conceptual point. As the reviewer points out, the literature has generally framed spatial–temporal integration as two alternative models—direct regulation at pre-accessible enhancers versus epigenetic priming by spatial factors. Our results suggest that elements of both mechanisms may operate within a lineage through a combinatorial STF code, with different spatial factors playing distinct roles (for example, Gsb contributing to chromatin priming, while En acts primarily at pre-accessible enhancers together with Hb). We agree that this central idea would benefit from being illustrated more explicitly. In the revision we will add a schematic summarizing this proposed two-step model and clarify the relevant parts of the text.

      (5) Identification of Priming Factors in NB7-4

      While the authors suggest that an unknown priming factor might be responsible for establishing sites of integration in NB7-4, they do not identify or explore potential candidates for this role. Further investigation into what factors might be involved in chromatin priming in NB7-4 could provide a more complete understanding of the mechanisms at play.

      We agree that identifying the factor responsible for establishing sites of integration in NB7-4 would be very informative. However, doing so would require substantial additional experiments to systematically test candidate spatial factors and assess their effects on chromatin accessibility in this lineage. Our goal in the present study was to establish how spatial and temporal cues are integrated at lineage-specific enhancers rather than to fully dissect all components of the STF code in each lineage. Identifying the priming factor in NB7-4 is therefore an important next step that we intend to pursue in future work, and we will clarify this point in the Discussion.

      (6) Functional Validation of STF Code Components

      The study proposes an STF code for each neuroblast lineage, but the specific components of these codes, beyond Gsb and En, are not fully explored. Identifying and validating additional factors that contribute to the STF code in each lineage could strengthen the conclusions.

      We agree that identifying additional components of the STF codes operating in each lineage would be very informative. Our goal in this study was not to comprehensively define all spatial factors involved in each lineage, but rather to understand how spatial and temporal inputs are integrated at lineage-specific enhancers. By examining two well-characterized spatial factors with distinct properties -- Gsb in NB5-6 and En in NB7-4 -- we aimed to illustrate how different members of an STF code can play distinct roles in shaping chromatin accessibility and enhancer activation. Identifying additional factors that contribute to these lineage-specific codes will be an important direction for future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. In particular, it is not evident how the kinetic rates relate to any biological data, or whether the parameter distributions used in this study have any biological relevance.

      (2) Related to this problem, it is not clear whether the 100,000 random parameter samples sufficiently explore parameter space, given the combinatorial explosion that arises from 94 free parameters, or whether 100,000 random initial conditions suffice for a system with 50 species (variables).

      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.

      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.

      (5) Along the same lines, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.

      (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response to their comments, we have undertaken a substantial revision to address all the comments, improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation, implications, and connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is unclear exactly which sets are evaluated in each case, e.g., "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.
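
      The Python sketch below restates the three designs as code, for readers who prefer it to a flow chart: (i) varying parameters with fixed ICs, (ii) varying ICs with the nominal parameters, and (iii) combining one pool of IC sets with each of several parameter sets. Only the 50-species/94-parameter dimensions and the overall structure come from the manuscript and response. The sampler, the sizes, and all names are hypothetical placeholders, not the authors' implementation.

```python
import random
from typing import List, Tuple

N_SPECIES, N_PARAMS = 50, 94  # model dimensions stated in the manuscript

Vector = List[float]
Instance = Tuple[Vector, Vector]  # (initial conditions, kinetic parameters)

def sample_vector(rng: random.Random, n: int) -> Vector:
    """Placeholder sampler (uniform on [0, 1)); the real ranges differ."""
    return [rng.random() for _ in range(n)]

def vary_params(rng: random.Random, n: int, ics: Vector) -> List[Instance]:
    """Design (i): the same ICs, a fresh random parameter set per instance."""
    return [(ics, sample_vector(rng, N_PARAMS)) for _ in range(n)]

def vary_ics(rng: random.Random, n: int, nominal: Vector) -> List[Instance]:
    """Design (ii): nominal parameters, a fresh random IC set per instance."""
    return [(sample_vector(rng, N_SPECIES), nominal) for _ in range(n)]

def cross(rng: random.Random, n_ics: int, n_params: int) -> List[List[Instance]]:
    """Design (iii): one shared pool of IC sets combined with each parameter set."""
    ic_pool = [sample_vector(rng, N_SPECIES) for _ in range(n_ics)]
    blocks = []
    for _ in range(n_params):
        params = sample_vector(rng, N_PARAMS)
        blocks.append([(ic, params) for ic in ic_pool])
    return blocks

# Small sizes for illustration (the study used 100,000 and 1000 x 1000).
rng = random.Random(0)
ics0 = sample_vector(rng, N_SPECIES)
nominal = sample_vector(rng, N_PARAMS)
design_i = vary_params(rng, 10, ics0)
design_ii = vary_ics(rng, 10, nominal)
design_iii = cross(rng, n_ics=5, n_params=3)  # 3 blocks of 5 instances each
```

      Seeding each ensemble with a different `random.Random` seed is one way to obtain independent replicate ensembles, which is what "triplicate" refers to in this response.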

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
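
      The N-versus-2N convergence check can be sketched in Python as follows. The five behaviour classes and their probabilities are invented for illustration, and class labels are drawn directly rather than obtained by classifying simulated trajectories. Only the idea of comparing the outcome distribution from N instances with that from 2N instances comes from the response. Nesting the first N draws inside the 2N ensemble is an assumption about the design.

```python
import random
from collections import Counter
from typing import List, Sequence

CLASSES = ["decreasing", "rebound", "increasing", "oscillating", "flat"]
TRUE_PROBS = [0.40, 0.25, 0.15, 0.05, 0.15]  # made-up, for illustration only

def class_frequencies(labels: Sequence[str]) -> List[float]:
    """Relative frequency of each behaviour class, in CLASSES order."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in CLASSES]

def mse_n_vs_2n(n: int, seed: int) -> float:
    """MSE between class frequencies from the first n draws and all 2n draws."""
    rng = random.Random(seed)
    labels = rng.choices(CLASSES, weights=TRUE_PROBS, k=2 * n)
    f_n = class_frequencies(labels[:n])
    f_2n = class_frequencies(labels)
    return sum((a - b) ** 2 for a, b in zip(f_n, f_2n)) / len(CLASSES)

for n in (100, 1_000, 10_000, 100_000):
    print(n, mse_n_vs_2n(n, seed=1))
```

      With independent draws, the expected MSE in this toy shrinks roughly in proportion to 1/N. The same function run with several seeds gives a simple replicate-to-replicate spread.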

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10^(-5) to 10^(4)) as a proof of concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.
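
      A range spanning nine orders of magnitude is commonly sampled on a log scale so that each decade is equally represented. The Python sketch below assumes log-uniform sampling, since the response states only the range. It also illustrates rejection sampling to a prespecified ensemble size: draw, check, discard failures, and repeat. `toy_is_feasible` is a hypothetical stand-in for the solver-success and feasibility checks. It rejects draws whose first two rates differ by seven or more orders of magnitude, a crude proxy for extreme timescale separation.

```python
import random
from math import log10
from typing import Callable, List, Tuple

LOW_EXP, HIGH_EXP = -5, 4  # parameter range 10^-5 to 10^4 from the response

def log_uniform(rng: random.Random, n: int) -> List[float]:
    """n values whose base-10 exponents are uniform on [LOW_EXP, HIGH_EXP]."""
    return [10 ** rng.uniform(LOW_EXP, HIGH_EXP) for _ in range(n)]

def rejection_sample(rng: random.Random, target: int, n_params: int,
                     accept: Callable[[List[float]], bool]
                     ) -> Tuple[List[List[float]], int]:
    """Draw parameter sets until `target` pass `accept`; also return the draw count."""
    accepted: List[List[float]] = []
    draws = 0
    while len(accepted) < target:
        candidate = log_uniform(rng, n_params)
        draws += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, draws

def toy_is_feasible(params: List[float]) -> bool:
    """Hypothetical check standing in for 'the ODE solve succeeded'."""
    return abs(log10(params[0] / params[1])) < 7.0

rng = random.Random(42)
ensemble, n_draws = rejection_sample(rng, target=1000, n_params=94,
                                     accept=toy_is_feasible)
print(len(ensemble), n_draws, 1 - len(ensemble) / n_draws)  # size, draws, rejection rate
```

      Reporting `n_draws` alongside the accepted ensemble size is a simple way to disclose how many candidate sets were discarded.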

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
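
      The distinction between redistributing a fixed total and changing the total can be checked on the smallest possible example. The Python sketch below uses a hypothetical two-state phosphorylation cycle with first-order rates, not part of the ECC model. Changing only the initial split leaves the steady state unchanged, whereas changing the conserved total shifts it. In this linear toy the shift is exactly proportional to the total.

```python
def simulate_cycle(x0: float, px0: float, k_phos: float = 1.0,
                   k_dephos: float = 0.5, dt: float = 0.01,
                   steps: int = 5000) -> float:
    """Forward-Euler integration of a toy phosphorylation cycle X <-> pX.

    dX/dt = -k_phos*X + k_dephos*pX and dpX/dt = -dX/dt, so the total
    X + pX is conserved. Returns pX at the end of the run, which approaches
    k_phos / (k_phos + k_dephos) * (x0 + px0).
    """
    x, px = x0, px0
    for _ in range(steps):
        flux = k_phos * x - k_dephos * px  # net phosphorylation flux
        x -= dt * flux
        px += dt * flux
    return px

same_total_a = simulate_cycle(1.0, 0.0)  # total = 1, all unphosphorylated
same_total_b = simulate_cycle(0.2, 0.8)  # total = 1, different initial split
double_total = simulate_cycle(2.0, 0.0)  # total = 2
print(same_total_a, same_total_b, double_total)
```

      In a nonlinear network the dependence on totals is generally not proportional, but the qualitative point is the same: conserved totals enter the steady state, whereas the initial split does not (for a monostable system).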

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish "initial split" from "conserved totals," and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.
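
      The arithmetic behind the two thought experiments in comments (3) and (5) can be reproduced directly. The Python snippet below treats each parameter or protein as a binary high/low choice, exactly as in the reviewers' simplification.

```python
from math import log10

def sampled_fraction(n_samples: int, n_binary_dims: int) -> float:
    """Fraction of the 2**n_binary_dims high/low corners covered by n_samples."""
    return n_samples / 2 ** n_binary_dims

for label, dims in (("kinetic parameters", 94), ("initial concentrations", 50)):
    frac = sampled_fraction(100_000, dims)
    print(f"{label}: 2^{dims} = {2**dims:.2e}, "
          f"100,000 samples cover ~10^{log10(frac):.1f} of corners")
```

      These figures hold for the binary simplification. The authors' reply is that the quantity of interest is the distribution of qualitative outcomes, whose convergence is assessed separately.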

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artifact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This discrepancy made the system excessively stiff, i.e., unsolvable with the ODE solver settings we had implemented. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first few time points, enabling the solver to get past the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances" part of the Methods section (page 5: lines 174 – 189) to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving such ODEs relies on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
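      To illustrate why an implicit BDF solver such as CVODE copes with stiffness where explicit schemes fail, the sketch below compares forward Euler with backward Euler (the order-1 member of the BDF family) on an assumed linear test equation dy/dt = -lam*y with a large rate lam. It is a toy, fixed-step illustration, not CVODE's variable-order, variable-step algorithm, and the function names are hypothetical.

```python
# Toy illustration only: CVODE uses variable-order (1-5), variable-step BDF with
# Newton iterations; here we use fixed-step, order-1 BDF (backward Euler).


def forward_euler(lam: float, y0: float, h: float, n_steps: int) -> float:
    """Explicit scheme y_{n+1} = y_n + h*f(y_n) for f(y) = -lam*y.
    Its amplification factor (1 - h*lam) exceeds 1 in magnitude when h*lam > 2."""
    y = y0
    for _ in range(n_steps):
        y = y + h * (-lam * y)
    return y


def backward_euler(lam: float, y0: float, h: float, n_steps: int) -> float:
    """Implicit scheme y_{n+1} = y_n + h*f(y_{n+1}). For this linear f the implicit
    equation has the closed-form solution y_{n+1} = y_n / (1 + h*lam), which decays
    for any step size h > 0."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + h * lam)
    return y


if __name__ == "__main__":
    lam, h, n = 1000.0, 0.01, 100  # h*lam = 10: well outside explicit stability
    print("forward Euler :", forward_euler(lam, 1.0, h, n))   # magnitude ~9**100
    print("backward Euler:", backward_euler(lam, 1.0, h, n))  # ~11**-100, near 0
```

      With the same step size, the explicit scheme amplifies the solution by a factor of -9 per step while the implicit scheme damps it by 1/11, which is the qualitative behaviour that makes implicit methods the standard choice for stiff biochemical models.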

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
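      The rejection-sampling loop described above can be sketched as follows. This is a minimal, hypothetical illustration: rate constants are drawn log-uniformly between 1e-5 and 1e4 (the range described in our response to comment (19)), and `toy_accept` stands in for the actual simulation-based checks (steady state reached, drug-response criterion met), which are not reproduced here.

```python
import math
import random
from typing import Callable, List, Tuple

Params = List[float]


def sample_log_uniform(rng: random.Random, n_params: int,
                       lo: float = 1e-5, hi: float = 1e4) -> Params:
    """Draw each rate constant uniformly in log10 space on [lo, hi]."""
    a, b = math.log10(lo), math.log10(hi)
    return [10.0 ** rng.uniform(a, b) for _ in range(n_params)]


def rejection_sample(n_accept: int, n_params: int,
                     accept: Callable[[Params], bool],
                     seed: int = 0,
                     max_draws: int = 10_000_000) -> Tuple[List[Params], int]:
    """Keep drawing until n_accept parameter sets pass `accept`.
    Returns the accepted sets and the total number of draws made."""
    rng = random.Random(seed)
    accepted: List[Params] = []
    draws = 0
    while len(accepted) < n_accept:
        if draws >= max_draws:
            raise RuntimeError("draw budget exhausted before n_accept was reached")
        theta = sample_log_uniform(rng, n_params)
        draws += 1
        if accept(theta):
            accepted.append(theta)
    return accepted, draws


def toy_accept(theta: Params) -> bool:
    """Hypothetical stand-in for 'reaches steady state AND meets the drug-response
    criterion'; here it simply requires the first rate to exceed the second."""
    return theta[0] > theta[1]


if __name__ == "__main__":
    sets, n_draws = rejection_sample(n_accept=1000, n_params=94, accept=toy_accept)
    print(f"accepted {len(sets)} of {n_draws} draws ({len(sets) / n_draws:.1%})")
```

      Because drawing continues until the target count is met, the size of the final ensemble is fixed in advance, and the acceptance rate (about 10% in our case) only determines how many draws are needed.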

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
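      A minimal sketch of this equilibrium test, assuming trajectories are stored as a list of output time points, each holding one concentration per species. "Changed by < 1%" is interpreted here (an assumption) as the max-min spread of each species over the final 100 steps being below 1% of its final value, which also rejects trajectories that oscillate within the window; the small absolute floor guarding species near zero is likewise an assumption.

```python
from typing import Sequence


def reached_equilibrium(traj: Sequence[Sequence[float]], window: int = 100,
                        rel_tol: float = 0.01, abs_floor: float = 1e-12) -> bool:
    """traj[t][i] is the concentration of species i at output step t.
    Returns True only if, for every species, the spread (max - min) over the
    last `window` steps is below rel_tol times its final value."""
    if len(traj) < window + 1:
        return False  # not enough history to judge
    tail = traj[-(window + 1):]  # window + 1 points span `window` steps
    for i in range(len(tail[0])):
        values = [row[i] for row in tail]
        spread = max(values) - min(values)
        scale = max(abs(values[-1]), abs_floor)  # floor avoids dividing by ~0
        if spread / scale >= rel_tol:
            return False
    return True
```

      For example, a trajectory that is still decaying by 1% per step fails the test, whereas one whose residual transient is far below 1% of the final value passes.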

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.
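      As a minimal sketch of this input design, the toy model below (hypothetical, not part of our network) drives a single activatable species X with a mitogen step that switches on at t_on and then stays at a fixed level, integrated with forward Euler for simplicity. All rate values are illustrative assumptions.

```python
from typing import List


def step_input(t: float, t_on: float, level: float) -> float:
    """Constant stepwise input: 0 before t_on, then `level` for the rest of the run."""
    return level if t >= t_on else 0.0


def simulate_toy(t_on: float = 10.0, t_end: float = 30.0, dt: float = 0.01,
                 k_act: float = 1.0, k_inact: float = 1.0, x_tot: float = 1.0,
                 mitogen: float = 1.0) -> List[float]:
    """Hypothetical one-species model: dX/dt = k_act*M(t)*(x_tot - X) - k_inact*X,
    where X is the active form, starting from X = 0. Forward Euler integration."""
    x, traj = 0.0, []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps + 1):
        t = i * dt
        traj.append(x)  # record the state at time t before stepping forward
        m = step_input(t, t_on, mitogen)
        x += dt * (k_act * m * (x_tot - x) - k_inact * x)
    return traj
```

      Because the input never decays, the active form settles at k_act*M*x_tot / (k_act*M + k_inact), which is 0.5 for the default values; in the full model, any later recovery under a constant drug input must therefore come from the network itself.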

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: PI3K is activated by both pIR and pFGFR, and the pIR-dependent term is divided by (1 + pS6K/Ki2). This denominator captures the well-described S6K-mediated negative feedback that reduces insulin-receptor-driven PI3K activation (e.g., via IRS1 phosphorylation).
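      For readers who want to see this rate law in isolation, the function below transcribes the PI3K phosphorylation term quoted by the reviewer. It is a sketch for illustration only; argument names mirror the model symbols, and the example values are arbitrary.

```python
def pi3k_phosphorylation_rate(pIR: float, pFGFR: float, PI3K: float, pS6K: float,
                              kc2f1: float, kc2f2: float, Ki2: float) -> float:
    """Rate of PI3K phosphorylation, transcribed from the quoted expression
    ((kc2f1*pIR*PI3K) / (1 + pS6K/Ki2)) + (kc2f2*pFGFR*PI3K)."""
    # Insulin-receptor branch: mass-action term divided by (1 + pS6K/Ki2), so rising
    # pS6K (S6K negative feedback) progressively damps this branch.
    insulin_branch = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)
    # FGFR branch: plain mass-action term, not divided by the feedback factor.
    fgfr_branch = kc2f2 * pFGFR * PI3K
    return insulin_branch + fgfr_branch


if __name__ == "__main__":
    for s6k in (0.0, 1.0, 9.0):
        print(s6k, pi3k_phosphorylation_rate(2.0, 0.0, 1.0, s6k, 1.0, 1.0, 1.0))
```

      With Ki2 = 1, the insulin-receptor branch is halved when pS6K equals Ki2 and reduced tenfold when pS6K is nine times Ki2, while the FGFR branch is unaffected by pS6K.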

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was to Supplementary Figure 2A, which shows a heatmap of the frequency of each dynamic class for all active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is the source of the "just under 60%" figure. We have also now explicitly referenced this number in the Supplementary Figure 2A legend.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A depicts the multiple pathways that were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” to the Methods to make this explicit. Moreover, some text in the main text has been slightly modified to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
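      The one-at-a-time design can be sketched as below. This is a hypothetical illustration: `toy_is_resistant` stands in for simulating an instance under drug and classifying its response, the tenfold knockdown factor is an assumed value, and the ranking reports, for each parameter, the fraction of baseline-resistant instances in which knocking down that parameter alone abolishes resistance.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Params = Dict[str, float]


def oat_driver_ranking(instances: Iterable[Params],
                       is_resistant: Callable[[Params], bool],
                       knockdown: float = 0.1) -> List[Tuple[str, float]]:
    """One-at-a-time (OAT) screen: for each baseline-resistant instance, scale one
    parameter at a time by `knockdown` and record whether resistance is lost.
    Returns (parameter, fraction of resistant instances) pairs, most frequent first."""
    hits: Counter = Counter()
    n_resistant = 0
    for params in instances:
        if not is_resistant(params):
            continue  # only baseline-resistant instances are informative here
        n_resistant += 1
        for name in params:
            perturbed = dict(params)      # copy: never mutate the baseline
            perturbed[name] *= knockdown  # perturb this single parameter only
            if not is_resistant(perturbed):
                hits[name] += 1
    if n_resistant == 0:
        return []
    return [(name, count / n_resistant) for name, count in hits.most_common()]


def toy_is_resistant(p: Params) -> bool:
    """Hypothetical classifier: 'resistant' when a two-step bypass is strong enough."""
    return p["k_a"] * p["k_b"] > 1.0


if __name__ == "__main__":
    instances = [
        {"k_a": 2.0, "k_b": 2.0, "k_c": 5.0},
        {"k_a": 20.0, "k_b": 2.0, "k_c": 1.0},
        {"k_a": 0.1, "k_b": 1.0, "k_c": 1.0},
    ]
    print(oat_driver_ranking(instances, toy_is_resistant))
```

      Because only one parameter changes per simulation, each hit can be attributed to that parameter alone, which is the first-order attribution described above; co-inhibition effects are deliberately outside this design.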

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the units assigned to the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) directly parallels the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between the experimental dynamics and our simulated dynamics is that the experimental data come from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high degree of agreement between the experimental data and the simulations, indicating that the heterogeneity of even isogenic cell lines may be significantly greater than might be intuited; we now make this point in the revised Discussion (page 17: lines 716-727).

      It is worth noting again, that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is, however, important to reiterate that our goal in Figure 7 is a qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
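      To make the distinction concrete, the two functions below contrast a fixed fractional inhibition, applied as a constant multiplier on the targeted rate (for example, the formation of the active CDK4/6-cyclin D complex in R27), with a dose-dependent Hill/IC50 form. Both are illustrative sketches: the 90% fraction, IC50 and doses shown are hypothetical values, not those used in our simulations.

```python
def fixed_fraction_inhibition(rate: float, fraction: float) -> float:
    """Scale a reaction rate by (1 - fraction); independent of drug concentration."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return rate * (1.0 - fraction)


def hill_inhibition(rate: float, inhibitor: float, ic50: float,
                    hill_n: float = 1.0) -> float:
    """Dose-dependent alternative: rate / (1 + (I/IC50)^n), halving the rate at I = IC50."""
    return rate / (1.0 + (inhibitor / ic50) ** hill_n)


if __name__ == "__main__":
    base = 10.0  # hypothetical uninhibited rate of the targeted reaction
    print(fixed_fraction_inhibition(base, 0.9))  # same value at any dose
    for dose in (1.0, 10.0, 100.0):              # hypothetical doses
        print(dose, hill_inhibition(base, dose, ic50=10.0))
```

      The fixed-fraction form needs no dose-response calibration, which suits a shape-based comparison; the Hill form would become relevant once the framework is calibrated against cell-line-specific dose-response data.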

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map in which every implicated node carries equal weight; instead, it is a likelihood map in which the colour and weight of each connection represent how often that interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that, while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely arises from resistance hubs rather than individual proteins, and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency with which that hub emerges as a resistance mechanism in a population of cells and patients.
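      A likelihood map of this kind can be computed by counting, across all instances classified as resistant, how often each interaction participates in the resistance response and normalising by the number of such instances. The sketch below assumes each resistant instance has already been reduced to the set of interactions (edges) implicated in its resistance; how those edges are identified is not reproduced here, and the node names in the example are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (source node, target node)


def edge_participation(resistant_instances: Iterable[Iterable[Edge]]) -> Dict[Edge, float]:
    """Fraction of resistant instances in which each edge is implicated.
    Duplicate edges within one instance are counted once."""
    counts: Counter = Counter()
    n = 0
    for edges in resistant_instances:
        n += 1
        counts.update(set(edges))
    return {edge: c / n for edge, c in counts.items()} if n else {}


if __name__ == "__main__":
    instances = [
        [("A", "B"), ("B", "C")],
        [("A", "B")],
        [("A", "B"), ("C", "D")],
    ]
    for edge, w in sorted(edge_participation(instances).items(), key=lambda kv: -kv[1]):
        print(edge, round(w, 2))
```

      In this toy example the A-B interaction participates in every resistant instance and would appear as a heavy, hub-like connection, while the other edges appear in only a third of instances.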

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given the model's abstractions, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenised the language throughout the manuscript. We now use the following three terms, each in a specific context:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance because setting the initial conditions of conserved species sets their protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.
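      The relationship between protein abundances, initial conditions and state variables can be sketched for a two-state species. This is a minimal illustration under assumptions not stated above: each sampled total is placed entirely in the inactive form at t = 0 (in line with the initial active/inactive discrepancy mentioned in our response to comment (6)), complexes are omitted from the initial state, and the species names are placeholders.

```python
from typing import Dict, Iterable


def initial_conditions(abundances: Dict[str, float]) -> Dict[str, float]:
    """Map sampled protein abundances (conserved totals) to initial conditions.
    Assumption for illustration: all protein starts in the inactive form X,
    with the phosphorylated form pX at zero."""
    ic: Dict[str, float] = {}
    for name, total in abundances.items():
        ic[name] = total       # inactive form carries the whole abundance
        ic["p" + name] = 0.0   # active (phosphorylated) form starts empty
    return ic


def total_abundance(state: Dict[str, float], name: str,
                    complexes: Iterable[str] = ()) -> float:
    """Conserved total Xtot = X + pX + (any complexes containing X)."""
    return state[name] + state["p" + name] + sum(state[c] for c in complexes)


if __name__ == "__main__":
    ic = initial_conditions({"AKT": 5.0, "ERK": 2.5})  # placeholder species
    print(ic)
    print(total_abundance(ic, "AKT"))
```

      Under mass-action kinetics, total_abundance evaluated on any later state variable vector should equal the sampled abundance, which is why sampling initial conditions is equivalent to sampling protein abundances.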

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
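      For reference, the standard limits of a Michaelis-Menten rate law (substrate concentration S, maximal rate V_max, Michaelis constant K_m) show which regime a single mass-action rate constant can stand in for:

```latex
\begin{align*}
v &= \frac{V_{\max}\, S}{K_m + S}
  && \text{(Michaelis--Menten rate)}\\
S \ll K_m:\quad v &\approx \frac{V_{\max}}{K_m}\, S = k_{\mathrm{eff}}\, S
  && \text{(first order: mass action with } k_{\mathrm{eff}} = V_{\max}/K_m\text{)}\\
S \gg K_m:\quad v &\approx V_{\max}
  && \text{(zero order: independent of } S\text{)}
\end{align*}
```

      In the unsaturated limit, a swept mass-action constant plays the role of the effective efficiency V_max/K_m; the zero-order limit is the enzyme-saturation regime noted above as outside the scope of the present model.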

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or Gillespie's stochastic simulation algorithm (SSA), is a possible extension for future work, although it is expected to be computationally very expensive. We have added a brief discussion of this as a future direction in the revised Discussion.

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
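      The filtering rule described above can be illustrated with a minimal sketch. It assumes each simulated trajectory is stored as a NumPy array of shape (time points, species); the function names has_negative_state and filter_trajectories are hypothetical and are not taken from the authors' code. Following the text, a trajectory is discarded if any concentration undershoots zero at any time point, with no tolerance applied.

```python
import numpy as np


def has_negative_state(trajectory) -> bool:
    """True if any species concentration drops below zero at any time point."""
    return bool(np.any(np.asarray(trajectory) < 0.0))


def filter_trajectories(trajectories):
    """Return (kept trajectories, number discarded).

    Each trajectory is assumed (hypothetically) to be an array of shape
    (n_timepoints, n_species). Any trajectory containing a negative entry,
    however small, is discarded as a numerical artifact.
    """
    kept = [t for t in trajectories if not has_negative_state(t)]
    return kept, len(trajectories) - len(kept)


# Usage: one well-behaved trajectory and one with a tiny numerical undershoot.
ok = np.array([[1.0, 0.5], [0.8, 1e-12]])
bad = np.array([[1.0, 0.5], [0.8, -1e-15]])
kept, n_discarded = filter_trajectories([ok, bad])
print(len(kept), n_discarded)  # 1 1
```

      Discarding the whole trajectory, rather than clipping negative values to zero, avoids carrying a possibly unstable integration forward into downstream classification.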

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      Prompted by the reviewer's comment, we investigated further and found that the reason for this finite limit is mechanistic. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states: either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript titled "Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations", Brown et al. describe outcomes of all-atom simulation of a model outer membrane of mycobacteria. This compelling study provided three key insights:

      (1) The likely conformation of the unusually long-chain alpha-branched beta-hydroxy fatty acids (mycolic acids) in the mycomembrane to be the extended U or Z type rather than the compacted W-type. (2) Outer leaflet lipids such as PDIM and PAT provide regional vertical heterogeneity and disorder in the mycomembrane that is otherwise prevented in a mycolic acid-only bilayer. (3) Removal of specific lipid classes from the symmetric membrane systems leads to significant changes in membrane thickness and resilience to high temperatures.

      In addition to the three key insights, we would like to add one more: (4) the asymmetric mycomembrane presents a phase transition from a disordered outer leaflet to an ordered inner leaflet.

      Strengths:

      The authors take a step-wise approach in building the complexity of the membrane and highlight the limitations of each of the approaches. A case in point is the use of a supraphysiological temperature of 333 K or even higher temperatures for some of the simulations. Overall, this is a very important piece of work for the mycobacterial field, and will help in the development of membrane-disrupting small molecules and provide important insights for lipid-lipid interactions in the mycomembrane.

      We appreciate the Reviewer’s positive view on our work.

      Weaknesses:

      (1) The authors used alpha-mycolic acids only for their models. The ratios of alpha, keto, and methoxy-mycolic acids are known in the literature, and it may be worth including these in their model. Future studies can be aimed at addressing changes in the dynamic behavior of the MOM by altering this ratio, but the inclusion of all three forms in the current model will be important and may alter the other major findings of the current study.

      We agree that adjusting the ratios of mycolates may impact the dynamic behavior of the MOM. However, including various ratios of these lipids would require substantial additional work and introduce unnecessary complexity to our model; believe it or not, the current work took more than 3 years. Investigations into the effects of mycolate structure in the MOM would be interesting and suitable for future studies.

      (2) The findings from the 14 different symmetric membrane systems developed with the removal of one complex lipid at a time are very interesting but have not been analysed/discussed at length in the current manuscript. I find many interesting insights from Figures S3 and S5, which I find missing in the manuscript. These are as follows:

      (a) Loss of PDIM resulted in reduced membrane thickness. This is a very important finding given that loss of PDIM can be a spontaneous phenomenon in Mtb cultures in vitro and that this is driven by increased nutrient uptake by PDIM-deficient bacilli (Domenech and Reed, 2009 Microbiology). While the latter is explained by the enhanced solute uptake by several PE/PPE transporter systems in the absence of PDIM (Wang et al., Science 2020), the findings presented by Brown et al. could be very important in this context. A discussion on these aspects would be beneficial for the mycobacterial community.

      Following the Reviewer’s suggestion, we have added the following to the Discussion section.

      “The outer leaflet symmetric bilayers, comprised of trehalose-derived glycolipids and PDIMs, reveal PDIM-dependent thickness. As observed in both symmetric outer leaflet systems and asymmetric systems, PDIM migrates to the bilayer midplane, causing the upper leaflet to bulge and increasing the overall thickness. Reduced thickness in the systems lacking PDIM, an important virulence factor for Mtb, may allow for higher nutrient uptake. This corroborates a 2009 study in which Domenech and Reed found a correlation between PDIM absence in vitro and attenuated virulence (Domenech and Reed, 2009).”

      (b) I find it interesting that loss of PAT or DAT does not change membrane thickness (Figure S3). While both PAT and PDIM can migrate to the interleaflet space, loss of PDIM and PAT has a different impact on membrane thickness. It is worth explaining what the likely interactions are that shape membrane thickness in the case of the modelled MOM.

      We have added the following to the section titled “Outer leaflet lipids drive unexpected membrane heterogeneity and softness of the Mycomembrane”.

      “Although PAT also migrates to the bilayer midplane, the PAT-deficient bilayers did not exhibit reduced thickness as the PDIM-deficient bilayers did (Supporting Information Table S1). This may be due to fewer PAT than PDIM moving to the bilayer midplane. In the All_Lipids systems, PDIM migrates first, bulging the upper leaflet and reducing lipid headgroup crowding (Supporting Information Figs. S5, S6). In this slightly less crowded environment, hydrophobic forces from PAT’s tails overcome the hydrophilic forces from the trehalose headgroup, causing some PATs to move deeper into the hydrophobic region.”

      (c) Figure S5: Is the presence of SGL driving PDIM and PAT to migrate to the inter-leaflet space? Again, a discussion on major lipid-lipid interactions driving these lipid migrations across the membrane thickness would be useful.

      We have added the following to the section titled “Outer leaflet lipids drive unexpected membrane heterogeneity and softness of the Mycomembrane”.

      “Additionally, in SGL-deficient bilayers, fewer PDIMs and PATs move to the bilayer midplane. This may be due to the highly methylated lipid tails of SGL. When present in the bilayer, these methyl groups may disrupt lipid packing and increase fluidity, allowing more PDIMs to move into the hydrophobic region. Supporting Information Figure S8 shows the average lipid order parameter along each lipid tail for all outer leaflet symmetric systems. Without SGL, lipid tails are consistently more ordered, supporting the notion that SGL’s methylated tails are disrupting lipid packing. Further studies are necessary to investigate the effect of glycolipid-deficient compositions on the dynamic properties of the asymmetric MOM.”

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.

      The Reviewer is correct in that this is the first MD simulation of the Mtb outer membrane with diverse lipid types.

      Strengths:

      The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound. The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also facilitate other groups to work on all-atom models of mycobacterial outer membrane for drug transport, etc.

      We appreciate the Reviewer’s positive view on our work.

      Weaknesses:

      Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.

      Major Points:

      (1) The simulation by Basu et al. (Phys Chem Chem Phys 2024) has studied drug transport through mycolic acid monolayers. Since the authors of the current study have all-atom models of the MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, this is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?

      We acknowledge the potential for simulations of drug transport through our MOM model. However, we believe with the current timescale, these simulations may be better suited for a coarse-grained model of the MOM. We plan to do this in the future, but it is out of the scope of the current study. We have added the following to the Discussion section to address this point.

      “Additionally, coarse-grained models of the outer membrane could aid in drug-transport studies, potentially revealing energetic pathways by which novel antibiotics penetrate the complex cell envelope over larger timescales.”

      (2) In line 277, the authors mention about 6 simulations which mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.

      We have added the following to the Discussion section to show the effect of glycolipid composition on the deuterium order parameter.

      “The outer leaflet symmetric bilayers, comprised of trehalose-derived glycolipids and PDIMs, reveal PDIM-dependent thickness. As observed in both symmetric outer leaflet systems and asymmetric systems, PDIM migrates to the bilayer midplane, causing the upper leaflet to bulge and increasing the overall thickness. Reduced thickness in the systems lacking PDIM, an important virulence factor for Mtb, may allow for higher nutrient uptake. This corroborates a 2009 study in which Domenech and Reed found a correlation between PDIM absence in vitro and attenuated virulence (Domenech and Reed, 2009). Although PAT also migrates to the bilayer midplane, the PAT-deficient bilayers did not exhibit reduced thickness as the PDIM-deficient bilayers did. This may be due to fewer PAT than PDIM moving to the bilayer midplane. In the All_Lipids systems, PDIM migrates first, bulging the upper leaflet and reducing lipid headgroup crowding. In this slightly less crowded environment, hydrophobic forces from PAT’s tails overcome the hydrophilic forces from the trehalose headgroup, causing some PATs to move deeper into the hydrophobic region. Additionally, in SGL-deficient bilayers, fewer PDIMs and PATs move to the bilayer midplane. This may be due to the highly methylated lipid tails of SGL. When present in the bilayer, these methyl groups may disrupt lipid packing and increase fluidity, allowing more PDIMs to move into the hydrophobic region. Supporting Information Figure S8 shows the average lipid order parameter along each lipid tail for all outer leaflet symmetric systems. Without SGL, lipid tails are consistently more ordered, supporting the notion that SGL’s methylated tails are disrupting lipid packing. Further studies are necessary to investigate the effect of glycolipid-deficient compositions on the dynamic properties of the asymmetric MOM.”
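      For readers less familiar with the deuterium order parameter cited above, a minimal sketch of how such a tail-order profile is conventionally computed may help. It assumes the standard carbon-hydrogen definition S_CD = ⟨(3cos²θ − 1)/2⟩, where θ is the angle between a C-H bond and the bilayer normal (taken here as the z axis); the exact analysis behind Figure S8 is not described in this response, and the function name order_parameter is hypothetical.

```python
import numpy as np


def order_parameter(ch_vectors, normal=(0.0, 0.0, 1.0)) -> float:
    """Return S_CD = <(3 cos^2(theta) - 1) / 2> for one carbon position.

    ch_vectors: array of shape (n_bonds, 3), one C-H bond vector per row,
    pooled over lipids and frames. theta is the angle between each bond and
    the bilayer normal (assumed here to be the z axis).
    """
    v = np.asarray(ch_vectors, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    cos_theta = (v @ n) / np.linalg.norm(v, axis=1)
    return float(np.mean(1.5 * cos_theta**2 - 0.5))


# Bonds parallel to the normal give S_CD = 1; bonds lying in the membrane plane give -0.5.
print(order_parameter([[0.0, 0.0, 1.0], [0.0, 0.0, -2.0]]))  # 1.0
print(order_parameter([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))   # -0.5
```

      In a fully ordered all-trans chain aligned with the normal, the C-H bonds lie roughly in the membrane plane, so S_CD approaches -0.5; this is why tail profiles are commonly plotted as -S_CD or |S_CD|, with larger values meaning more ordered tails.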

      (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.

      The difference in headgroup movement at different temperatures can be attributed to faster kinetics at 333K, causing the lipids to move faster. The relatively slow speed and computational load of running all-atom simulations make it difficult to simulate these lower temperatures on the timescales necessary to observe full aggregation of PDIM. However, CG simulations may be sufficient to sample these events. We have addressed this by adding the following to the Results section.

      “We also observed a stark difference in the speed with which PDIM and PAT migrate to the center at different temperatures. PDIM molecules do not fully aggregate at the membrane center until about 1500 ns at 313K, whereas they accumulate within 500 ns at 333K (Fig. 5B, 5D). This can be attributed to faster kinetics at 333K, causing the lipids to move faster. Coarse-grained models may be sufficient to observe full aggregation of hydrophobic species at the membrane midplane at lower temperatures.”

      (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position Figures for TMM and TDM to understand the difference. Is it really dependent on the chemical structure of the lipid moiety or the initial position of the lipid in the bilayer at the beginning of the simulation?

      We have added the following to the Results section to address this comment.

      “In all symmetric outer leaflet simulations, PDIM and PAT sit just below the headgroups of other lipids at the start of production, due to our equilibration scheme. During the last step of equilibration, lipid headgroups are allowed to move freely, which initiates migration to the membrane center and causes the slight difference between PDIM/PAT and the other lipids’ headgroup positions (Supporting Information Figs. S5, S6).”

      Minor Point:

      In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.

      This work represents the first atomistic simulation of the mycobacterial outer membrane. While not perfectly realistic, as it does not include arabinogalactan or peptidoglycan, it does have extensive descriptions of each lipid simulated and their relevance to the survival of Mtb.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The interface to build and set up all atom coordinates of the outer membrane of Mycobacterium tuberculosis should be available from CHARMM-GUI.

      The current manuscript is meant as a proof of concept for simulating bilayers composed of complex mycobacterial lipids. The current study itself took more than 3 years. Since we have developed CHARMM-GUI, the lipids described in this paper may be available in CHARMM-GUI in the future, but that is not the aim of this paper. Initial structures and final 50 ns of the simulations are available to readers (see Data Acknowledgements).

      (2) The difference between symmetric and asymmetric systems in Figures 2K and 2L is not at all clear, neither in the legend to the figure nor in the manuscript text. The color codes in 2K and 2L should be described with clarity. The authors should provide schematic diagrams similar to Figure 1 to explain each of the simulation systems they are discussing. This will clarify the difference between symmetric and asymmetric systems.

      We have updated Figure 1 to clearly show which systems are symmetric and which are asymmetric.

      (3) The first two sub-sections of the RESULT section discuss symmetric mycolic acid bilayers. The observations on thermal resilience and phase transitions are interesting, but the relevance of symmetric mycolic acid bilayers (Figures 3 & 4) to the major focus of the current manuscript (i.e., outer membrane consisting of multiple lipids) is not clear.

      Most previous simulations only focused on monolayers of mycolic acids. Our symmetric bilayers are used to provide reasonable areas per lipid (APL) and system compositions for the asymmetric membrane, so as to avoid area mismatch. We can also gain insights into how these unique lipids behave in symmetric bilayers, which may be useful to scientists aiming to study simpler membranes in the context of drug permeation or pore formation. These points have been addressed in the following addition to the Introduction section.

      “We have also used the equilibrated symmetric bilayers to estimate reasonable areas per lipid and facilitate the modeling of stable asymmetric systems.”

    1. Author response:

      General Statements

      First, we would like to thank the editor at Review Commons for the efficient handling of our manuscript. We also apologize for our delayed response.

      We would like to thank all three reviewers for their careful evaluation of our work and their constructive feedback, which will provide a valuable basis for improving the figures and the text, as described below. We expect to complete the revision quickly, following the plan described below.

      We would like to note that the reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on the following point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this does not restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). We will revise the manuscript text accordingly to clarify this point.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/phage display generated against the Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between B. subtilis and S. pneumoniae Smc to narrow down to specific regions within the B. subtilis Smc coiled coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may likely be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And, do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it is informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not easy even for a smaller selection of sybodies. We have data that show direct binding of Smc to sybodies by various methods including ELISA, pull-downs and by biophysical methods (GCI). Initially, we omitted these data from the manuscript as we are convinced that the mapping data obtained with chimeric SMC proteins is more definitive and relevant. During the revision we will incorporate the ELISA data showing direct binding and also indicating a lack of preference for a specific state of Smc.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out has not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the main binding site is located on the SMC coiled coils, the latter scenario would likely be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This showed that they are all roughly equally expressed and that they localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We will include this data in the revised version of the manuscript.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple Smc conformations. Consistent with this idea, the phenotypes are similar to the null mutant rather than the ATP-hydrolysis defective EQ mutant, which displays even more severe growth phenotypes. We will add the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state (add Vazquez Nunez et al., 2021).”

      “ELISA data confirm that nearly all clones bind Smc-ScpAB; however, their binding shows little or no dependence on the presence of ATP or DNA.”

      Minor comments:

      (1) It was surprising that no sybodies were found that could target both B. subtilis and S. pneumoniae Smc, for example, sybodies targeting the head regions of Smc that might work in a more universal manner. Could the authors comment on the coverage of the sybodies across the protein structure?

      It is rather common that sybodies (like antibodies and nanobodies) exhibit strong affinity differences between highly conserved proteins (> 90 % identity). The underlying reasons for such strong discrimination are i) the location of less conserved residues primarily at the target protein surface and ii) the large interaction interface between sybody and target, which offers multiple vulnerabilities for disturbance, in particular through bulky side chains resulting in steric clashes. Another frequently observed phenomenon is sybody binding to a dominant epitope, which also often applies to nanobodies and antibodies. A good example of this is the dominant epitopes on the SARS-CoV-2 RBD.

      (2) Growth curves (Fig. S3) show a large jump in recovery in growth under sybody induction conditions. Could the authors address this observation here and in the text?

      We suppose that this recovery represents suppressor mutants and/or (more likely) improved growth in the absence of functional Smc during nutrient limitation (see Gruber et al., 2013 and Wang et al., 2013). We will add this statement to the text.

      (3) L41- Sentence correction: Loop can be removed.

      Ah, yes, sorry for this confusing error. Thank you.

      (4) L525 - bsuSmc 'E' :extra E can be removed.

      To do. Thank you.

      (5) References need to be properly formatted.

      To do. Thank you.

      (6) The authors should add in figure legend for Fig 1i) details on representation of the purple region, and explain the grey strokes for orientation of the loop.

      To do.

      (7) How many cells were analysed in the cell biological assays? Legends should include this information.

      To be included.

      Reviewer #1 (Significance):

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled coil domains of SMC proteins have been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function, and in parallel demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Review: "Single Domain Antibody Inhibitors Target the Coiled Coil Arms of the Bacillus subtilis SMC complex" by Ophélie Gosselin et al., Review Commons RC-2025-03280

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the lethal phenotype associated with loss of Smc-dependent chromosome organization, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      In summary, the authors present a new method for trapping bacterial Smc complexes in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and shown that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Some specific comments:

      Line 75: "likely stabilizing otherwise rare intermediates of the conformational cycle." Sorry, why is that being concluded? Why not stabilizing longer-lived conformations?

      We will clarify this statement!

      Line 89: Sorry, possibly our lack of understanding: why first ribosome and then phage display?

      Ribosome display allows screening of around 10^12 sybodies per selection round (the library size is technically unrestricted), whereas for phage display the library size is restricted to around 10^9 sybodies, because producing a phage library requires transforming the phagemid plasmid into E. coli, which introduces a diversity bottleneck. This is why the sybody platform starts with ribosome display. It switches to phage display from round 2 onwards because the output of the initial round of ribosome display is around 10^6 sybodies, which can easily be transferred into the phage display format. Phage display is used to minimize selection biases. For more information, please consult the original sybody paper (PMID: 29792401).

      Line 100: Why was only lethality selected? Less severe phenotypes not clear enough?

      Yes, colony size is more difficult to score robustly, as the sizes of individual transformant colonies can vary quite widely. In addition, the number of isolated sybodies was already at the limit of what we could analyze further.

      Line 106: Could it be tested somehow if convex and concave library sybodies fold in Bs?

      We did not focus on the non-functional sybody candidates, and only sybodies from the loop library turned out to have functional consequences at the cellular level. Notably, we will include GFP imaging showing that non-lethal sybodies are expressed at levels similar to those of toxic sybodies. Given the identical scaffold of concave and loop sybodies (they differ only in their CDR3 length), we expect that the concave sybodies fold in the cytoplasm of B. subtilis. For the convex sybodies, which have a different scaffold, this will be tested.

      Line 125: Could Pxyl be repressed by glucose?

      To our knowledge and experience, repression by glucose (catabolite repression) does not work well in this context in B. subtilis.

      Line 131: The SMC replacement strain is a cool experiment and removes a lot of doubts!

      Thank you! (we agree).

      Line 141: The mapping is good and looks reliable, but looks and feels like a tour de force? Of course, some cryo-EM would have been lovely (lines 228-229 understood, it has been tried!).

      Yes, we have made several attempts at structural biology. Unfortunately, Smc-ScpAB is not well suited for cryo-EM in our hands and crystallography with Smc fragments and sybodies did not yield well-diffracting crystals.

      Line 179: Mmmh. Do we not assume DNA binding on top of the dimerised heads to open the CC (clamp)?

      We will clarify the text here.

      Line 187: Having sybodies that presumably keep the CC together (closing) and some that do not allow them to come together correctly (opening) is really cool and probably important going forward.

      Thank you!

      Figure 1 Ai is not very colour-blind friendly.

      We are sorry for this oversight. We will try to make the color scheme more inclusive. Thank you for the notification.

      Optional: did the authors see any spontaneous mutations emerge that bypass the lethal phenotype of sybody expression?

      No, we did not observe spontaneous mutations suppressing the phenotype, possibly due to the limited number of cell generations observed. We tried to avoid suppressors by limiting growth, but selecting for suppressors may indeed be a good future approach to further fine-map the binding sites and to obtain insights into the mechanism of inhibition.

      Optional: we think it would be nice to try some biochemical experiment with BMOE/cysteine-crosslinked B. subtilis Smc in the mid-region (4N or next to it) of the Smc coiled coils to try to further strengthen the story. Some of the authors are experts in this technique and strains might already exist?

      We have indeed tried to study the impact of sybody binding on Smc conformation by cysteine cross-linking. However, we were not convinced by the results and thus prefer not to draw any conclusions from them. We will add a corresponding note to the text.

      Reviewer #2 (Significance):

      The authors present a new method for trapping bacterial Smc complexes in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and shown that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Thank you!

      Reviewer #3 (Evidence, reproducibility and clarity):

      Gosselin et al. use the sybody technology to study the effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of single domain antibodies, using the "transition state" mutant Smc. The authors identify 14 such sybodies that are lethal when expressed in vivo because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of SMC. They speculate about why they only obtain sybodies from one library, and they only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it seems to be biased towards sybodies that bind to Smc in a quite special way. Using wild-type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      As explained above, we are quite confident that the Smc ATPase mutation did not bias the selection in an obvious way. The surprising bias towards coiled coil binding sites likely has other explanations; for example, the coiled coils may form a preferred epitope recognized by sybodies.

      Line 105: "Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis." This could be tested using Western blotting; I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D epitopes and are thus not suited for Western blotting. We did not follow up much on the negative results, but would like to point out again that several biases likely emerge for the same reason (bias towards one library, bias towards the coiled coil binding site). If this is correct, then few other sybodies are likely to be lethal in B. subtilis, apart from the ones isolated and characterized. We have added this notion to the manuscript. We have also tested the expression of non-lethal sybodies by GFP tagging and imaging. These results will be included in the revision.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed; I believe the authors should state, though, that this is not a replication block but a failure to segregate origins.

      We agree that this is an important point and will add a corresponding comment to the text.

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts to test direct binding, with mixed outcomes, and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we will add ELISA results and briefly discuss grating-coupled interferometry (GCI) data and pull-downs.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, these are rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation, at least this is not clearly shown.

      We agree and will carefully rephrase this statement. Thank you.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are otherwise largely neglected in the SMC literature, likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Reviewer #3 (Significance):

      The work describes the generation and use of single-domain antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

      Description of analyses that authors prefer not to carry out

      As pointed out above, there are a few minor points that we prefer not to address experimentally. In particular, we do not consider it necessary to determine the expression levels of sybodies that were non-inhibitory. We also wish to note that we attempted to obtain structural and additional biochemical data and to that end performed cryo-EM, crystallography and cysteine cross-linking experiments. Unfortunately, we did not obtain sybody complex structures, and the cross-linking data were not conclusive. We also wish to note that the first author has finished her PhD and left the lab, which limits our capacity to add additional experiments. However, as the reviewers also pointed out, the main conclusions are already well supported by the data.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tkacik et al describe their efforts to reconstitute and biochemically characterize ARAF, BRAF, and CRAF proteins and measure their ability to be paradoxically activated by current clinical and preclinical RAF inhibitors. Paradoxical activation of MAPK signaling is a major clinical problem plaguing current RAF inhibitors, and the mechanisms are complex and relatively poorly understood. The authors utilize their preparations of purified ARAF, BRAF, and CRAF kinase domains to measure paradoxical activation by type I and type II inhibitors, utilizing MEK protein as the substrate, and show that CRAF is activated in a similar fashion to BRAF, whereas ARAF appears resistant to activation. These data are analyzed using a simple cooperativity model with the goal of testing whether paradoxical activation involves negative cooperativity between RAF dimer binding sites, as has been previously reported. The authors conclude that it does not. They also test activation of B- and CRAF isoforms prepared in their full-length autoinhibited states and show that under the conditions of their assays, activation by inhibitors is not observed. In a particularly noteworthy part of the paper, the authors show that mutation of the N-terminal acidic (NtA) motif of ARAF and CRAF to match that of BRAF enhances paradoxical activation of CRAF and dramatically restores paradoxical activation of ARAF, which is not activated at all in its WT form, indicating a clear role for the NtA motif in the paradoxical activation mechanism. Additional experiments use mass photometry to measure BRAF dimer induction by inhibitors. The mass photometry measurements are a relatively novel way of achieving this, and the results are qualitatively consistent with previous studies that tracked BRAF dimerization in response to inhibitors using other methods. 

      Overall, the paper establishes that WT CRAF is paradoxically activated by the same inhibitors that activate BRAF, and that ARAF contains the latent potential for activation that appears to be controlled by its NtA motif. The biochemical activation data for BRAF are qualitatively consistent with previous work.

      Strengths:

      While previous studies have put forward detailed molecular mechanisms for paradoxical activation of BRAF, comparatively little is known about the degree to which ARAF and CRAF are prone to this problem, and relatively little biochemical data of any sort are available for ARAF. Seen in this light, the current work should be considered of substantial potential significance for the RAF signaling field and for efforts to understand paradoxical activation and design new inhibitors that avoid it.

      Weaknesses:

      There are, unfortunately, some significant flaws in the data analysis and fitting of the RAF activation data that render the primary conclusion of the paper about the detailed activation mechanism, namely that it does not involve negative cooperativity between active sites, unjustified. This claim is made repeatedly throughout the manuscript, including in the title. Unfortunately, their data analysis approach is overly simplistic and does not probe this question thoroughly. This is the primary weakness of the study and should be addressed. A full biochemical modeling approach that accurately captures what is happening in the experiment needs to be applied in order for detailed inferences to be drawn about the mechanism beyond just the observation of activation.

      The authors' analysis of their RAF:MEK "monomer" paradoxical activation data (Figures 1, 3, and Tables 1, 2) suffers from two fundamental flaws that render the resulting AC50/IC50 and cooperativity (Hill) parameters essentially uninterpretable. Without explaining or justifying their choice, the authors use a two-phase cooperative binding model from GraphPad Prism to fit their activation/inhibition data. This model is intended to describe cooperative ligand binding to multiple coupled sites within a preformed receptor assembly, and does not provide an adequate description of what is happening in this complicated experiment. Specifically, it has two fundamental flaws when applied to the analysis in question:

      (a) It does not account for ligand depletion effects that occur with high-affinity drugs and that profoundly affect the shapes of the dose-response curves being fit.

      The chosen model is one of a class of ligand-binding models that are derived by assuming that the free ligand concentration is effectively equal to the total ligand concentration. Under these conditions, binding curves have a characteristic steepness, and the presence of cooperativity can be inferred from changes in this steepness as described by a Hill coefficient. However, many RAF inhibitors, including most of the type II inhibitors in this study, bind to the dimerized forms of at least one of the RAF isoforms with ultra-high affinity in the picomolar range (particularly apparent in Figure 1 with LY inhibiting BRAF). Under these conditions, the model assumption is not valid. Instead, binding occurs in the high-affinity regime in which the drug titrates the receptor and effectively all the added drug molecules bind, so there is hardly any free ligand (see e.g. Jarmoskaite and Herschlag eLife 2020 for a full description of this "titration" regime). The shapes of the curves under these conditions reflect the total amount of RAF protein (and to some extent drug affinity), rather than the presence of cooperativity. Fitting dose response curves with the chosen model under these conditions will result in conflating binding affinity and protein concentration with cooperativity.
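      The titration regime described above can be illustrated with the Morrison equation for a tight-binding inhibitor, the model the authors mention later in their response. The Python sketch below is illustrative only: it assumes a single, non-cooperative inhibitor site per enzyme molecule and no drug-induced dimerization, and the concentrations are hypothetical rather than values from the study. Under these assumptions the apparent IC50 equals Ki,app + [E]/2, so when Ki,app lies far below the enzyme concentration, the IC50 mainly reports the amount of enzyme rather than the binding affinity.

```python
import math


def morrison_fractional_activity(inhibitor: float, enzyme: float, ki_app: float) -> float:
    """Fractional activity v_i / v_0 for a tight-binding inhibitor (Morrison equation).

    All concentrations share the same units (e.g. nM). Assumes one non-cooperative
    inhibitor site per enzyme molecule and no inhibitor-induced change in oligomerization.
    """
    b = enzyme + inhibitor + ki_app
    # Enzyme-inhibitor complex from the quadratic (depletion-aware) binding solution
    complex_conc = (b - math.sqrt(b * b - 4.0 * enzyme * inhibitor)) / 2.0
    return 1.0 - complex_conc / enzyme


def apparent_ic50(enzyme: float, ki_app: float) -> float:
    """Total inhibitor concentration giving 50% activity under the Morrison model."""
    return ki_app + enzyme / 2.0


if __name__ == "__main__":
    ki = 0.01  # hypothetical apparent Ki of 10 pM, expressed in nM
    for enzyme in (0.5, 1.0, 2.0, 4.0):  # hypothetical enzyme concentrations in nM
        ic50 = apparent_ic50(enzyme, ki)
        activity = morrison_fractional_activity(ic50, enzyme, ki)
        print(f"[E] = {enzyme:.1f} nM -> apparent IC50 = {ic50:.3f} nM "
              f"(activity there = {activity:.3f})")
```

      In this regime, doubling [E] roughly doubles the apparent IC50; conversely, an IC50 that stays roughly constant as [E] is varied argues against simple enzyme titration, which is the kind of test the authors describe in their response.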

      (b) It does not model the RAF monomer-dimer equilibrium, which is dramatically modulated by drug binding, rendering the results RAF-concentration dependent in a manner not accounted for by the analysis.

      The chosen analysis model also fails to consider the monomer-dimer equilibrium of RAF. This has two ramifications. Since drug binding is coupled to dimerization to a very strong degree, the observed apparent affinities of drug binding (reflected in AC50 and IC50 values) are functions of the concentration of RAF molecules used in the experiment. Since dimerization affinities are likely different for ARAF, BRAF, and CRAF, the measured AC50 values also cannot be compared between isoforms. This concentration dependence is not addressed by the authors. A related issue is that the model assumes drug binding occurs to two coupled sites on preformed dimers, not to a mixture of monomers and dimers. "Cooperativity" parameters determined in this manner will reflect the shifting monomer-dimer equilibrium rather than the cooperativity within dimers. Additionally, the inhibition side of the activation/inhibition curves is driven by binding of the drug to the single remaining site on the dimer, not to two coupled sites, and so one cannot determine cooperativity values for this process in this manner.
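      The concentration dependence raised in point (b) already appears in the simplest, drug-free monomer-dimer equilibrium, 2M <=> D with Kd = [M]^2/[D]. The Python sketch below solves the mass balance [P]total = [M] + 2[D] for the free monomer. It is illustrative only: it ignores drug binding, ATP, MEK and 14-3-3, and uses a hypothetical Kd and hypothetical concentrations in arbitrary units. It shows that the fraction of protomers in dimers, and therefore any readout coupled to dimerization, shifts with total RAF concentration.

```python
import math


def fraction_dimerized(total_protomer: float, kd: float) -> float:
    """Fraction of protomers found in dimers for 2M <=> D, with Kd = [M]^2 / [D].

    Mass balance: P = [M] + 2[D] = [M] + 2[M]^2 / Kd, a quadratic in [M].
    Concentrations and Kd must share the same (arbitrary) units.
    """
    # Positive root of (2 / Kd) * [M]^2 + [M] - P = 0
    free_monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_protomer)) / 4.0
    return 1.0 - free_monomer / total_protomer


if __name__ == "__main__":
    kd = 1.0  # hypothetical dimerization Kd (arbitrary units)
    for total in (0.1, 1.0, 10.0, 100.0):  # hypothetical total RAF concentrations
        print(f"P_total = {total:6.1f} -> fraction dimerized = "
              f"{fraction_dimerized(total, kd):.3f}")
```

      When [P]total equals Kd, exactly half of the protomers are dimerized. A drug that stabilizes dimers effectively lowers the apparent Kd, so under this simplified picture its apparent AC50 would be expected to depend on how much RAF is present in the assay.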

      As a result of both of these issues, the parameters reported in the tables do not correctly reflect cooperativity and cannot be used to infer the presence or absence of negative cooperativity between RAF dimer subunits. To address these major issues, the authors would need to apply a data analysis/fitting procedure that correctly models the biochemical interactions occurring in the sample, including both the monomer-dimer equilibrium and how this equilibrium is coupled to drug binding, such as the model developed by Kholodenko (Cell Reports 2015). Alternatively, the authors should remove the statements claiming a lack of negative cooperativity from the manuscript and alter the title to reflect this.

      The bell-shaped dose-response model that we employed models the sum of two dose-response curves, one that activates and one that inhibits. That is a simple way of capturing the essence of paradoxical activation: the superposition of drug-induced activation at low inhibitor concentrations with inhibition at higher concentrations. That said, we agree completely with the reviewer that the model does not capture the complexity of what is happening in the experiment. We worked extensively with the Kholodenko model (which we implemented in KinTek Explorer), which accounts for the effect of drug on the monomer/dimer equilibrium and for the affinity of drug for each protomer of a dimer (and can therefore model positive or negative cooperativity as well as non-cooperative binding). We could obtain excellent fits with this model with positive cooperativity (perhaps not surprising, considering that this is a 12-parameter model), with reasonable Kd values for drug binding and the monomer/dimer equilibrium. However, we ultimately chose not to include this analysis when we realized that the fits were not at steady state: the kon and koff rates underlying the reasonable Kd values for monomer/dimer formation were unreasonably slow. We could also obtain superficially reasonable fits with negative or non-cooperative binding, but close inspection revealed that they did not accurately fit the steepness of the inhibition phase of the dose-response curves for type II inhibitors. Even the Kholodenko model does not capture all the key aspects of our experiment, most notably competition with ATP, the effect of ATP on the monomer/dimer equilibrium, and the divergent conformations of the kinase required for binding ATP versus a type II inhibitor. We put some effort into explicitly including ATP in the model, but quickly decided that it was beyond our modeling expertise (and it was also not feasible to implement in KinTek Explorer).

      In the end, we settled on the bell-shaped dose-response model because it was the simplest model that fit the data. We expect to include a supplemental figure/note in the revised manuscript to discuss our work with the Kholodenko model. We will also acknowledge the limitations of the bell-shaped dose-response model.
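      The bell-shaped model described in this response, the sum of an activating and an inhibiting dose-response curve, can be sketched as a baseline plus a rising and a falling Hill term with independent midpoints and slopes. This Python parameterization is an illustrative assumption and not necessarily the exact GraphPad Prism equation the authors used; parameter names and values are hypothetical. Because both terms are purely empirical, the fitted slopes describe curve steepness only and do not by themselves establish cooperativity, which is the limitation discussed in this exchange.

```python
def bell_shaped_response(conc, baseline, act_span, inh_span, ac50, ic50,
                         hill_act=1.0, hill_inh=1.0):
    """Empirical bell-shaped dose response: baseline + activation - inhibition.

    act_span: maximal rise contributed by the activating phase.
    inh_span: maximal fall contributed by the inhibiting phase.
    ac50, ic50: midpoints of the two phases (same units as conc).
    hill_act, hill_inh: empirical steepness (Hill slope) of each phase.
    """
    activation = act_span * conc**hill_act / (ac50**hill_act + conc**hill_act)
    inhibition = inh_span * conc**hill_inh / (ic50**hill_inh + conc**hill_inh)
    return baseline + activation - inhibition


if __name__ == "__main__":
    # Hypothetical parameters: activation midpoint 10 nM, inhibition midpoint 1000 nM,
    # with a steeper inhibition phase (Hill slope 2) than activation phase.
    params = dict(baseline=1.0, act_span=2.0, inh_span=2.0, ac50=10.0, ic50=1000.0,
                  hill_act=1.0, hill_inh=2.0)
    for conc in (0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
        print(f"{conc:8.1f} nM -> signal {bell_shaped_response(conc, **params):.3f}")
```

      Here inh_span equals act_span, so the signal returns to its drug-free baseline at saturating drug; a larger inh_span would let it fall below baseline.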

      This reviewer is also concerned that the steepness of the inhibition phase of the curves may be the result of enzyme titration with these tight-binding inhibitors, rather than a result of positive cooperativity. We are reasonably sure that this is not the case. The shapes of these curves and the IC50/AC50 values obtained are relatively insensitive to enzyme concentration, and we will include additional data in our revision to demonstrate this. Also, the steep Hill slopes are unique to the type II inhibitors, which require a distinct inactive conformation of the kinase. The type I inhibitor SB590885 is similarly potent to the type II inhibitors but does not exhibit this effect. If we were simply titrating enzyme, we would expect to see this with SB590885 as well.

      Also, we will clarify in the revised manuscript that our interpretation of positive cooperativity of inhibition by type II inhibitors is also supported by our prior work with 14-3-3-bound RAF dimers (Tkacik et al., JBC 2025). This is a much simpler experiment, as dimers are pre-formed. We have now done a thorough study of the effect of enzyme concentration on the IC50 and apparent cooperativity in dimer inhibition, which we will include in our revised manuscript. These experiments confirm that we are not in a regime where we are titrating enzyme.

      As an aside, with respect to models that incorporate free inhibitor concentration, we did try to fit our 14-3-3-bound dimer inhibition data (in Tkacik et al., JBC 2025) with the Morrison equation for tight-binding inhibitors, which does take into account free ligand concentration. The fits were not reasonable with type II inhibitors, at least in part due to the non-ATP-competitive behavior of the type II drugs. Also, the Morrison equation does not model cooperativity.

      Some other points to consider

      (1) The observation that ARAF is not activated by type II inhibitors is interesting. A detailed comparison of the activation magnitudes between inhibitors and between A-, B-, and CRAF is hampered by the arbitrary baseline signal in the assay, which arises from a non-zero FRET ratio in the absence of any RAF activity. The authors might consider background correcting their data using a calibration curve constructed using MEK samples of known degrees of phosphorylation, so that they can calculate turnover numbers and fold activation values rather than an increase over baseline. This will likely reveal that the activation effects are more substantial than they appear against the high background signal.

      We will explore this for our revision.

      (2) The authors note that full-length autoinhibited 14-3-3-bound RAF monomers are not activated by type I and II inhibitors. However, since this process involves the formation of a RAF dimer from two monomers, the process would also be expected to be concentration dependent, and the authors have only investigated this at a single protein concentration. Since disassembly of the autoinhibited state must also occur before dimerization, it might be expected to be kinetically disfavored as well. Have the authors tested this?

      Good points. We have carried out this experiment at more than one enzyme concentration and differing reaction times, and also failed to see activation. However, we have not systematically explored either variable.

      (3) ATP concentration modulates activation. While this is an interesting observation, some of this analysis suffers from the same issue discussed above, of not considering high-affinity binding effects. For instance, LY is not affected by ATP concentration in their data (Figure 4D), but this is easily explained as being due to its very tight binding affinity, resulting in titration of the receptor and the shape of the inhibition curve reflecting the amount of RAF kinase in the experiment and not the effective Kd or IC50 value.

      As discussed above, we’ve convinced ourselves that we are not simply titrating enzyme. It occurred to us that such an effect could explain both the steepness of the inhibition curves with LY and other type II inhibitors and the apparent ATP-insensitivity. Our studies of concentration-dependence and the correlation of this effect with the type II binding mode argue against this possibility.

      Finally, as an overarching comment to this Reviewer and the others, we understand well that our enzyme inhibition studies (here and in Tkacik 2025) do not rise to the level of a formal demonstration of cooperative ligand binding. We envision a future study in which we could address this directly, perhaps by using single molecule fluorescence to observe on/off rates for binding of fluorescently tagged inhibitors to immobilized RAF dimers. (This is clearly beyond the scope of the present work).

      Reviewer #2 (Public review):

      This manuscript by Tkacik et al. uses in vitro reconstituted systems to examine paradoxical activation across RAF isoforms and inhibitor classes. The authors conclude that paradoxical activation can be explained without invoking negative allostery and propose a general model in which ATP displacement from an "open monomer" promotes dimerization and activation. The biochemical work is technically sound, and the systematic comparison across RAF paralogs (along with mutational/functional analysis) across inhibitor classes is a strength.

      However, the central mechanistic conclusions are overgeneralized relative to the experimental systems, and several key claims, particularly the dismissal of negative allostery and the proposed unifying model in Figure 6, are not directly supported by the data presented. Most importantly, the absence of RAS, membranes, and relevant regulatory context fundamentally limits the physiological relevance of several conclusions, especially regarding the current clinical type I.5 RAF inhibitors and paradoxical activation.

      Overall, this is a potentially valuable biochemical study, but the manuscript would benefit from more restrained interpretation, clearer framing of scope, and revisions to the model and title to better reflect what is actually tested.

      (1) A central issue is that the biochemical system lacks RAS, membranes, 14-3-3 and endogenous regulatory factors that are known to be required for paradoxical RAF and MAPK activation in cells. As previous work has repeatedly shown and the authors also acknowledge, paradoxical activation by RAF inhibitors is RAS-dependent in cells, and this dependence presumably explains why full-length autoinhibited RAF complexes are refractory to activation in the authors' assays.

      Importantly, the absence of paradoxical activation by type I.5 inhibitors in this system is therefore not mechanistically informative. Type I.5 inhibitors (e.g., vemurafenib, dabrafenib, encorafenib), but not Paradox Breakers (e.g., plixorafenib), robustly induce paradoxical activation in cells because binding of the inhibitor to inactive cytosolic RAF monomer promotes a conformational change that drives RAF recruitment to RAS in the membrane, promoting dimerization. The inability of the type I.5 inhibitor to suppress the newly formed dimers is the basis of the pronounced paradoxical activation in cells. In the absence of RAS and membrane recruitment, failure to observe paradoxical activation in vitro does not distinguish between competing mechanistic models.

      As a result, conclusions regarding inhibitor class differences, and especially the generality of the proposed model, should be substantially tempered.

      We will emphasize the limitations of our highly simplified experimental system in the revised manuscript, and temper some of our interpretations. And while the lack of membranes/RAS/14-3-3 in our system and the lack of observed PA with type I.5 inhibitors is a limitation of our study, we disagree that it renders our study of type I.5 inhibitors mechanistically uninformative. As seen here and consistent with prior studies, the binding mode of these compounds disfavors formation of the kinase dimer. While this may be overcome by 14-3-3 binding and other effects in the cellular context, it reflects a fundamental mechanistic difference as compared with type I and type II inhibitors, which also exhibit paradoxical activation.

      (2) The authors argue that their data argue against negative allostery as a central feature of paradoxical activation. However, the presented data do not directly test negative allostery, nor do they exclude it. The biochemical assays do not recreate the cellular context in which negative allostery has been inferred. Further, structural data showing asymmetric inhibitor occupancy in RAF dimers cannot be dismissed on the basis of alternative symmetric structures alone, particularly given the dynamic nature of RAF dimers in cells.

      Most importantly, negative allostery was proposed to explain paradoxical activation by Type I.5 RAF inhibitors, yet these inhibitors do not paradoxically activate in the assays presented here. The absence of paradoxical activation in this system, therefore, cannot be used to argue against a mechanism that is specifically invoked to explain cellular behavior not recapitulated by the assay.

      To be clear, we are not dismissing the possibility of negative cooperativity. And we do not think of our model as an alternative to the negative cooperativity model – rather it is a generalization that can account for paradoxical activation by diverse inhibitor classes, irrespective of positive, negative or non-cooperative modes of inhibition. We will emphasize these points in the revised manuscript.

      If negative allostery were a requisite feature of PA, we would not expect to see PA with type II inhibitors. As discussed in our response to Reviewer 1, we see clear evidence of positively cooperative inhibition of 14-3-3-bound RAF dimers by type II inhibitors (Tkacik JBC 2025) and in the present study, we find clear paradoxical activation by type II inhibitors (and there are many reports in the literature of PA by type II inhibitors in cellular contexts).

      (3) The model presented in Figure 6 is conceptually possible but remains speculative. Key elements of the model, including RAS engagement, membrane recruitment, 14-3-3 rearrangements, and the involvement of cellular kinases and phosphatases, are explicitly absent from the experimental system. Accordingly, the model is not tested by the data presented and should not be framed as a validated or general mechanism. The figure and accompanying text should be clearly labeled as a working or conceptual model rather than a mechanistically supported conclusion.

      We will revise the text to more clearly reflect that this is a working model, and importantly, that it is based on a large literature in this area in addition to the relevant experimental work in this manuscript.

      (4) The manuscript states that type I.5 inhibitors do not induce paradoxical activation in the biochemical assay because their C-helix-out binding mode disfavors dimerization. While this is true in isolation, it overlooks the well-established fact that type I.5 inhibitors (with the exception of paradox breakers) clearly promote RAS-dependent RAF dimerization in cells. This distinction is critical and should be explicitly acknowledged when interpreting the in vitro findings.

      We will explicitly make this point in the revised manuscript.

      (5) The title suggests a general mechanism for paradoxical activation across RAF isoforms and inhibitor classes, whereas the data primarily address type I and type II inhibitors acting on isolated kinase-domain monomers. A more accurate framing would avoid the term "general" and confine the conclusions to C-helix-in (type I/II) RAF inhibitors in a reduced biochemical context.

      As noted above, and in our response to Reviewer 3 below, we will clarify the contribution of the data in the present manuscript to the model, and that the model is based more broadly on the literature on PA and our insights into RAF structure and regulation. We will also revise the title to avoid the implication that the model arises mainly from the experimental data in the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Tkacik et al. systematically characterized all three RAF kinase isoforms in vitro with all three types of RAF inhibitors (Type I, I½, and II) to investigate the mechanism underlying paradoxical activation.

      In this study, the authors reconstituted heterodimers of A-, B-, and C-RAF kinase domains bound to non-phosphorylatable MEK1 (SASA), mimicking the monomeric auto-inhibited state of RAF. These "RAF monomers" were tested for MEK phosphorylation with increasing concentrations of all three types of RAF inhibitors (Type I, I½, and II). This study is reminiscent of a previous study by the same team measuring RAF kinase activity in the presence of all three types of inhibitors in the context of dimeric RAF isoforms stabilized by 14-3-3 proteins (Tkacik et al 2025 JBC). RAF monomers had little to no activity at low concentrations of inhibitors (consistent with their "monomeric state"). Addition of type I½ inhibitors did not induce paradoxical activation as, in this context, they do not induce the RAF dimerization required for activation, as observed by MP. Addition of type I and type II inhibitors led to paradoxical activation consistent with the RAF dimerization induced by these inhibitors, as observed by MP. Interestingly, type II inhibitors induced activation only for B- and C-RAF and not A-RAF.

      At high concentrations of type II inhibitors, kinase activity is inhibited with strong or weak positive cooperativity for BRAF and CRAF, respectively. This observation is very similar to what the authors previously observed with their dimeric RAF system. Interestingly, when the NtA motif is modified by phosphomimetic mutations in A- and C-RAF, basal kinase activity is higher, but, most importantly, inhibitor-induced paradoxical activation is much stronger with both type I and II inhibitors. This demonstrates that mutation of the NtA motif of ARAF and CRAF sensitized them to paradoxical activation by type II inhibitors.

      The authors also tested the effect of ATP on the paradoxical activation observed in their RAF "monomer" system. As previously published for their assay with 14-3-3-stabilized dimeric RAF, the authors observed the expected shift of the IC50 with Type I inhibitors, while Type II inhibitors seem to behave as non-competitive inhibitors. The authors next reconstituted the MAP kinase pathway (with RAF monomers at the top of the phosphorylation cascade) to test amplification of paradoxical activation. Again, Type I½ inhibitors did not induce paradoxical activation, while Type I and II inhibitors did. The authors also tested the inhibitors with FL auto-inhibited RAF/MEK/14-3-3 complexes, where, contrary to the "RAF monomer" experiments, FL B- and C-RAF were not paradoxically activated but were inhibited by all three types of inhibitors.

      Overall, Tkacik et al. tackle an important question in the field: definitive experiments and a thorough biochemical investigation of the molecular mechanisms underlying inhibitor-induced paradoxical activation are still missing and are of high importance for future drug development.

      Strengths:

      The biochemical experiments here are rigorously executed, and the results obtained are highly informative in the field to decipher the intricate mechanisms of RAF activation and inhibitor-induced paradoxical activation.

      Weaknesses:

      The interpretation of the results in the context of the current state of the art is ambiguous and raises questions about the relevance of introducing a new model for inhibitor-induced paradoxical activation, particularly since the findings presented here do not clearly contradict established paradigms. I believe some clarification and precision are required.

      While our model does not conflict with established paradigms (because it can allow for negative cooperativity), our experimental findings (here and in Tkacik et al JBC 2025) are in conflict with the negative allostery model. We will work to clarify this in the revised manuscript.

      Main comments:

      (1) Figure 2:

      The authors comment on the expected greater increase (for a cascade assay) in the magnitude of ERK phosphorylation compared to what was observed for MEK phosphorylation. However, this observation might reflect the stoichiometries used in the assay, with roughly 40 times more MEK than RAF (250 nM vs 6 nM), which might favour pERK over pMEK.

      The authors should clarify their rationale for the protein concentration used in this assay and explain how protein stoichiometry was taken into account for the interpretation of their results.

      The Reviewer makes a good point: the concentrations and ratios chosen are expected to make a substantial difference in the observed amplification. We intended this experiment more as a qualitative demonstration of cascade amplification and will clarify this in the revised manuscript.

      In addition, the authors should justify comparing pMEK and pERK TR-FRET values when different anti-phospho antibodies were used. Antibodies may have distinct binding affinities for their epitopes. Could this not lead to differences in FRET signal amplitudes that complicate direct comparison?

      This is also a good point; we will note this limitation in the revised manuscript.

      (2) Supplementary Figure 2:

      The authors mention that the inhibitors did not activate the FL auto-inhibited RAF complexes; however, they did inhibit the TR-FRET signal.

      Can the authors comment on the origin of the observed basal activity? Would the authors expect self-release of the RAF kinase protein from the auto-inhibited state in the absence of RAS, leading to dimerization and activation? Alternatively, do the inhibitors at low concentration relieve the auto-inhibited state, thereby driving dimerization and activation?

      We think that the baseline activity that is being inhibited is due to low concentrations of active dimer in our autoinhibited state preparations.

      Did the authors test the addition of RAS protein to their in vitro system to determine whether "soluble" RAS is sufficient to release the protective interactions with RBD/CRD/14-3-3 and lead to inhibitor-induced paradoxical activation of FL RAF?

      We did not, but we’ve thought about it. We expect that soluble RAS would not be activating. We have previously carried out extensive studies of BRAF activation by soluble vs. farnesylated RAS in a membrane environment (liposomes) and observed partial activation in the latter (Park et al, Nature Communications 2023).

      (3) Figure 5B:

      The authors state that the Kd values obtained from their MP assay are consistent with prior studies of RAF homodimerization and RAF:MEK heterodimerization. While this is true for the previous studies of RAF:MEK interaction by BLI (performed by the same team), the Kd of isolated RAF kinase domain homodimerization has been measured at ~30 µM by AUC in the cited references (24, 27 and 37).

      The authors should discuss the discrepancy between their Kd of homodimerization and the reported Kd values in the literature. At the concentration used for MP, it is surprising to observe RAF dimerization while the Kd of homodimerization has been measured at ~30 µM (in the absence of MEK).

      We will cite/discuss these differences in our revised manuscript.

      Would the authors expect the presence of MEK to influence the homodimerization affinity for the isolated KD?

      Perhaps, but likely only modestly. We do not think this explains the discrepancy noted above.

      (4) Conclusions:

      Several times in the introduction and the conclusion, the authors suggest that the negative allostery model (where "inhibitor binding to one protomer of the dimer promotes an active but inhibitor-resistant conformation in the other") is a model that applies to all types of RAF inhibitors (I, I½, and II).

      However, from my understanding and all the references cited by the authors, this model only applies to type I½ inhibitors, where indeed the αC-IN conformation in the second (inhibitor-free) protomer of the RAF dimer might be incompatible with type I½ inhibitors, which induce the αC-OUT conformation. The type I and type II inhibitors are αC-IN inhibitors and are expected to bind both protomers of RAF dimers with similar affinities. Therefore, the negative allostery model does not apply to the type I and type II inhibitors. The difference in the mechanism of action of inhibitors is even used to explain the difference in the concentration range in which inhibitor-induced activation is observed in cells. The description of the state of the art in this study is confusing and does not help the reader properly understand the authors' argument for revising the established model for paradoxical RAF activation.

      We will work to clarify these complicated issues in the revised manuscript. While the reviewer is correct that the negative allostery model was developed in the context of type I.5 inhibitors, there are many examples in the literature of it being used to explain PA by type I and type II inhibitors as well.

      Can the authors clarify their analysis of the state of the art on the different mechanisms of action for the paradoxical activation of RAF by the different types of RAF inhibitors?

      We’ll try!

      (5) Conclusions:

      "Our results suggest that negative allostery (or negative cooperativity) is not a requisite feature of paradoxical activation. The type I and type II inhibitors studied here induce RAF dimers and exhibit paradoxical activation but do so without evidence of negative cooperativity, nor do they appear to inhibit intentionally engineered RAF dimers with negative cooperativity (25). Indeed, type II inhibitors exhibit apparent positive cooperativity while type I inhibitors are non-cooperative inhibitors of RAF dimers (25)."

      Can the authors explain how results on the paradoxical activation induced by type I and type II inhibitors inform or challenge a model that specifically applies to type I½ inhibitors?

      As noted above, the negative allostery model has also been widely applied irrespective of inhibitor type (rightly or wrongly). Essentially any review or discussion of the topic will explain in one way or another how inhibitor binding to one side of a dimer leaves the opposite side active but resistant to inhibitor. Our model is agnostic with respect to cooperativity of inhibition – essentially we are pointing out a simple circumstance that seems to have been lost in the focus on negative allostery. Paradoxical activation is a result of drug action on RAF monomers, while inhibition is a result of drug action on RAF dimers. Because these are distinct molecular species/complexes, they can be expected to differ in their affinity for RAF inhibitors, irrespective of type. Because binding of ATP in the active site of RAF monomers stabilizes the inactive monomeric state, displacing ATP can promote activation/dimerization. For any inhibitor that is more potent at displacing ATP from a monomer than from an active dimer, we could expect to observe a window of paradoxical activation.
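      The last point can be made concrete with a deliberately simplified toy calculation. This is a hypothetical illustration, not the model or data from the manuscript. It assumes (i) that the fraction of RAF converted to dimers follows simple, non-cooperative inhibitor occupancy of the open monomer with an apparent constant `k_monomer_nM`, (ii) that dimer activity is lost through simple, non-cooperative inhibitor binding with a separate apparent constant `k_dimer_nM`, and (iii) that monomers themselves are inactive. The function names, parameter names and nanomolar values are all illustrative.

```python
import math


def toy_activity(inhibitor_nM: float, k_monomer_nM: float, k_dimer_nM: float) -> float:
    """Relative MEK-phosphorylating activity in a hypothetical two-species toy model.

    Illustrative assumptions (not fitted to any data):
    - inhibitor occupancy of the open monomer (ATP displaced) drives dimerization,
      so the dimerized fraction follows a simple non-cooperative binding curve;
    - dimers are inhibited by simple non-cooperative binding with a separate,
      independently chosen apparent dissociation constant;
    - monomers themselves contribute no activity.
    """
    dimerized = inhibitor_nM / (inhibitor_nM + k_monomer_nM)
    uninhibited = k_dimer_nM / (k_dimer_nM + inhibitor_nM)
    return dimerized * uninhibited


def activation_peak(k_monomer_nM: float, k_dimer_nM: float) -> tuple[float, float]:
    """Return (concentration, activity) at the maximum of toy_activity.

    For this functional form the maximum lies at the geometric mean of the two constants.
    """
    i_peak = math.sqrt(k_monomer_nM * k_dimer_nM)
    return i_peak, toy_activity(i_peak, k_monomer_nM, k_dimer_nM)


if __name__ == "__main__":
    k_monomer = 10.0  # hypothetical apparent constant for the open monomer (nM)
    for fold in (1, 10, 100):
        i_peak, a_peak = activation_peak(k_monomer, k_monomer * fold)
        print(f"k_dimer = {fold:>3}x k_monomer: peak at {i_peak:6.1f} nM, relative activity {a_peak:.2f}")
```

      In this form the maximum activity equals (r/(1+r))², where r = √(k_dimer/k_monomer), reached at an inhibitor concentration of √(k_monomer·k_dimer). Equal constants give only a shallow peak (0.25), whereas a 100-fold weaker dimer affinity gives a pronounced bell-shaped window (about 0.83), in line with the qualitative argument above.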

      The authors often refer to their previous study (reference 25), in which they tested inhibition by all three types of inhibitors with engineered RAF dimers. While I agree with the authors that in reference 25 the type I and type II inhibitors inhibit RAF dimers without exhibiting negative cooperativity (as expected from the literature and the current model), the authors did observe some negative cooperativity for type I½ inhibitors in their study, most particularly for the type I½ PB (with Hill slopes ranging from -0.4 to -0.9, indicative of negative cooperativity).

      Correct! Although we do note the caveat that weak inhibition can also give rise to apparent negative cooperativity.

      While the observation that type II inhibitors display positive cooperativity is both novel and very interesting, from what I understand the results from Tkacik et al. 2025 and the current study appear more in line with the current paradigm in the field (which describes paradoxical activation with negative cooperativity for type I½ inhibitors and no negative cooperativity for type I and II inhibitors) than disproving the current model and supporting a new one.

      In this context, can the authors clarify how their results challenge the current model for paradoxical activation?

      While the differences in binding modes and structural effects of type I.5 vs type I and type II inhibitors are well known in the field, we do not know of any work that suggests paradoxical activation arises from anything other than negative allostery. As one example, contrary to the view that this model is restricted to type I.5 inhibitors, Rasmussen et al. observe allosteric coupling asymmetry in binding of type II inhibitors to BRAF and attribute the observed paradoxical activation to “induction of dimers with one inhibited and one catalytically active subunit” (Rasmussen et al., eLife 2024). They also studied type I inhibitors in this work, but did not observe paradoxical activation.

      (6) Conclusions:

      The authors describe the JAB34 experiment from Poulikakos et al. 2010 to conclude that "While this experiment cleanly demonstrates inhibitor-induced transactivation of RAF dimers, it is important to recognize that the differential inhibitor sensitivity of the two subunits in this experiment is artificial - it is engineered rather than induced by inhibitor binding as the negative allostery model proposes."

      Indeed, the JAB34 experiment demonstrated inhibitor-induced transactivation, but the Poulikakos et al. 2010 study does not discuss differential inhibitor sensitivity. The negative allostery model was proposed later by the Poulikakos team in other papers (Yao et al. 2015 and Karoulia et al. 2016), in which JAB34 was not used.

      Can the authors clarify how the JAB34 experiments question differential inhibitor sensitivity?

      Good point, we neglected to discuss the Yao and Karoulia papers and will do so in our revised manuscript.

      (7) Conclusions:

      "Considering that the conformation required for binding of type I.5 inhibitors destabilizes RAF dimers, it is unclear how an inhibitor binding to one protomer would be able to transmit an allosteric change to the opposite protomer, if that inhibitor's binding causes the existing dimer to dissociate."

      The authors should comment on whether 14-3-3 proteins might overcome negative regulation by type I½ inhibitors, similar to what has been shown for ATP, which acts as a dimer breaker like type I½ inhibitors.

      Certainly we expect that they will, and we will discuss this in our revised manuscript.

      (8) Conclusions:

      "Furthermore, the complex effects of type I.5 inhibitors on dimer stability and the clear resistance of active RAF dimers to these inhibitors complicates interpretation of inhibition data - weak or incomplete inhibition of an enzyme can be difficult to discern from true negative cooperativity (43). As we discuss below, the clear resistance of RAF dimers to type I.5 inhibitors is alone sufficient to explain their ineffective inhibition during paradoxical activation, without invoking negative allostery." 

      The authors should explain how they reconcile this statement and their proposal of a new model that does not rely on negative allostery with their previous findings showing negative cooperativity for RAF dimer inhibition with type I½ inhibitors.

      As discussed above and in responses to other Reviewers, we do not exclude negative cooperativity for type I.5 inhibitors. That said, we are skeptical, even in light of our own findings of apparent negative cooperativity by type I.5 compounds, due in part to the caveats the reviewer highlights above.

      (9) Conclusions:

      Here, the authors propose a new universal model to explain paradoxical activation of RAF by all types of RAF inhibitors:

      "Our findings here, in light of structural studies of RAF complexes and prior cellular investigations of paradoxical activation, lead us to a model for paradoxical activation that does not rely on negative allostery and is consistent with activation by diverse inhibitor classes. In this model, the open monomer complex is the target of inhibitor-induced paradoxical activation (Figure 6). Binding of ATP to the RAF active site stabilizes the inactive conformation of the open monomer, which disfavors dimerization. Displacement of ATP by an ATP-competitive inhibitor, irrespective of class, alters the relative N- and C-lobe orientations of the kinase to promote dimerization (30, 35). Once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK."

      From my understanding, the novelty of this new model is twofold: a) the open monomer is the target of the inhibitor-induced paradoxical activation and b) once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK.

      Novelty a) implies, as the authors stated, that "Inhibitor-induced activation and inhibition act on distinct species - activation on the open monomer and inhibition on the 14-3-3-stabilized dimer". The authors should explain what they mean by "activation of the open monomer", while only RAF dimers are catalytically active (except for BRAF V600E mutant)?

      We will clarify – by activation we mean promoting conversion of the open monomer to a dimer.

      For novelty b), the authors should explain more clearly what experimental results support this new model.

      We will more explicitly detail how our results here as well as prior work in the field support this model.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Interestingly, the observed rearrangements induced by Zn<sup>2+</sup> were not limited to the protein region proximal to the extracellular binding site but extended to the intracellular side of the channel. This finding agrees with previous studies showing that some extracellular H<sub>v</sub>1 inhibitors, such as Zn<sup>2+</sup> or AGAP/W38F, can cause long-range structural changes propagating to the intracellular vestibule of the channel (De La Rosa et al. J. Gen. Physiol. 2018, and Tang et al. Brit J. Pharm 2020). The authors should consider adding these references.

      We added the suggested references to the Results section.

      Since one of the main goals of this work was to validate Acd incorporation and the spectral FRET analysis approach to detect conformational changes in hHv1 in preparation for future studies, the authors should consider removing one subunit from their dimer model, recalculating FRET efficiencies for the monomer, and comparing the predicted values to the experimental FRET data. This comparison could support the idea that the reported FRET measurements can inform not only on intrasubunit structural features but also on subunit organization.

      We calculated the predicted intrasubunit FRET efficiency and presented the results in the new Figure S10. Pearson’s coefficient decreased from 0.48 for the dimer to 0.18 for the monomer, suggesting the experimental FRET contains information about subunit organization. This was added to the text.

      Reviewer #2 (Public review):

      (1) Tryptophan and tyrosine exhibit similar quantum yields, but their extinction coefficients differ substantially. Is this difference accounted for in your FRET analysis? Please clarify whether this would result in a stronger weighting of tryptophan compared to tyrosine.

      We accounted for differences in the extinction coefficients of Trp and Tyr in our calculations, which are detailed in the Supplementary Text. The assumptions result in a stronger contribution from Trp than from Tyr.

      (2) Is the fluorescence of acridon-2-ylalanine (Acd) pH-dependent? If so, could local pH variations within the channel environment influence the probe's photophysical properties and affect the measurements?

      Acridone, the fluorophore in Acd, has pH-independent fluorescence between pH 2 and 9 (Stephen G.S. and Sturgeon R.J. Analytica Chimica Acta. 1977). This was added to the text.

      (3) Several constructs (e.g., K125Tag, Y134Tag, I217Tag, and Q233Tag) display two bands on SDS-PAGE rather than a single band. Could this indicate incomplete translation or premature termination at the introduced tag site? Please clarify.

      Yes, the additional bands in the WB are due to the termination of translation for the mentioned protein constructs. We added a note in the legend of Figure 2 regarding this point.

      (4) In Figure 5F, the comparison between predicted FRET values and experimentally determined ratio values appears largely uninformative. The discussion on page 9 suggests either an inaccurate structural model or insufficient quantification of protein dynamics. If the underlying cause cannot be distinguished, how do the authors propose to improve the structural model of hHv1 or better describe its conformational dynamics?

      We understand the confusion about this point. We are not planning to improve the structural model with FRET between Trp/Tyr and Acd. We modified the text to avoid confusion regarding this point. We plan to use Acd as a transition metal ion FRET (tmFRET) donor to study the conformational dynamics of hH<sub>v</sub>1 in the future (Discussion). 

      (5) Cu<sup>2+</sup>, Ru<sup>2+</sup>, and Ni<sup>2+</sup> are presented as suitable FRET acceptors for Acd. Would Zn<sup>2+</sup> also be expected to function as an acceptor in this context? If so, could structural information be derived from zinc binding independently of Trp/Tyr?

      Transition metal ion FRET (tmFRET) uses a fluorophore as the donor and a transition metal ion bound to a chelator as the acceptor. For FRET to occur between these donor-acceptor pairs, the fluorescence spectrum of the donor must overlap the absorption spectrum of the metal ion (Zagotta et al., eLife. 2021; Zagotta et al., Biophys J. 2024; Gordon et al., Biophys J. 2024). Zn<sup>2+</sup> does not absorb visible light, so tmFRET cannot occur for this divalent metal.

      (6) The investigated structure is most likely dimeric. Previous studies report that zinc stabilizes interactions between hHv1 monomers more strongly than in the native dimeric state. Could this provide an explanation for the observed zinc-dependent effects? Additionally, do the detergent micelles used in this study predominantly contain monomers or dimers?

      Our full-length hH<sub>v</sub>1 in Anz3-12 detergent micelles is predominantly a dimer, as demonstrated in the new panel of Figure S5. From our data, we cannot compare the effects of zinc between monomers and dimers.

      (7) hHv1 normally inserts into a phospholipid bilayer, as used in the reconstitution experiments. In contrast, detergent micelles may form monolayers rather than bilayers. Could the authors clarify the nature of the micelles used and discuss whether the protein is expected to adopt the same fold in a monolayer environment as in a bilayer?

      We used Anzergent 3-12 detergent micelles, which stabilize hH<sub>v</sub>1 in solution. We indicated this in the Results and Materials and Methods sections. We are also intrigued by whether protein folding and conformational dynamics differ between detergent micelles and proteoliposomes, but our data do not provide an answer to this question. We found that the proteoliposomes used for measuring hH<sub>v</sub>1 function do not yield enough Acd signal to record spectra, preventing us from performing the same FRET measurements between Trp/Tyr and Acd in liposomes. Still, detergent-solubilized hH<sub>v</sub>1 is functional upon reconstitution, demonstrating that its functional folding is not irreversibly altered in micelles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) On page 9, the reference to Figure S11 should be corrected to Figure S10.

      We thank the reviewer for catching this mistake. It was corrected in the updated version.

      (2) On page 9, multiple prior studies describing zinc binding to hHv1 should be acknowledged, for example:

      Musset et al. (2010), J. Physiol., 588, 1435-1449;

      Jardin et al. (2020), Biophys. J., 118, 1221-1233.

      References were added to the text.

      (3) On page 11, the statement "with Acd incorporated ... we can interrogate its gating mechanism in unprecedented detail" appears overly strong relative to the data presented. Another phrasing might be appropriate.

      The sentence was changed. It now reads: “With Acd incorporated at multiple sites in full-length hH<sub>v</sub>1, it will be possible to interrogate conformational changes across the protein’s different structural domains using Acd as a tmFRET donor to understand its molecular mechanisms.”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While the authors have proved their hypothesis by temporally increasing the activity of cholinergic neurons at different life stages through the auxin-inducible degron system, their work raises two major concerns. First, they might want to discuss the conflicting data from Zullo et al (Nature 2019, vol 574, pp 359-364). For example, the authors show that increasing the activity of acr-2-expressing neurons after the 7th day of adulthood increases lifespan. However, Zullo et al (2019) show that the reciprocal experiment, inhibiting cholinergic neuron activity on the 1st day or the 8th day of adulthood, also increases lifespan. Is this because the two studies are using different promoters, that of the acr-2 ACh receptor (this work) versus that of the unc-17 vesicular ACh transporter (Zullo et al., 2019)? The two genes are expressed in different subsets of cells that do not completely overlap. CeNGEN shows that acr-2 is expressed in motor and non-motor neurons, but some of these neurons are also different from those that express unc-17. Is it possible that different cholinergic neurons also have opposite lifespan effects during adulthood? Or is it because both lack of signaling and hypersignaling can lead to a long-life phenotype? Leinwand et al (eLife 2015, vol 4, e10181) previously suggested that disturbing the balance in neurotransmission alone can extend lifespan. A simple discussion of these possibilities in the Discussion section is likely sufficient. Or can the auxin treatment and removal be confounding factors? Loose and Ghazi (Biol Open 2021, vol 10, bio058703) show that auxin IAA alone can affect lifespan and that this effect can depend on the time the animal is exposed to the auxin.

      We thank the reviewer for the thoughtful comments and valuable suggestions. In response, we have expanded the Discussion section to address the points raised, as detailed below.

      We fully agree with the reviewer that the different results between our study (activating acr-2-expressing neurons) and Zullo et al. (inhibiting unc-17-expressing neurons) are most likely due to the distinct cholinergic neurons targeted. Our new preliminary data further support this neuron-specific model, as inhibition of acetylcholine synthesis at mid-late life stages produces opposing lifespan effects in different cholinergic neurons. At the same time, we cannot rule out the alternative possibility raised by the reviewer (eLife, 2015) that both activation and inhibition of neuronal activity may extend lifespan by similarly disrupting the balance of neurotransmission. This hypothesis requires further experimental validation in the context of cholinergic motor neurons. Regarding the potential technical concern related to auxin exposure (Biol Open, 2021), our control experiments using 0.5 mM auxin did not show non-specific lifespan effects.

      Accordingly, in the revised manuscript, we have discussed the first two possibilities in the Discussion by stating (pages 17-18): “Nevertheless, it is still unclear whether other neuronal populations share similar temporal regulatory mechanisms. A previous study reported that inhibiting cholinergic neuron activity (using the unc-17 promoter) extends lifespan regardless of timing[2], which is different from the temporal lifespan regulation we observed in cholinergic motor neurons (using the acr-2 promoter). This discrepancy is likely due to differences in subsets of neurons, as the unc-17 promoter labels a broad repertoire of cholinergic neurons, while the acr-2 promoter mainly marks cholinergic motor neurons[53]. Thus, the distinct lifespan-modulating effects of cholinergic motor neurons may be overshadowed by opposing contributions from other cholinergic subtypes when a mixed population is manipulated. Alternatively, both activation and inhibition of cholinergic activity may perturb neurotransmission balance, leading to similar effects on lifespan[54]. It will be interesting to test these hypotheses in future studies.”

      Second, the daf-16-dependence of the early longevity-inhibiting effect of ACh signaling needs clarification and further experimentation. The authors present a model in Figure 6D, where DAF-16 inhibits longevity. This contradicts published literature. Libina et al (Cell 2003, vol 115, pp 489-502) have shown that intestinal DAF-16 increases lifespan. From the authors' data, it is possible that ACh signaling inhibits DAF-16, not promotes it as they have drawn in Figure 6D.

      We thank the reviewer for this important point. We agree that intestinal DAF-16 promotes longevity. Our original model in Figure 6D aimed to show that the larval pathway shortens lifespan by inhibiting DAF-16, not that DAF-16 itself shortens lifespan. The arrowhead style used in the original Figure 6D may have given the impression that DAF-16 shortens lifespan, and we apologize for this. We have now corrected Figure 6D. In addition, as suggested, we have performed additional daf-16 experiments (see below).

      In Figure 3F, the authors used Pacr-2::TeTx, which inhibits cholinergic neuron activity, to show an increase in the expression of DAF-16 targets. Why did the authors not use the worms that express the transgene Pacr-2::syntaxin(T254I), which increases cholinergic neuron activity? What happens to the expression of DAF-16 targets in these animals? Do their expression go down? What happens if intestinal daf-16 is knocked down in animals with increased cholinergic neuron activity, instead of reduced cholinergic neuron activity?

      Thanks for these insightful questions. In Figure 3F-3H, we used TeTx rather than syntaxin(T254I) to investigate the function of DAF-16 in the early-stage pathway for two main reasons. First, the Pacr-2::TeTx transgene extends lifespan in early life by inhibiting cholinergic activity, which provides a genetic background complementary to that of syntaxin(T254I) for characterizing the role of DAF-16. Second, TeTx expression is expected to activate DAF-16 and upregulate its target genes, and detecting this upregulation is more sensitive than measuring gene downregulation in Pacr-2::syntaxin(T254I) transgenic worms.

      We fully agree with the reviewer that performing the corresponding experiments in the syntaxin(T254I) background would strengthen the overall evidence. As suggested, we have now examined the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, and performed intestine-specific RNAi of daf-16 in the same background. We found that these worms exhibit downregulation of DAF-16 target genes. Furthermore, intestinal daf-16 knockdown did not further shorten the already reduced lifespan of these transgenic worms. Together, these results from both the TeTx and syntaxin(T254I) lines confirm that cholinergic motor neurons require DAF-16 in the intestine to regulate lifespan. These new data have now been described in Figures S5A-S5D (pages 11-12): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons.”, and “RNAi of daf-16 in the intestine abolished the ability of cholinergic motor neurons to regulate lifespan at early life stage (Figure 3G, 3H and Figure S5C-S5E).”

      Recommendations for The Authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) “The Methods section needs to be clarified/expanded.”

      (a) “For example, are the authors using indole-3-acetic acid or a synthetic auxin? How long does it take for syntaxin to be made after the removal of the auxin?”

      We have now included the auxin information and recovery time in the Methods section on auxin treatment by stating (page 24): “natural auxin indole-3-acetic acid (G&K Scientific)”, and “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (b) “How much FUDR was used in some of the lifespan assays?”

      2 μg/mL FUDR was used in some of the lifespan assays. We have now included the concentration in the Methods section on the lifespan assay by stating (page 23, line 526): “2 μg/mL 5-Fluoro-2’-deoxyuridine (FUDR) was included in assays involving TeTx transgenic worms and unc-31 and unc-17 mutant worms, which show a defect in egg laying.”

      (c) “In line 494 of the Methods section, worms were anesthetized with 50 mM sodium azide. That concentration seems a bit high.”

      This was indeed an error. We used 5 mM NaN3. This has now been corrected in the text (line 548).

      (d) “What are the concentrations of the transgenes used in the extrachromosomal arrays?”

      We have now included the concentrations in the Methods section on strains and genetics by stating (lines 507-509, page 22): “Microinjections were performed using standard protocols. Each plasmid DNA listed above in the transgenic line was injected at a concentration of 50 ng/μL. Each marker for RNAi was co-injected at a concentration of 25 ng/μL.”

      (2) “Gene expression can vary in different parts of the worm intestine. Do the measurements in Figure 6C represent the entire intestine or only certain parts of the intestine?”

      We have now specified the intestinal area used for quantification in the Methods section on microscopy by stating (page 24): “and the entire intestine area was selected by ImageJ”, and in the legend of Figure 6C by stating (page 36): “The entire intestinal area was selected for measurement.”

      (3) “In Figure S1C, does tph-1 have a slight effect? Might serotonin partly counteract the effects of ACh?”

      We thank the reviewer for raising this interesting point regarding the potential role of serotonin. We have re-examined our data in Figure S2C (the original Figure S1C) and agree that loss of tph-1 partly counteracted the lifespan-shortening effect of the Pacr-2::syntaxin(T254I) transgene in early life, though the suppression over the whole lifespan is slight. To assess whether the acr-2 promoter-driven manipulation might directly affect serotonergic neurons, we checked the CeNGen expression atlas. We found that acr-2 transcripts can be detected in serotonergic neurons (ADF, HSN, and NSM), but at extremely low levels. It is therefore unlikely that the Pacr-2::syntaxin(T254I) transgene exerts its primary effect by substantially altering serotonin release. While a potential indirect interaction between cholinergic and serotonergic signaling in lifespan regulation remains possible, it falls beyond the primary focus of the current study, and we plan to follow this up in future studies. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (4) “Where else is GAR-2 expressed? Might there be redundancies between neuronal and intestinal GAR-2?”

      We appreciate this insightful question. Based on available single-cell gene expression atlases of C. elegans at both embryonic and adult stages[1,2], gar-2 expression has been detected not only in neurons and the intestine, but also in additional tissues such as the muscle. Given that neuronal or intestinal gar-2 RNAi did not affect the ability of cholinergic motor neurons to extend lifespan in mid-late life, and as also suggested by another reviewer, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppressed this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 acting primarily in the intestine and GAR-2 primarily in muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 in the muscle will be further investigated in future studies. The new data have now been described in Figure S8 by stating (pages 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H), had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (1) Packer, J. S. et al. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 365, doi:10.1126/science.aax1971 (2019).

      (2) Roux, A. E. et al. Individual cell types in C. elegans age differently and activate distinct cell-protective responses. Cell Rep 42, 112902, doi:10.1016/j.celrep.2023.112902 (2023).

      (3) Chun, L. et al. Metabotropic GABA signalling modulates longevity in C. elegans. Nat Commun 6, 8828, doi:10.1038/ncomms9828 (2015).

      (4) Izquierdo, P. G. et al. Cholinergic signaling at the body wall neuromuscular junction distally inhibits feeding behavior in Caenorhabditis elegans. J Biol Chem 298, 101466, doi:10.1016/j.jbc.2021.101466 (2022).

      (5) “In line 344, please correct "fwork" to "work".”

      This has now been fixed.

      (6) “In line 360, please correct "acts" to "act".”

      This has now been fixed.

      (7) “Please check citations within the main text. Some of the citations do not fit the cited material. For example, in line 112, reference 28 is not about GABAergic neurons.”

      We thank the reviewer for pointing out these important details. We have now carefully checked and corrected the citations throughout the manuscript as suggested.

      Reviewer #2 (Recommendations for The Authors):

      (1) “How are the authors assessing the efficacy of the TeTx manipulations in their strains? Likely TeTx has a concentration-dependent effect. Are there any phenotypes associated with the loss of cholinergic signaling? Also, does TeTx expression in cholinergic neurons alter the neuronal activity of other associated neurons, or alter muscle integrity?”

      Thanks for the question. Our observations show that overexpression of TeTx results in defects including small size, slow growth, egg-laying deficiencies, and severe locomotion impairment, which are all associated with the loss of cholinergic signaling. While we did not directly examine the activity of interconnected neurons in our strains, we tested muscle integrity by recording the muscle response to 1 mM levamisole and found that overexpression of TeTx does not affect muscle integrity. To circumvent these pleiotropic complications, we instead employed syntaxin(T254I) transgenic worms, which exhibit only slight locomotion defects, to further characterize the temporal effect of cholinergic motor neurons on lifespan. These data have now been described in Figure S1A by stating (page 6): “Overexpression of TeTx induces characteristic phenotypes of cholinergic deficiency, such as developmental delay and severe locomotion impairment[32], yet does not compromise muscle function (Figure S1A).”

      (2) “The authors are expressing TeTx throughout the lifespan of the animal, including during development. How does this contribute to the organismal phenotype?”

      As described above, chronic TeTx expression from the egg stage results in developmental delay, which is similar to the developmental phenotype of unc-17 mutant worms defective in acetylcholine transmission. However, the unc-17 mutation has no effect on lifespan[3], unlike TeTx overexpression, indicating that the developmental delay caused by TeTx overexpression is unlikely to account for the lifespan phenotype.


      (3) “A previous study has shown that increasing cholinergic activity by altering ACR-2 expression can cause neurodegeneration (DOI: https://doi.org/10.1523/JNEUROSCI.1515-10.2010). Does overexpressing syntaxin, or AID-mediated degradation of syntaxin cause motor neuron degeneration, which could also contribute to the lifespan phenotype?”

      We thank the reviewer for raising this important point regarding potential motor neuron degeneration. In response, we performed confocal microscopy to assess the motor neurons. We found that worms expressing the transgene Pacr-2::syntaxin::mCherry do not exhibit a defect in the number or morphology of labeled neuronal cell bodies compared to control worms expressing Pacr-2::mCherry. This observation indicates that chronic, increased cholinergic activity through syntaxin overexpression does not, under our experimental conditions, induce motor neuron degeneration. These data have now been described in Figure S1B by stating (page 7): “This transgene simply shortened lifespan without causing a pleiotropic effect (Figure 1B), and critically, without inducing motor neuron degeneration (Figure S1B).”

      (4) “Figures 1I-1L: The authors do not show how long it takes for the expression of syntaxin to be restored following the removal of auxin from plates. This would be important to assess the age-dependent effects of neuronal signaling.”

      We thank the reviewer for pointing this out. In general, complete restoration of syntaxin expression occurred within 24 hours after auxin withdrawal. We have now pointed this out in the text by stating (the last sentence on page 24): “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (5) “In Figures S1A-E: Although the mutant backgrounds decrease the lifespan of animals expressing the Pacr2::syntaxin(T254I) transgene, the lifespan of these transgenic animals appears to be extended compared to what was shown in Figure 1B. Is this the case? (can these experiments be repeated alongside wild-type N2s to assess if their lifespan is indeed extended compared to the N2?). Also, if so, could it be that the lifespan effects are modified to different extents by other small neurotransmitters?”

      We thank the reviewer for pointing this out. All the experiments presented in the current Figure S2 (original Figure S1) were performed with wild-type N2 controls, which are now included in the updated Figure S2. These data show that, in the Pacr-2::syntaxin(T254I) transgenic background, loss of unc-25 (GABA) or tph-1 (serotonin) leads to a further extension of lifespan, while loss of the other genes had no effect. Importantly, while the unc-25 mutation also extends lifespan in wild-type worms, the tph-1 mutation does not. This observation indicates that the lifespan effects of cholinergic signaling can be modulated by serotonin. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (6) “RNAi of several of the receptors appear to modulate wild-type lifespan. Although I understand that this is not the main focus of the manuscript, the fact that this occurs should be mentioned in the results and discussed later on.”

      We thank the reviewer for pointing this out. As suggested by the reviewer, we have now pointed this out in the text by stating (page 9): “Notably, RNAi of several ACh receptors such as acr-11 appears to shorten wild-type lifespan, whereas RNAi of several other ACh receptors such as acr-9 extends wild-type lifespan, suggesting lifespan-modulating potential of ACh receptors (Figure S3).”

      (7) “Cholinergic signaling and ACR-6 have been previously shown to regulate pharyngeal pumping/feeding behavior (https://doi.org/10.1016/j.jbc.2021.10146). Could the requirements for ACR-6/cholinergic signaling in longevity be related to caloric restriction/nutritional intake which in turn could be expected to alter DAF-16 and HSF-1 activity? These previous studies should be referenced and discussed.”

      Thanks for the suggestion. As suggested by the reviewer, we have examined the pumping rate of acr-6 mutant worms. Our results showed that the acr-6 mutation slightly reduced the pumping rate. Because the decrease is relatively minor, we do not expect a major dietary restriction (DR) effect, although we cannot completely rule out this possibility. Furthermore, because acr-6 acts in the pharynx to regulate pumping but in the intestine to mediate the lifespan effect of cholinergic signaling, we do not expect altered pumping to make a major contribution to our pathway. These new data have now been described in Figure S4I. As suggested by the reviewer, we have now pointed this out in the text by stating (page 10): “Previous data have shown that cholinergic signaling and ACR-6 may control pharyngeal pumping[42]. As expected, we found that acr-6 mutation slightly reduced pumping rates (Figure S4G).”

      (8) “The expectation for the studies in Figure 3/DAF-16, is that animals expressing Ex[Pacr-2::syntaxin(T254I)], should have downregulated DAF-16 in the intestine. This needs to be shown through some method (increased daf-16 activation upon loss of cholinergic signaling does not necessarily imply that the converse is also true).”

      We thank the reviewer for the insightful suggestion. The reviewer suggested that we perform additional measurements to confirm that DAF-16 is the downstream transcription factor in the intestine. Specifically, the reviewer suggested testing whether syntaxin(T254I) transgene signaling could inhibit DAF-16 activity. We have now followed the reviewer’s suggestion by performing two different assays. First, as also suggested by the first reviewer, we measured the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, which exhibited downregulation of these genes, consistent with the notion that increasing cholinergic motor neuron activity inhibits DAF-16. These data have now been described in Figure S5A. Second, we examined the subcellular localization pattern of DAF-16 in the intestine. We found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, suggesting that ACR-6 inhibits DAF-16, which is consistent with our model. These new data have now been described in Figure S5E. As suggested by the reviewers, we have now pointed this out in the text by stating (page 11): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons. To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promoted nuclear translocation of DAF-16, confirming that ACh signaling inhibits DAF-16 activity (Figure S5B).”

      (9) “Similarly, it would be good to have additional lines of evidence that signaling through GAR-3 impinges on HSF1, and that the lifespan effects are not due to non-specific effects of hsf-1 knockdown, which could lead to several un-related deficiencies and compromise lifespan (Figure 5b).”

      We thank the reviewer for the valuable suggestions. The reviewer correctly noted that the observed lifespan effect from hsf-1 RNAi could involve non-specific deficiencies. In response, we performed an assay to detect HSF-1 subcellular localization in the intestine upon gar-3 overexpression by using the strain EQ87 (iqIs28[pAH71(hsf-1p::hsf-1::gfp) + pRF4(rol-6)]). We found that the induced nuclear translocation of HSF-1 was weak. This result suggests that GAR-3 may modulate HSF-1 activity through a mechanism distinct from, or more subtle than, robust nuclear accumulation, or that its effect is highly dependent on the expression level and timing.

      (10) “Figure 6: An N2 control should be provided to assess the specificity of the mCherry signal from the intestine (given autofluorescence in the animals' gut).”

      Thanks for the suggestion. As suggested by the reviewer, we have now included the control in Figure S10.

      Reviewer #3 (Recommendations for The Authors):

      (1) “While the model is consistent with the data, there are alternatives that were not addressed. Additionally, there are some deficiencies in the interpretation of results that should be addressed, in my opinion. Possibly most importantly given the claims, the authors should address an alternative model: that it is the level of acetylcholine signaling that matters. Is it possible that the level auxin-inducible degradation of syntaxin(T254I) in acr-2 expressing cells is age dependent, such that one level increases lifespan and the other shortens it, and that the timing doesn't matter at all? A chronic dose response to auxin concentration would address if the level of syntaxin is a non-monotonic determinant of lifespan.”

      We sincerely thank the reviewer for raising this important alternative model. The reviewer suggested that the apparent temporal effect we observed might instead be explained by an age-dependent change in the efficiency with which the AID system degrades syntaxin(T254I) in acr-2-expressing cells. That is, different levels of acetylcholine signaling, rather than timing, would produce opposite lifespan outcomes. We agree that this is a formal possibility that our current data cannot fully rule out. On the other hand, other data in the manuscript suggest otherwise. For example, the expression of ACR-6 and GAR-3 in the intestine exhibited a temporal switch between early and mid-late life, providing support for a time-dependent mechanism. In addition, the differential requirement for the downstream transcription factors DAF-16 and HSF-1 in early and mid-late life, respectively, provides further evidence supporting a temporal mechanism. Thus, while we agree that the possibility raised by the reviewer cannot be formally ruled out, the temporal mechanism we proposed likely plays an important role.

      The reviewer suggested performing a chronic dose-response experiment with varying auxin concentrations. In fact, when we first employed the AID system to temporally manipulate motor neuron output at different life stages, we tested potential effects of auxin concentration. Using the soma-expressed TIR1 system, we found that restoring syntaxin(T254I) activity from day 10 of adulthood extends lifespan, regardless of whether the prior suppression was maintained with 0.1 mM or 0.5 mM auxin. This suggests that the pro-longevity effect is likely not caused by differences in the efficacy of prior suppression within this concentration range. We acknowledge that the tested dose range may not cover potential threshold concentrations. Furthermore, we cannot exclude the possibility of a non-linear relationship between auxin concentration and degradation efficiency. We agree that a comprehensive chronic dose-response analysis remains a valuable future direction, and we plan to employ more precise tools in the future to investigate the interplay between signal level and temporal context in lifespan regulation. The auxin concentration data have now been described in Figure S1C-S1D by stating (page 7): “Comparable outcomes were obtained with both 0.1 mM and 0.5 mM auxin treatments (Figure S1C-S1D).” As suggested by the reviewer, we have discussed the alternative model in the Discussion by stating (page 19): “An alternative mechanism based on differential levels of cholinergic signaling could also contribute to the observed lifespan effects.”

      (2) “Several times, including in several section headings, it is claimed that daf-16 (eg line 205-206) and acr-6 (eg line 185-186) function "early in life". This was not tested, so the claim is not warranted. For instance, these genes could act later in life to respond to signals made or sent early in life, or they could act both early and late, or only early (as they claim).”

      We thank the reviewer for this precise and important clarification. The reviewer is correct that our genetic interventions do not by themselves define the temporal window.

      Our experimental rationale was based on the observation that the lifespan-shortening effect of Pacr-2::syntaxin(T254I) expression is similar whether it is induced throughout life or specifically during larval stages (early life), indicating the detrimental effect results from enhanced motor neuron output in early life. Therefore, we used the lifelong expression paradigm as a tool to genetically dissect the downstream pathway triggered by early-life neuronal activation. We acknowledge the reviewer's point that this design does not formally prove that daf-16 or acr-6 acts only in early life; they could be required continuously or again later. However, we would like to note that our expression data show that the gut expression of ACR-6 is restricted to early life, which is consistent with a primary early-life function in this context.

      To reflect this more accurate interpretation, we have revised all relevant statements, including section headings. We now consistently state that daf-16 is required for the lifespan-shortening effect of cholinergic motor neurons, rather than claiming it functions "in early life". We have also toned down the discussion regarding their temporal function by stating (page 12): “Because this lifespan-shortening effect results from enhanced motor neuron output in early life and overrides its beneficial effect at later stages, we propose this signaling circuit mediates the lifespan-shortening effect in early life.”

      (3) “In line 118, they note that such intervention led to a complex effect on the lifespan curve "by initially promoting worm's survival followed by inhibiting it at later stages." I think that while findings from later experiments support a time-dependent lifespan effect stemming from syntaxin function in the cholinergic motor neurons, this experiment's TeTx expression in those neurons is not time-dependent. Lifespan is an endpoint measure, so there is no sense in which a non-timed perturbation has an early or late effect on an individual. Rather, the effect on survival they observed is at the population level, their intervention increases the average lifespan while decreasing the worm-to-worm variation in lifespan.”

      We thank the reviewer for the critical and precise comment regarding our interpretation of the survival curves of TeTx transgenic worms. As suggested by the reviewer, we have revised the text by stating (page 6): “Surprisingly, such intervention led to a complex effect on the population survival curve by reducing both early mortality and the proportion of long-lived individuals (Figure 1A). Specifically, the 25% lifespan of these worms was prolonged, while their 75% and maximal lifespan were slightly shortened, leading to a mean lifespan slightly increased or unchanged compared to that of wild-type worms. This suggests that inhibiting cholinergic motor neurons may exert temporally distinct effects on survival, leading to decreased individual variation in lifespan.”

      (4) “The layout of the plots separating the responses of wild type and mutants to different panels makes it often difficult to interpret the results. For instance, do acr-6, gar-3, and other receptor mutants or knockdowns affect lifespan on their own? If they do, it matters to the interpretation whether they live longer or shorter than the wild type: which of the mutants phenocopy the lack of a lifespan-extending signal that activates them? Which phenocopy lacks a lifespan-shortening signal that activates them? Could they phenocopy the effect of an inhibitory signal? And critically, are the effects of these mutants on lifespan consistent with their model?”

      “The paper would be stronger if they determined when ACR-6 and GAR-3 functions are necessary and sufficient. Is it possible that the receptor doesn't matter, just that there be one of the two expressed in the intestine, and that other mechanisms determine the lifespan response to modulation of syntaxin(T254I)? What does time-dependent knockdown of these receptors do to daf-16 and hsf-1 localization and to the transcription of the targets of these transcription factors?”

      We thank the reviewer for these insightful comments. We have addressed the points as follows:

      As suggested, we have reorganized the lifespan data in Figure S4 to directly compare wild-type and mutant/RNAi conditions within the same panels. This new presentation clarifies the autonomous effects of these genes. The data show that loss of acr-6 or gar-2 (via RNAi or mutation) has minimal effect on lifespan. Notably, acr-8 RNAi shortens lifespan, whereas the acr-8 mutation does not, supporting our hypothesis of tissue-specific or compensatory roles for this receptor, as detailed in our response to point (5) below. The reviewer's key question regarding when these receptors are necessary and sufficient is central to our model. We agree with the reviewer that complementary loss-of-function experiments with temporal precision, such as time-specific knockdown of the two receptors, would provide even stronger evidence. To this end, we attempted to generate endogenous degron-tagged alleles of acr-6 and gar-3 to apply the AID system for precise, stage-specific degradation. Unfortunately, despite multiple design attempts and screening efforts, we were unable to obtain homozygous strains with the desired genomic edits, either with the same gRNA we used to knock in mCherry or with other gRNAs. Consequently, we are currently unable to perform the ideal temporally controlled loss-of-function experiments suggested by the reviewer.

      (5) “Why does RNAi but not mutation of acr-8 and gar-2 suppress the lifespan shortening effect of Pacr-2::syntaxin(T254I)?”

      Thanks for this important question regarding the differential effects of feeding RNAi versus mutation of acr-8 and gar-2. The discrepancy likely arises from the potential off-target effects of RNAi. RNAi is not strictly specific as it may target other related genes, generating a non-specific effect, whereas precise mutations in acr-8 and gar-2 alone may not produce the same effect.

      (6) “sid-1(-); Ex[Pacr-2::tetx lives longer than sid-1(-); in daf-16(+) worms in Figure 3G; so it is very hard to interpret the lack of effect of Pacr-2::tetx in daf-16(-) worms, since this transgene behaves differently in sid-1 mutants than in wild type worms. This would be clear if the two plots were combined (appropriately, since it is the same experiment). It looks like daf-16 RNAi has a shortening effect in the sid-1 mutant, but not in in sid-1 mutants expressing Pacr-2::text.”

      Thanks for this helpful suggestion. As suggested by the reviewer, we have now merged Figure 3G and 3H into one figure to present as Figure S5F. This combined presentation clarifies the comparison and shows that intestinal daf-16 RNAi shortens lifespan in both sid-1 mutants and sid-1 mutants expressing Pacr-2::TeTx.

      Reviewer #4 (Recommendations for The Authors):

      (1) “Lines 50-52: I would replace "leading to increased incidents in age-related diseases and probability of death" with "leading to the onset of age-related diseases and increased probability of death". Instead of "such an aging process" I would use "the aging process".”

      This has now been fixed.

      (2) “Figure 2E-F: By rescuing the expression of ACR-6 in neurons or intestinal cells alone, the authors show that the release of ACh from cholinergic neurons has effects on the intestine to shorten lifespan. Is ACR-6 expressed in other tissues (e.g. muscle?) It might be interesting to assess whether ACh also regulates lifespan through activating the ACR-6 receptor in other tissues or specifically targets the intestine. This question is partially answered with the tissue-specific RNAi experiments for DAF-16, but it is possible that ACR-6 also modulates other pathways beyond the tested transcription factors.”

      Analyzing the role of other tissues could also be applied to understand how GAR-3 influences lifespan. Along these lines, it would be interesting to expand the tissue-specific knockdown experiments for GAR-3 to other tissues. More importantly, these experiments can address whether activation of ACR-6 and GAR-3 can also have different effects on lifespan by regulating distinct tissues in addition to the intestine, and not only due to temporal expression patterns. For instance, whereas DAF-16 regulates lifespan primarily through its effects in the intestine, HSF1 could have effects on additional tissues. Although it would be interesting to perform these experiments, I understand that the authors' main focus is the nervous system-gut axis.

      We thank the reviewer for the insightful suggestions regarding the potential tissue-specific functions of ACR-6 and GAR-3. As noted in our response to point #6, endogenous expression imaging indicates that ACR-6 and GAR-3 are primarily expressed in neurons and the intestine, with weak expression of GAR-3 in the muscle; we therefore tested the muscle. We found that muscle-specific RNAi of gar-2 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific RNAi of gar-3 did not. This result further supports the conclusion that GAR-3 exerts this effect primarily in the intestine.

      (3) “Can the authors specify in the corresponding figure legend at what age they tested sod-3 and mtl-1 expression in Pacr-2::TeTx worms (Figure 3F)? This is important to support the conclusions of the paper. Along these lines, can the authors also specify at what age they quantified the expression of HSF-1 targets (Figure 5F).”

      Thanks for the suggestion. As recommended, we now state the worm age in the legends of Figure 3F (day 1 adult) and Figure 5F (day 10 adult).

      (4) “To further strengthen the authors' conclusions, it might be interesting to examine the intracellular localization of DAF-16 in the intestine of Pacr-2::TeTx and syntaxin(T254I) worms compared to controls.”

      We thank the reviewer for this valuable suggestion, which was also raised by another reviewer. In response, we examined the subcellular localization of DAF-16 in the intestine. Direct imaging in the Pacr-2::TeTx or Pacr-2::syntaxin(T254I) backgrounds was technically challenging because their fluorescent protein tags (YFP or mCherry) would interfere with the detection of DAF-16::GFP. Therefore, we adopted an alternative approach by modulating the activity of acr-6, the intestinal acetylcholine receptor that transmits cholinergic signals from motor neurons to DAF-16. We found that acr-6 RNAi promotes the nuclear translocation of DAF-16. These new data are presented in Figure S5E by stating (page 11): “To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, confirming that ACh signaling modulates DAF-16 activity (Figure S5B).”

      (5) “The results with gar-2 RNAi are fascinating. I am very curious (and I assume potential readers too) about what tissues mediate the mid-late life effects of GAR-2 in longevity. Perhaps the authors could add experiments in a couple of other tissues known to regulate organismal lifespan (e.g. muscle). However, I totally understand why the authors focused on GAR-3, especially because both GAR-3 and ACR-6 have effects on the intestine and this is sufficient for the main conclusions of the paper.”

      We sincerely thank the reviewer for the insightful suggestion and for highlighting the potential role of GAR-2. In response, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppresses this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 primarily in the intestine and GAR-2 primarily in the muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 will be investigated in future studies. The new data have now been described in Figure S8 by stating (pages 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H), had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (6) “Figure 6: It seems that the genes are also expressed in the muscle. Can the authors include images of other tissues in supplementary figures?”

      Thanks for the suggestion. As suggested by the reviewer, we have now included images of whole worms expressing mCherry knocked into the endogenous gar-3 or acr-6 locus by CRISPR (Figure S10). However, we did not detect strong expression of gar-3 or acr-6 in the muscle under the conditions examined. This may reflect low endogenous protein levels of the two receptors in the muscle, although the CeNGEN database indicates that they are expressed there. Determining the precise spatiotemporal expression profiles of these receptors will likely require more sensitive methods, and we plan to address this important question in future studies using such refined approaches.

    1. Author response:

      General Statements

      We thank all three reviewers for taking the time to provide valuable feedback on our manuscript, and for appreciating the quality and usefulness of the data and results presented in our study. We have improved the manuscript based on their suggestions and provide a detailed, point-by-point response below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you for your positive feedback.

      There are several single-cell methodologies that all claim to co-profile chromatin modifications and gene expression from the same individual cell, such as CoTECH, Paired-Tag and others. Although T-ChIC differs from these in employing pA-MNase and IVT to obtain these modalities from single cells, could the authors provide some direct comparisons among all these technologies to see whether T-ChIC outperforms them?

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al., 2024), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C,D and Fig. S1D,E of the preprint). We show that, compared to other methods, T-ChIC better preserves the expected biological relationship between histone modifications and gene expression in single cells.

      In the current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, and these data look great. How about other histone modifications (e.g., H3K9me3 and H3K36me3) and transcription factors?

      While we haven’t profiled these other modifications using T-ChIC in zebrafish, we have previously published high-quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al., 2022). In our comparison, we find that histone modification profiles from T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al., 2024). Therefore, we expect the method to work equally well for the other histone marks.

      T-ChIC can detect full-length transcripts from the same single cells, but in Fig. S3 the authors still used other published single-cell transcriptomics data to annotate the cell types; this seems unnecessary?

      We used the published scRNA-seq dataset, which has a larger number of cells, to harmonize our cell type labels with these datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and harmonized the cell type labels with the ZFIN ontology. This way, our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our dataset, we didn’t expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs, such as miRNAs, previously undetected in scRNA-seq assays, which we now highlight in the new Figure S1c.

      Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH?

      We appreciate that the ISH staining could be useful to validate the expression pattern of genes identified in this study. But to validate the relationships between the histone marks and gene expression, we need to combine these stainings with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also reply to reviewer #3, comment #4 for details).

      In Fig. 2 and Fig. S4, the authors showed H3K27me3 cis spreading during development; this looks really interesting. Is this zebrafish specific? H3K27me3 ChIP-seq or CUT&Tag data from mouse and/or human embryos should be reanalyzed and used for comparison. Could the authors speculate on possible mechanisms to explain this spreading pattern?

      Thanks for the suggestion. In this revision, we have reanalysed a mouse H3K27me3 ChIP-seq dataset covering embryonic development (Xiang et al., Nature Genetics 2019) and find similar evidence of H3K27me3 signal spreading from pre-marked promoter regions in the E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of promoter pre-marking by PRC1-PRC2 interaction seems to be conserved between the two species (see Hickey et al., 2022; Mei et al., 2021; Chen et al., 2021), suggests that the dynamics of H3K27me3 pattern establishment are conserved across vertebrates. However, high-resolution profiling with a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We have discussed this further in our revised manuscript.

      Reviewer #1 (Significance):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and the full-length transcriptome, and applied the method to early zebrafish embryos. The authors observed a decoupled relationship between chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive mark H3K27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomic and embryonic development fields. There are some concerns that should be addressed before the manuscript is considered for publication.

      We thank the reviewer for appreciating the quality of our study.

      Major concerns:

      (1) A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay.

      We focussed on lineage-specific transcription factor activity during gastrulation in Figures 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. In our analysis of the temporal dimension in the rest of the manuscript, we also classified the cells by their germ layer and “latent” developmental time, taking full advantage of the single-cell nature of our data. Additionally, we have now added cell-type-specific H3K27me3 demethylation results for 24 hpf in response to your comment below. We hope that these results, together with our openly available dataset, demonstrate the advantage of the single-cell aspect of our data.

      (2) The cis-spreading of H3K27me3 with developmental time is interesting. Considering that H3K27me3 could mark bivalent regions, especially in pluripotent cells, there must be some regions that have lost H3K27me3 signals during development. Therefore, it's confusing that the authors didn't find these regions (30% spreading, 70% stable). The authors should explain and discuss this issue.

      Indeed, we see that ~30% of the bins enriched in the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and is still being accumulated until 24 hpf and beyond. Therefore, the majority of sites in the genome still seem to be in the process of gaining H3K27me3 until 24 hpf, explaining why we see mostly “spreading” and “stable” states. Considering that most of these sites are at promoters and show signs of bivalency, we think that they are marked for activation or silencing at later stages. We have discussed this in the manuscript (Discussion). However, in response to this and the earlier comment, we went back and searched for genes that show H3K27me3 demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that lose H3K27me3 after acquiring it earlier. Interestingly, most of the top genes in this list are well known to be developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e; Supplementary Table 3).

      Minors:

      (1) The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them.

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      (2) T-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a is unnecessary.

      Figure 1a shows a summary of our zebrafish T-ChIC workflow, which includes a unique sample multiplexing and sorting strategy to reduce batch effects that was not applied in the original T-ChIC workflow. We have now clarified this in the Results.

      (3) It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C.

      We have added the numbers to the corresponding legends.

      (4) Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages.

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      (5) Figure 4C has not been cited or mentioned in the main text. Please check.

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance):

      Strengths:

      This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish.

      Limitations:

      The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited.

      Advance:

      The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting.

      The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc.

      I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc.

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors.

      Major concerns

      (1) Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset.

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24 hpf) are present in all four, and the cell-type composition of the replicates at these two timepoints is very similar. We have added new plots in Figure S2f and a new supplementary table (#1) to highlight the presence of biological replicates.

      (2) The TF‑activity regression model uses an arbitrary R² ≥ 0.6 threshold; cross‑validated R² distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off.

      Thank you for this suggestion. We did use 10-fold cross-validation during training and obtained the R² values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R² > 0.6 used to select TFs for classification was indeed arbitrary. In the revised version, we now report FDR-adjusted p-values for these R² estimates based on permutation tests, and select TFs with a cutoff of padj < 0.01. We have updated supplementary table #4 to include the p-values for all tested TFs. These show that our arbitrary cutoff of 0.6 was, in fact, too stringent, and we can classify many more TFs using the FDR cutoff. We have also updated the reported numbers in Fig. 4c accordingly. Moreover, supplementary table #4 contains the complete list of TFs used in the analysis, allowing others to choose their own cutoff.
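      To make the procedure described above concrete, here is a minimal sketch of how a cross-validated R² can be turned into a permutation p-value and then FDR-adjusted across TFs. It assumes an ordinary least-squares model on simulated data; the authors' actual TF-activity model and features are not specified here, and the function names (cv_r2, permutation_pvalue, benjamini_hochberg) are hypothetical. Only numpy is used.

```python
import numpy as np


def cv_r2(X, y, k=10, seed=0):
    """Out-of-fold R^2 of an ordinary least-squares fit under k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    pred = np.empty(n)
    for test in np.array_split(rng.permutation(n), k):
        train = np.setdiff1d(np.arange(n), test)
        coef, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        pred[test] = Xd[test] @ coef  # predict only the held-out fold
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_pvalue(X, y, n_perm=200, seed=0):
    """Observed CV R^2 and its empirical p-value against shuffled targets."""
    rng = np.random.default_rng(seed)
    observed = cv_r2(X, y)
    null = np.array([cv_r2(X, rng.permutation(y)) for _ in range(n_perm)])
    # add-one correction keeps the p-value strictly above zero
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # running minimum from the largest p-value down keeps adjustments monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Toy data: one "informative" TF and one pure-noise TF (simulated, not real data)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y_tf = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=200)
y_noise = rng.normal(size=200)

r2_tf, p_tf = permutation_pvalue(X, y_tf)
r2_noise, p_noise = permutation_pvalue(X, y_noise)
padj = benjamini_hochberg([p_tf, p_noise])
selected = padj < 0.01  # padj cutoff quoted in the response
```

      With 200 permutations the smallest attainable p-value is 1/201, so the number of permutations bounds how stringent an FDR cutoff such as padj < 0.01 can be across many TFs; a real analysis would likely need more permutations.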

      (3) Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes.

      We agree that independent validation of the functions of our predicted TFs on target gene activity would be important. During this revision, we analysed recently published scRNA-seq data from Saunders et al. (2023), which include CRISPR-mediated F0 knockouts of a couple of our predicted TFs, but the scRNA-seq was performed at later stages (24 hpf onward) than our H3K4me1 analysis (4-12 hpf). Consequently, we saw genes affected in lineages where these TFs are clearly not expressed (attached Fig. 1), likely reflecting downstream rather than direct effects. We therefore didn’t include these results in the manuscript. In the future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      (4) The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression.

      We agree that PRC2 disruption followed by T-ChIC, or other forms of validation, would be needed to confirm whether H3K27me3 spreading is causally linked to the silencing of the identified target genes. However, performing this validation is complicated for multiple reasons. 1) Due to the maternal contribution of EZH2 RNA and the contradictory effects of various zygotic EZH2 mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant seems to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019, and San et al., 2019, for details). 2) The use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not show validation of H3K27me3 loss or a phenotype similar to that of MZezh2 mutants, and inhibitors can cause unwanted side effects and toxicity at high doses, affecting gene expression results. Moreover, in an attempt at validation, we performed our own trials with the EZH2 inhibitor (GSK123) and found that this time window might be too short to see an effect within 24 hpf (attached Fig. 2). Therefore, this validation is a more complex endeavor beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 demethylation on developmentally important genes (new Fig. 2e-f, Sup. table 3) adds confidence that polycomb repression plays an important role, and provides enough ground for future follow-up studies.

      Minor concerns

      (1) Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development.

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation would be the focus of our follow up works.

      (2) Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided.

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline “scChICflow” to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files on our zenodo repository (see below), which can simply be plugged into this pipeline together with the fastq files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have also clarified the peak calling and post-processing steps in the manuscript now.

      (3) Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added.

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on “Data and code availability”.

      (4) Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference.

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance):

      The method is technically innovative and the biological insights are valuable; however, several issues, mainly concerning experimental design, statistical rigor, and functional validation, must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

      Author response image 1.

      (1) (top) expression of tbx16, which was one of the common TFs detected in our study and also targeted by Saunders et al by CRISPR. tbx16 expression is restricted to presomitic mesoderm lineage by 12hpf, and is mostly absent from 24hpf cell types. (bottom) shows DE genes detected in different cellular neighborhoods (circled) in tbx16 crispants from 24hpf subset of cells in Saunders et al. None of these DE genes were detected as “direct targets” in our analysis and therefore seem to be downstream effects. (2) Effect of 3 different concentrations of EZH2 inhibitor (GSK123) on global H3K27me3 quantified by flow cytometry using fluorescent coupled antibody (same as we used in T-ChIC) in two replicates. The cells were incubated between 3 and 10 hpf and collected afterwards for this analysis. We observed a small shift in H3K27me3 signal, but it was inconsistent between replicates.

      References

      Chen, Z., Djekidel, M. N., & Zhang, Y. (2021). Distinct dynamics and functions of H2AK119ub1 and H3K27me3 in mouse preimplantation embryos. Nature Genetics, 53(4), 551–563.

      den Broeder, M. J., Ballangby, J., Kamminga, L. M., Aleström, P., Legler, J., Lindeman, L. C., & Kamstra, J. H. (2020). Inhibition of methyltransferase activity of enhancer of zeste 2 leads to enhanced lipid accumulation and altered chromatin status in zebrafish. Epigenetics & Chromatin, 13(1), 5.

      Hickey, G. J., Wike, C. L., Nie, X., Guo, Y., Tan, M., Murphy, P. J., & Cairns, B. R. (2022). Establishment of developmental gene silencing by ordered polycomb complex recruitment in early zebrafish embryos. eLife, 11, e67738.

      Huang, Y., Yu, S.-H., Zhen, W.-X., Cheng, T., Wang, D., Lin, J.-B., Wu, Y.-H., Wang, Y.-F., Chen, Y., Shu, L.-P., Wang, Y., Sun, X.-J., Zhou, Y., Yang, F., Hsu, C.-H., & Xu, P.-F. (2021). Tanshinone I, a new EZH2 inhibitor restricts normal and malignant hematopoiesis through upregulation of MMP9 and ABCG2. Theranostics, 11(14), 6891–6904.

      Mei, H., Kozuka, C., Hayashi, R., Kumon, M., Koseki, H., & Inoue, A. (2021). H2AK119ub1 guides maternal inheritance and zygotic deposition of H3K27me3 in mouse embryos. Nature Genetics, 53(4), 539–550.

      Rougeot, J., Chrispijn, N. D., Aben, M., Elurbe, D. M., Andralojc, K. M., Murphy, P. J., Jansen, P. W. T. C., Vermeulen, M., Cairns, B. R., & Kamminga, L. M. (2019). Maintenance of spatial gene expression by Polycomb-mediated repression after formation of a vertebrate body plan. Development (Cambridge, England), 146(19), dev178590.

      San, B., Rougeot, J., Voeltzke, K., van Vegchel, G., Aben, M., Andralojc, K. M., Flik, G., & Kamminga, L. M. (2019). The ezh2(sa1199) mutant zebrafish display no distinct phenotype. PloS One, 14(1), e0210217.

      Saunders, L. M., Srivatsan, S. R., Duran, M., Dorrity, M. W., Ewing, B., Linbo, T. H., Shendure, J., Raible, D. W., Moens, C. B., Kimelman, D., & Trapnell, C. (2023). Embryo-scale reverse genetics at single-cell resolution. Nature, 623(7988), 782–791.

      Vastenhouw, N. L., Zhang, Y., Woods, I. G., Imam, F., Regev, A., Liu, X. S., Rinn, J., & Schier, A. F. (2010). Chromatin signature of embryonic pluripotency is established during genome activation. Nature, 464(7290), 922–926.

      Zeller, P., Blotenburg, M., Bhardwaj, V., de Barbanson, B. A., Salmén, F., & van Oudenaarden, A. (2024). T-ChIC: multi-omic detection of histone modifications and full-length transcriptomes in the same single cell. In bioRxiv (p. 2024.05.09.593364). https://doi.org/10.1101/2024.05.09.593364

      Zeller, P., Yeung, J., Viñas Gaza, H., de Barbanson, B. A., Bhardwaj, V., Florescu, M., van der Linden, R., & van Oudenaarden, A. (2022). Single-cell sortChIC identifies hierarchical chromatin dynamics during hematopoiesis. Nature Genetics. https://doi.org/10.1038/s41588-022-01260-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.
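      For readers less familiar with the aDDM, the accumulation rule summarized above can be sketched as a short simulation. This is a minimal illustration assuming the commonly used update in which the unfixated item's value is discounted by a factor theta; the parameter values (d, theta, sigma), the fixed-duration alternation of fixations, and the function name simulate_addm are hypothetical choices, not taken from the study under review.

```python
import numpy as np


def simulate_addm(v_left, v_right, d=0.002, theta=0.3, sigma=0.02,
                  fix_dur=300, first_fix="left", max_steps=10_000, seed=0):
    """Simulate one aDDM trial and return (choice, number_of_steps).

    The relative decision value (RDV) starts at 0. While the left item is
    fixated it drifts by d * (v_left - theta * v_right); while the right
    item is fixated it drifts by d * (theta * v_left - v_right). With
    theta < 1 the unfixated item is discounted. A choice is made when the
    RDV reaches +1 (left) or -1 (right).
    """
    rng = np.random.default_rng(seed)
    rdv = 0.0
    looking_left = first_fix == "left"
    for t in range(1, max_steps + 1):
        if looking_left:
            drift = d * (v_left - theta * v_right)
        else:
            drift = d * (theta * v_left - v_right)
        rdv += drift + rng.normal(0.0, sigma)  # Gaussian noise per step
        if rdv >= 1.0:
            return "left", t
        if rdv <= -1.0:
            return "right", t
        # Hypothetical gaze process: fixation switches every fix_dur steps.
        if t % fix_dur == 0:
            looking_left = not looking_left
    return None, max_steps  # no boundary reached within max_steps


# Two equally valued items without noise: the first-fixated item wins,
# because the unfixated item only contributes theta times its value.
choice, n_steps = simulate_addm(3, 3, sigma=0.0, first_fix="left")
```

      Setting theta = 1 removes the gaze bias, so the drift then depends only on the value difference regardless of where the participant is looking.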

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We have addressed this comment with new GLMs. The new GLM1 includes both non-gaze-weighted and gaze-weighted regressors and finds that the vmPFC and striatum reflect gaze-weighted sampled value, while the preSMA reflects gaze-weighted accumulated value. We have now dropped the old GLM3 and added two other GLMs, one that explicitly interacts accumulated value with accumulated dwell, and the other that considers only partial gaze discounting. These analyses all support the preSMA as encoding gaze-weighted accumulated value.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the preSMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space and leverages theoretical predictions about where effects should occur. Our novel finding, that gaze modulation of accumulated evidence signals is reflected in preSMA activity, builds naturally on established findings. However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

      Recommendations for the authors:

      Editor Comments:

      Reviewer 1 in particular makes a number of suggestions for additional analyses that would help to strengthen the evidence supporting your conclusions.

      We thank the editor and the reviewers for the helpful suggestions for improving our manuscript. We discuss our efforts to address each point below.

      Reviewer #1 (Recommendations for the authors):

      (1) To address my concerns about GLM2, the first thing to do might be to simply show the correlation between the regressors used across the three different models (e.g., as a figure in the methods). Although the authors have done a good job to ensure that AV and SV are decorrelated when including them both in the same model, they haven't shown us whether the regressors used in, for example, GLM2 are correlated/similar to the regressors used in GLM1. This is important information for interpretation.

      Thank you for raising concerns about the overlap between different models. We agree that additional information regarding the correlation among sample-level regressors would aid readers in understanding the differences among the analyses. We now include this information in Figure 7 in the Methods section, as requested. While |SV| was uncorrelated with gaze-weighted |SV| (|SV<sub>Gaze</sub>|; Pearson’s r = 0.002, p = 0.848), lagged |AV| was significantly correlated with lagged, gaze-weighted |AV| (lagged |AV<sub>Gaze</sub>|; r = 0.365, p < 2.2 × 10<sup>-16</sup>).

      (2) The acid test for gaze-modulation of value signals would be to show that the gaze-modulated signals explain the fMRI results over and above the non-gaze-modulated signals. This could simply mean including SVgaze and SV (and equivalent terms for AV) within the same GLM. Following from point (1), the authors may point out that these terms are highly correlated - yes, but the GLM will then test for the effects of SVgaze *over and above* the effects of SV. (In fact, although I'd normally caution against orthogonalisation - it would here be totally legitimate to orthogonalise SVgaze w.r.t. SV).
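As a minimal sketch of the orthogonalisation mentioned above, outside any particular fMRI package, a gaze-weighted regressor can be replaced by its residual after an OLS regression on the non-gaze-weighted regressor plus an intercept. All variable names and simulated values below are hypothetical. The sketch shows that the residual is uncorrelated with the reference regressor, and that the coefficient on the gaze term is the same whether or not it is orthogonalised; what changes is that the shared variance is credited to SV.

```python
import numpy as np

def orthogonalise(target, reference):
    """Residual of an OLS regression of `target` on [1, reference]."""
    X = np.column_stack([np.ones_like(reference), reference])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

def ols(y, *regressors):
    """OLS coefficients for y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 400                                        # hypothetical number of samples
sv = rng.uniform(0, 5, n)                      # hypothetical |dSV| values
gaze = rng.uniform(0, 1, n)                    # hypothetical gaze proportion per sample
sv_gaze = gaze * sv                            # hypothetical gaze-weighted |dSV|
bold = 0.5 * sv + 1.0 * sv_gaze + rng.normal(0, 1, n)   # simulated signal

sv_gaze_orth = orthogonalise(sv_gaze, sv)
print(np.corrcoef(sv, sv_gaze)[0, 1])          # clearly positive in this simulation
print(np.corrcoef(sv, sv_gaze_orth)[0, 1])     # ~0 by construction

joint = ols(bold, sv, sv_gaze)                 # [intercept, SV, SVgaze]
orth = ols(bold, sv, sv_gaze_orth)             # [intercept, SV, SVgaze orthogonalised]
# The gaze coefficient matches across fits; only the SV coefficient changes.
print(joint[2], orth[2])
print(joint[1], orth[1])
```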

      We appreciate the reviewer’s suggestions for more robust tests of the presence of gaze-weighted signals. For reasons highlighted in our response above, we were initially hesitant to include both types of regressors in the same model due to their significant correlation. However, we now report the results of this analysis in the main text as the new GLM 1. This model incorporates both gaze-weighted and non-gaze-weighted terms. For each contrast we used the same procedures as reported in the main text (family-wise error corrected at p<0.05 and cluster-forming thresholds at p<0.005).

      In the vmPFC, we found significant effects of both |∆SV| (peak voxel: x = -14, y = 44, z = -12; t = 3.90, p = 0.0190) and |∆SV<sub>Gaze</sub>| (peak voxel: x = 4, y = 38, z = -4; t = 5.21, p = 0.004), but no effects of |∆AV| or |∆AV<sub>Gaze</sub>|. The striatum also showed a significant correlation with |∆SV<sub>Gaze</sub>| (peak voxel: x = 22, y = 20, z = -10; t = 5.10, p = 0.014), but no other regressors.

      In the pre-SMA, we found a significantly positive relationship with both |∆AV| (peak voxel: x = 4, y = 14, z = 50; t = 4.75, p < 0.001) and |∆AV<sub>Gaze</sub>| (peak voxel: x = 4, y = 18, z = 50; t = 2.98, p = 0.032). In contrast, the dlPFC (x = 40, y = 34, z = 26; t = 6.83, p < 0.001) and IPS (x = 42, y = -50, z = 42; t = 5.16, p = 0.010) were only correlated with |∆AV|. No other significant contrasts emerged.

      These results provide direct support for the presence of gaze-modulated value signals in the brain, which we now describe in the main text Results section.

      (3) With regards to GLM3, it would help to provide a bit more detail on what the time series looks like for the gaze regressor in this model - is it the entire timeseries of gaze (which presumably shifts back/forth between options multiple times within each trial) which is being convolved with the HRF? This seems different from how gaze is being calculated in GLM2, where it is amalgamated into an 'average gaze difference' within a sample between left/right options, if I understand the text correctly?

      We apologize for the lack of details regarding how we operationalized the gaze regressors in our analyses. You are correct that the gaze regressor was calculated differently in GLM2 and GLM3.

      However, in response to the reviewer’s points above (Major Point 2) and below (Major Point 4, Minor Point 1), we have decided to drop the old GLM3 from the paper while incorporating a revised GLM1 (combining old GLM1 and GLM2) and two new GLMs (see responses to Major Point 4 and Minor Point 1) to provide clearer evidence for gaze modulation of accumulated value in the brain.

      (4) Also, is there not a reason why it isn't more appropriate to interact AV with *previously deployed gaze difference* (accumulated across previous samples) in this model, rather than the current gaze location? The latter seems to rely upon the indirect linkage via the behavioural modelling result, which seems to weaken the claim.

      We thank the reviewer for this suggestion. We agree that our original GLM3 approach was limited because it interacted AV with current binary gaze location, which relies on the indirect behavioral relationship we established (i.e., that current gaze is negatively correlated with accumulated past gaze).

      The original GLM2 (which is now incorporated into the new GLM1) implemented something similar to what the reviewer is suggesting as it used gaze-weighted values accumulated across all previous samples. Specifically, in GLM2, the gaze-weighted accumulated value (AV<sub>gaze</sub>) was calculated as the sum of all previous sampled values, each weighted by the proportion of gaze allocated to each option during that sampling period.

      However, to more directly test whether accumulated evidence signals are modulated by accumulated gaze allocation we have now run an additional analysis (GLM2). In this analysis we have revised the old GLM3 to include additional regressors: ∆SV, lagged ∆AV, current gaze location, accumulated dwell advantage, ∆SV × current gaze location, and lagged ∆AV × accumulated dwell advantage.

      The two new regressors were defined as follows:

      Accumulated dwell advantage: For each sample t, accumulated dwell advantage represents the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      ∆AV × accumulated dwell advantage: The interaction between accumulated values and accumulated dwell advantage, which directly tests whether brain regions encoding accumulated value are modulated by the history of gaze allocation.
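A minimal sketch of the two new regressors for a single trial, following the definitions above. It assumes that per-sample dwell times are available for each option, that ∆AV is signed as left minus right and summed over samples 1 to t-1, and that the first sample, which has no gaze history, receives an advantage of 0. The function names and trial data are hypothetical.

```python
import numpy as np

def accumulated_dwell_advantage(dwell_left, dwell_right):
    """Per sample t: (total dwell left - total dwell right) / (total dwell left + total dwell right),
    using dwell accumulated over samples 1..t-1. Set to 0 when there is no prior gaze (assumption)."""
    dl = np.asarray(dwell_left, dtype=float)
    dr = np.asarray(dwell_right, dtype=float)
    prev_l = np.concatenate([[0.0], np.cumsum(dl)[:-1]])   # dwell left before sample t
    prev_r = np.concatenate([[0.0], np.cumsum(dr)[:-1]])   # dwell right before sample t
    total = prev_l + prev_r
    adv = np.zeros_like(total)
    np.divide(prev_l - prev_r, total, out=adv, where=total > 0)  # stays 0 where total == 0
    return adv

def lagged_signed_av(sv):
    """Signed accumulated value (left minus right) over samples 1..t-1."""
    sv = np.asarray(sv, dtype=float)
    return np.concatenate([[0.0], np.cumsum(sv)[:-1]])

# Hypothetical three-sample trial: dwell times in ms, signed dSV = left - right.
dwell_left = [600, 200, 400]
dwell_right = [400, 800, 400]
sv = [2.0, -1.0, 3.0]

adv = accumulated_dwell_advantage(dwell_left, dwell_right)
interaction = lagged_signed_av(sv) * adv   # positive when gaze history and value favour the same side
print(adv.tolist())          # [0.0, 0.2, -0.2]
print(interaction.tolist())  # [0.0, 0.4, -0.2]
```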

      This approach is conceptually similar to old GLM2’s gaze-weighting method, but allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation.

      Here, we found that the pre-SMA showed a positive correlation with the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.10, p = 0.0258). Surprisingly, the striatum also showed a correlation with this term (peak: x = -16, y = 10, z = -6; t = 4.07, p = 0.0176). No other ROIs showed significant relationships.

      This analysis provides additional evidence that pre-SMA encodes accumulated value signals that are modulated by accumulated gaze allocation, without relying on indirect relationships between current and past gaze. We now report these results in the main text as GLM2 as follows:

      “To more directly test whether accumulated evidence signals were modulated by accumulated gaze allocation throughout a trial, we conducted additional, exploratory analyses. Specifically, we ran a GLM that incorporated the following two terms: accumulated dwell advantage and ∆AV × accumulated dwell advantage, in addition to ∆SV, the current gaze location, and ∆SV × current gaze location.

      We calculated accumulated dwell advantage as follows: For each sample t, accumulated dwell advantage is the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      We also included the interaction between accumulated dwell advantage and ∆AV (i.e., signed accumulated evidence). This interaction term is positive when gaze is primarily to the left and left has more value or when gaze is primarily to the right and right has more value. This interaction term directly tests whether brain regions encoding accumulated evidence are modulated by the history of gaze allocation. This approach allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation itself.

      This GLM revealed a positive correlation between pre-SMA activity and the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.01, p = 0.026). Surprisingly, the striatum also showed a correlation with this term (peak voxel: x = -16, y = 10, z = -6; t = 4.07, p = 0.018). Additionally, activity in the dlPFC was positively correlated with ∆SV (peak voxel: x = -36, y = 34, z = 22; t = 3.96, p = 0.016). No other ROIs showed significant relations.

      This analysis provides additional evidence that the pre-SMA encodes accumulated value signals that are modulated by the history of gaze allocation.”

      Minor

      (1) "In Trial A, the subject looks left 30% of the time and right 70% of the time. In Trial B, the subject looks left 70% of the time and right 30% of the time. In Trial A, the net input value ("drift rate") would be |0.3 ∙ 7 − 0.7 ∙ 3| = 0. In Trial B, the drift rate would be |0.7 ∙ 7 − 0.3 ∙ 3| = 4." I may be missing something, but isn't this consistent with an aDDM with theta=0, rather than theta=0.3-0.5 as is typically found?
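To make the reviewer's point concrete, the snippet below evaluates the standard aDDM drift, G<sub>Left</sub>(V<sub>Left</sub> − θV<sub>Right</sub>) − G<sub>Right</sub>(V<sub>Right</sub> − θV<sub>Left</sub>), for the two example trials (values 7 and 3), assuming G<sub>Right</sub> = 1 − G<sub>Left</sub>. With θ = 0 it reproduces the drifts of 0 and 4 quoted above; with θ = 0.3, a value inside the typical range, Trial A's drift is no longer zero.

```python
def addm_drift(g_left, v_left, v_right, theta):
    """aDDM drift G_L*(V_L - theta*V_R) - G_R*(V_R - theta*V_L), assuming G_R = 1 - G_L."""
    g_right = 1.0 - g_left
    return g_left * (v_left - theta * v_right) - g_right * (v_right - theta * v_left)

results = {}
for theta in (0.0, 0.3):
    trial_a = round(abs(addm_drift(0.3, 7, 3, theta)), 6)   # looks left 30% of the time
    trial_b = round(abs(addm_drift(0.7, 7, 3, theta)), 6)   # looks left 70% of the time
    results[theta] = (trial_a, trial_b)
    print(theta, trial_a, trial_b)
# 0.0 0.0 4.0
# 0.3 1.2 4.0
```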

      The reviewer raises an important point about our assumptions regarding attentional discounting. We agree that our approach could be problematic as it may assume stronger discounting than has been observed in the literature.

      To address this concern, we calculated drift on a sample-by-sample basis before aggregating to the trial level. Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent fixating left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. This approach preserves the fine-grained temporal dynamics of gaze-dependent value accumulation that would be lost by calculating gaze proportions only at the trial level.

      Using this sample-level method in a mixed-effects logistic regression predicting choice (left vs. right), we estimated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25). These estimates are somewhat higher than the typical aDDM findings of attentional bias (θ = 0.3–0.5). This may reflect the drawn-out nature of this task relative to prior aDDM tasks.
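The β/γ decomposition can be sketched as below. Because β + θγ equals the aDDM drift G<sub>Left</sub>(V<sub>Left</sub> − θV<sub>Right</sub>) − G<sub>Right</sub>(V<sub>Right</sub> − θV<sub>Left</sub>), the ratio of the fitted γ and β coefficients estimates θ. This is a simplified, single-subject sketch on simulated data: it uses an ordinary logistic regression fitted by Newton-Raphson instead of the mixed-effects model described above, and the value ranges, gaze distribution, choice sensitivity, and true θ are all assumptions.

```python
import numpy as np

def sample_terms(g_left, g_right, v_left, v_right):
    """Sample-level beta and gamma terms as defined in the text."""
    beta = g_left * v_left - g_right * v_right
    gamma = g_right * v_left - g_left * v_right
    return beta, gamma

def fit_logistic(X, y, n_iter=30):
    """Ordinary logistic regression (no intercept) fitted by Newton-Raphson."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hessian, X.T @ (y - p))
    return w

def estimate_theta(n_trials=20000, n_samples=6, theta_true=0.5, sensitivity=3.0, seed=1):
    rng = np.random.default_rng(seed)
    v_left = rng.uniform(0, 5, (n_trials, n_samples))    # hypothetical sample values
    v_right = rng.uniform(0, 5, (n_trials, n_samples))
    g_left = rng.uniform(0, 1, (n_trials, n_samples))    # proportion of each sample spent looking left
    beta, gamma = sample_terms(g_left, 1.0 - g_left, v_left, v_right)
    beta_bar, gamma_bar = beta.mean(axis=1), gamma.mean(axis=1)   # trial-level regressors
    # Simulated choices: the logit is proportional to the trial-averaged drift beta + theta * gamma.
    logit = sensitivity * (beta_bar + theta_true * gamma_bar)
    chose_left = (rng.uniform(size=n_trials) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    b_beta, b_gamma = fit_logistic(np.column_stack([beta_bar, gamma_bar]), chose_left)
    return b_gamma / b_beta

theta_hat = estimate_theta()
print(theta_hat)   # should be close to the assumed theta_true = 0.5
```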

      Next, we ran a new GLM that incorporated these θ estimates in the sampled value estimates. For this GLM3, we computed θ-weighted sampled-value (|∆TWSV|) as:

      TWSV = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>)).

      Similar to GLM1, we computed an accumulated value signal based on the lagged sum of previous samples’ |∆TWSV| (i.e., |∆TWAV|).
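The θ-weighted regressors can be sketched per trial as follows. The code computes ∆SV<sub>θ</sub> for each sample from the formula above and forms lagged |∆AV<sub>θ</sub>| as the absolute value of the signed sum of ∆SV<sub>θ</sub> over all earlier samples (zero for the first sample). Taking the absolute value of the signed sum, rather than summing absolute values, is an assumption about what "lagged sum" means here; the function names and trial data are hypothetical.

```python
import numpy as np

def theta_weighted_sv(g_left, g_right, v_left, v_right, theta):
    """Signed dSV_theta = G_L*(V_L - theta*V_R) - G_R*(V_R - theta*V_L), per sample."""
    g_left, g_right = np.asarray(g_left, float), np.asarray(g_right, float)
    v_left, v_right = np.asarray(v_left, float), np.asarray(v_right, float)
    return g_left * (v_left - theta * v_right) - g_right * (v_right - theta * v_left)

def lagged_abs_av(sv):
    """|AV| at sample t = |sum of signed dSV over samples 1..t-1|; 0 for the first sample."""
    sv = np.asarray(sv, dtype=float)
    return np.abs(np.concatenate([[0.0], np.cumsum(sv)[:-1]]))

# Hypothetical three-sample trial with a subject-specific theta of 0.5.
g_left, g_right = [0.6, 0.2, 0.5], [0.4, 0.8, 0.5]
v_left, v_right = [4, 1, 3], [2, 3, 3]

sv_theta = theta_weighted_sv(g_left, g_right, v_left, v_right, theta=0.5)
print(np.abs(sv_theta))          # |dSV_theta|, approx. [1.8, 2.1, 0.0]
print(lagged_abs_av(sv_theta))   # lagged |dAV_theta|, approx. [0.0, 1.8, 0.3]
```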

      We found significant positive effects of |∆TWSV| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.0270) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.0198), but in no other ROI.

      In contrast, we found significant positive relationships between |∆TWAV| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.0014), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.0040), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆TWAV| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.0410). No other significant contrasts emerged.

      We now report this additional analysis as GLM3 in the main text, as follows:

      “In our first set of analyses, we implicitly assumed complete discounting of non-fixated information, in contrast with previous studies that have generally found only partial discounting (Krajbich et al., 2010; Sepulveda et al., 2020; Smith & Krajbich, 2019; Westbrook et al., 2020). To verify that our results are robust to inter-subject variability in attentional discounting, we estimated subject-level attentional discounting parameters and then re-estimated our original GLM with new, recalculated gaze-weighted value regressors.

      Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent gazing left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. We then ran a mixed-effects logistic regression predicting choice (left vs. right) as a function of β and γ and then calculated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25).

      Next, for the GLM, we computed θ-weighted sampled-value (|∆SV<sub>θ</sub>|) as:

      SV<sub>θ</sub> = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>))

      Similar to the original GLM, we computed an accumulated value signal, |∆AV<sub>θ</sub>|, based on the lagged sum of previous samples’ |∆SV<sub>θ</sub>|.

      We found significant positive effects of |∆SV<sub>θ</sub>| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.027) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.020), but in no other ROI.

      In contrast, we found significant positive relationships between |∆AV<sub>θ</sub>| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.001), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.004), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆AV<sub>θ</sub>| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.041). No other significant contrasts emerged.

      In summary, these analyses provide additional evidence that the vmPFC encodes gaze-weighted sampled value signals and the pre-SMA encodes gaze-weighted accumulated value signals, though other correlations also emerged.”

      (2) The reporting of statistical results in the fMRI could be sharpened - e.g. in the figure legends, don't just say "Voxels thresholded at p < .05.", but make clear whether you mean FWE whole-brain corrected (I think you do from the methods) or whether this is uncorrected for display; similarly, for the peak voxels, report the associated Z statistic at that voxel rather than just "negative beta".

      We agree that it is important to include additional details regarding how we reported the statistical results. We now clarify our procedures in the main text:

      “We report results using FWE-corrected statistical significance of p < 0.05 and a cluster significance threshold of p < 0.005.”

      We now also report the T statistics for peak voxels.

      (3) A couple of the citations are slightly wrong - e.g., Kolling et al 2012 shouldn't be cited as arguing for decision conflict, as in fact it argues strongly against this account and in favour of a foraging account of ACC activity. Similarly, Hunt et al 2018 doesn't provide support for decision conflict; instead, it shows signals in ACC show evidence accumulation for left/right actions over time (although not whether these accumulator signals are gaze-weighted, in the same way as the present study).

      We thank the reviewer for pointing out these mistakes in our citations. We have revised the references throughout.

      Reviewer #2 (Recommendations for the authors):

      (1) In some places, the introduction would benefit from fleshing out certain points. For example it is stated “For instance, decisions that are less predictable also tend to take more time (Konovalov & Krajbich, 2019) and can be influenced by attention manipulations (Parnamets et al., 2015; Tavares et al., 2017; Gwinn et al., 2019; Bhatnagar & Orquin, 2022). The quantitative relations between these measures argue for an evidence-accumulation process.” It is not clear why the relations between them argue for an EA process, and the reader would benefit from some further explanation.

      We thank the reviewer for this helpful suggestion. We agree that the original text did not sufficiently explain why these relationships support evidence-accumulation models. We have revised the introduction to better articulate the mechanistic basis for this claim.

      This revision clarifies these points in the main text:

      “Decisions like this are thought to rely on a bounded, evidence-accumulation process that depends on factors such as the value of the sampled information and shifts in attention. According to this framework, when two options are similar in value, evidence accumulates more slowly towards the decision threshold, resulting in longer response times (RT) and more opportunity for shifts in attention to influence the choice outcome. In contrast, when one option is clearly superior, evidence accumulates more rapidly and the decision is made quickly with less of a relation between gaze and choice. This choice process produces reliable, quantitative patterns in choice, RT, and eye-tracking data (Ashby et al., 2016; Callaway et al., 2021; Gluth et al., 2018; Krajbich et al., 2010; Smith & Krajbich, 2018). For instance, decisions with similar values are more random (i.e., less predictable), tend to take more time (Konovalov & Krajbich, 2019), and can be experimentally manipulated by diverting attention towards one option more than the other (Bhatnagar & Orquin, 2022; Gwinn et al., 2019; Pärnamets et al., 2015; Pleskac et al., 2022; Tavares et al., 2017). Critically, these behavioral measures do not simply correlate; rather, they exhibit precise quantitative relationships consistent with evidence accumulation models (Konovalov & Krajbich, 2019).”

      (2) Some of the study hypotheses also need to be clarified. What are the hypotheses regarding how SV and AV should translate to BOLD in an input vs integrator region? Larger SV/AV = larger BOLD? What predictions would be made for a time-on-task or conflict region? Are the predictions the same or different? Clarifying this will help the reader to understand to what extent the gaze manipulation is pivotal in identifying integrator regions.

      We thank the reviewer for this excellent suggestion. We agree that it is useful to clearly articulate our hypotheses about BOLD signal predictions for different aspects of the model, and why gaze manipulation is critical for distinguishing between them. We have now expanded the introduction to clarify these predictions.

      For input regions, we predicted a straightforward positive relationship: larger sampled value (|ΔSV|) should produce larger BOLD activity. Input regions encode the momentary evidence being sampled (i.e., the relative value of currently presented stimuli). Consistent with prior work (Bartra et al., 2013), we expected such activity in the vmPFC and ventral striatum.

      Critically, we also predicted that these sampled value signals should be modulated by gaze location. The attentional drift-diffusion model (aDDM; Krajbich et al., 2010) posits that attended items receive full value weight while unattended items are discounted. Consistent with prior work (Lim et al., 2011), we expected stronger vmPFC/striatum activity when the higher-value item is fixated compared to when the lower-value item is fixated.

      For integrator regions, we predicted an analogous positive relationship: larger accumulated value (|ΔAV|) should produce more BOLD activity. Accumulator regions encode the summed evidence over the course of the decision. Consistent with prior work (Hare et al., 2011; Gluth et al., 2021; Pisauro et al., 2017), we expected such activity in the pre-SMA, dlPFC, and IPS.

      As with sampled value, we predicted that integrator activity should reflect gaze-weighted accumulated value. Just as inputs are modulated by current gaze, the accumulated evidence should be weighted by the history of gaze allocation over the entire trial.

      Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time.

      The conflict account predicts that BOLD activity should scale with inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long RT (Pisauro et al. 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoid this ambiguity – we analyze the effect of accumulated value at each point in time, not just at the time of decision. In this case, conflict should be inversely correlated with accumulated value. Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of values.

      A more serious concern is the potential link to putative time-on-task BOLD activity. Accumulated value inevitably increases with time, leading to a correlation between the two variables (Grinband et al. 2011; Holroyd et al., 2018; Mumford et al. 2024). This is where the gaze data become particularly important. Time-on-task regions should show no relation with gaze allocation. After accounting for non-gaze-weighted accumulated value, only accumulator, and not time-on-task, regions should show a relation with gaze-weighted accumulated value. The results of the revised GLMs provide exactly such evidence.
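Why accumulated value and time-on-task are confounded can be illustrated with a short simulation under assumed parameters (Gaussian sample values with a small, trial-specific bias; all numbers hypothetical). Because |AV| is the magnitude of a running sum, its expected size grows with the number of samples, so a region whose activity simply ramps with time would also correlate with non-gaze-weighted |AV|.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 5000, 10
# Hypothetical signed sample values (left - right) with a trial-specific bias.
bias = rng.normal(0.0, 0.5, (n_trials, 1))
sv = bias + rng.normal(0.0, 1.0, (n_trials, n_samples))
abs_av = np.abs(np.cumsum(sv, axis=1))                  # |AV| after each sample
sample_index = np.broadcast_to(np.arange(1, n_samples + 1), abs_av.shape)

mean_abs_av = abs_av.mean(axis=0)
r = np.corrcoef(sample_index.ravel(), abs_av.ravel())[0, 1]
print(mean_abs_av.round(2))   # mean |AV| rises with sample index
print(r)                      # positive correlation between |AV| and elapsed samples
```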

      We have edited the manuscript to make clear to readers why our gaze manipulation was not merely exploratory but rather a theoretically-motivated test to distinguish between competing models of decision-related neural activity.

      We have clarified our study hypotheses in the Introduction as follows:

      “We hypothesized that we would find (1) a positive correlation between gaze-weighted |SV| and activity in the reward network (the ventromedial prefrontal cortex (vmPFC) and ventral striatum), and (2) a positive correlation between gaze-weighted |AV| and activity in the pre-supplementary motor area (pre-SMA) (Aquino et al., 2023), dorsolateral prefrontal cortex (dlPFC), and intraparietal sulcus (IPS).”

      We have also added clarifying text about conflict and time-on-task to the Discussion as follows: “Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time. The conflict account predicts that BOLD activity should scale with the inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long response times (Pisauro et al., 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoided this ambiguity by analyzing the effect of accumulated value at each point in time, not just at the moment of decision. Under this approach, conflict should be inversely correlated with accumulated value (as higher accumulated evidence indicates less similarity between options). Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of option values.

      A more serious concern is the potential confound with time-on-task BOLD activity. Accumulated value inevitably increases with time within a trial, leading to a correlation between the two variables (Grinband et al., 2011; Holroyd et al., 2018; Mumford et al., 2024). This is where the gaze data were particularly important. Time-on-task regions should show no relation with gaze allocation patterns. After accounting for non-gaze-weighted accumulated value, only accumulator regions, and not time-on-task regions, should show a relationship with gaze-weighted accumulated value. The results of our analyses provide exactly such evidence: preSMA activity was positively correlated with gaze-weighted accumulated value, even when accounting for previous gaze history and individual differences in attention discounting.”

      (3) The authors allude to there being a correlation between SV and AV on this task, but the correlation is never reported. Please report the correlation with and without the removal of T-1.

      We appreciate the reviewer pointing out this omission. We now report all correlations between SV and both the lagged and non-lagged versions of AV in the Methods section (Fig. 7). SV was significantly correlated with the full (non-lagged) AV (Pearson’s r = 0.27). The correlation between SV and lagged AV, while still statistically significant, was much smaller (Pearson’s r = 0.06).

      (4) When examining relationships between SV, AV, and choice probability, the authors note that a larger coefficient for SV compared to AV is an inevitable consequence of an SSM choice process. Please explain why this is the case.

      The reviewer is correct in observing that this point was not made sufficiently clear in the main text. We have now expanded the explanation in the behavioral results section.

      The key insight is that in sequential sampling models, choices occur when accumulated evidence reaches a decision threshold. Importantly, the perceived value of each sample consists of the true underlying value plus random noise. The final sample (SV) is what pushes the accumulated evidence over the threshold, which creates a selection bias: decisions tend to occur when the noise component of SV happens to be positive and large. This means that the perceived final SV systematically overestimates the true SV, biasing upward the regression coefficient for the effect of SV on choice. In contrast, AV represents the sum of all previous sampled evidence, samples that we know did not lead to a choice. These samples are thus more likely to have had a negative or small noise component, meaning that the perceived AV systematically underestimates the true AV. This biases downwards the regression coefficient for the effect of AV on choice.

      Overall, we expect that even when sample evidence is weighted equally over time in the true decision process, regression analyses will inevitably show larger coefficients for the effects of SV than for those of AV. This is a statistical artefact of the threshold-crossing mechanism, not a reflection of differential weighting. We have incorporated this explanation into the revised manuscript to make clear why this pattern is an expected consequence of the SSM framework:

      “The larger coefficient for ∆SV compared to ∆AV is an inevitable consequence of an SSM choice process. In SSMs, a choice occurs when accumulated evidence reaches a threshold. Critically, perceived value for any given sample consists of the true underlying value plus random noise. The final sample (∆SV) is what pushes the accumulated evidence over the threshold, which creates a selection effect: decisions tend to be made when the noise component of ∆SV is relatively large and aligned with the ultimate choice, causing the perceived final ∆SV to systematically overestimate the true ∆SV. As a result, the regression coefficient for the effect of final ∆SV on choice is overestimated. In contrast, ∆AV represents the sum of all previous evidence, which includes samples that were insufficient to trigger a choice and thus more likely to have noise components that favored the non-chosen option. This means that the perceived ∆AV systematically underestimates the true ∆AV. As a result, the regression coefficient for the effect of ∆AV on choice is underestimated. This creates an inherent asymmetry between ∆SV and ∆AV: even when the true decision process weights evidence equally over time, regression analyses will show larger coefficients for ∆SV than ∆AV. For any data generated by an SSM, regressing choice probability on final ∆SV and total ∆AV would produce a larger coefficient for ∆SV due to this threshold-crossing selection effect.”
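      To make the threshold-crossing selection effect concrete, here is a minimal Python simulation sketch. It is not the model analyzed in the manuscript: the function name, the threshold of 3, the unit noise SD, and drawing each trial's true value difference uniformly from [-0.5, 0.5] (held constant across samples) are all illustrative assumptions. It checks only the first half of the argument: the noise on the sample that ends a trial is, on average, biased toward the chosen option, so the perceived final sample overstates its true value in the direction of the choice.

```python
import numpy as np


def simulate_sequential_sampling(n_trials=5000, threshold=3.0, noise_sd=1.0,
                                 max_samples=1000, seed=0):
    """Toy sequential sampling model with hypothetical parameters.

    Each trial has a true per-sample value difference `mu`. Each perceived
    sample is `mu` plus Gaussian noise, and sampling stops as soon as the
    running sum of perceived samples reaches +threshold or -threshold.
    Returns the choice (+1 or -1) and the noise term of the final sample.
    """
    rng = np.random.default_rng(seed)
    choices = np.empty(n_trials)
    final_noise = np.empty(n_trials)
    for i in range(n_trials):
        mu = rng.uniform(-0.5, 0.5)          # true value difference on this trial
        total = 0.0
        for _ in range(max_samples):
            noise = rng.normal(0.0, noise_sd)
            total += mu + noise              # perceived sample = true value + noise
            if abs(total) >= threshold:      # threshold crossing ends the trial
                break
        # Choice follows the sign of the accumulated evidence (a trial that
        # never crosses within max_samples is simply forced here).
        choices[i] = 1.0 if total > 0 else -1.0
        final_noise[i] = noise               # noise on the sample that ended the trial
    return choices, final_noise


choices, final_noise = simulate_sequential_sampling()
# Sign the final-sample noise so that positive values favour the chosen option;
# a positive mean means the perceived final sample overstates its true value
# in the direction of the choice.
signed_final_noise = choices * final_noise
print(f"Mean choice-signed noise on the final sample: {signed_final_noise.mean():.2f}")
```

      Regressing simulated choices on the true final-sample and accumulated values would be the natural extension for examining the coefficient asymmetry itself; the sketch stops at the selection effect on the final sample.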

      (5) It is not clear to me why the authors single out the pre-SMA only in the abstract when IPS and dlPFC also show stronger correlations with AV and exhibit gaze modulation in the authors' final non-linear analysis. Further explanation is required in the Discussion and I would also suggest amending the Abstract because the 'Most importantly' claim will not be meaningful for the reader.

      We appreciate the reviewer’s point. In the revised manuscript, we have included several new GLMs, including the new GLM1, which tests for gaze-weighted AV signals above and beyond the effect of non-gaze-weighted AV. In that analysis, only the pre-SMA showed such a signal. We have now clarified this in the Abstract as follows:

      “Finally, we found gaze-modulated accumulated-value signals, above and beyond the non-gaze-modulated signals, in the pre-supplementary motor area (pre-SMA), providing novel evidence that visual attention has lasting effects on decision variables and suggesting that activity in the pre-SMA reflects accumulated evidence.”

      (6) Some discussion of statistical power would be warranted given that a sample of 23 is now considered small by current fMRI standards.

      We appreciate the reviewer raising this important issue. We acknowledge that our sample size of 23 subjects (with only 20 having useable eye-tracking data) is on the small side by current fMRI standards. However, we believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space, and it leverages theoretical predictions about where effects should occur. Our novel finding, that gaze modulation of accumulated-evidence signals is reflected in pre-SMA activity, builds naturally on these established findings.

      However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

    1. Author response:

      Thank you for considering our manuscript, “Engineering ATP Import in Yeast Uncovers a Synthetic Route to Extend Cellular Lifespan” (eLife-RP-RA-2025-109761) for publication in eLife. We appreciate the time and effort invested by the reviewers and editors.

      We have carefully read the eLife assessment and both public reviews. After thorough evaluation, we believe there is a significant factual misunderstanding that has propagated through both reviews and fundamentally affected the interpretation of our central findings and the overall evaluation.

      We must also express concern regarding the review process duration. We were informed that the manuscript experienced an extended review period (107 days) due to delay from a third reviewer. Ultimately, we received only two reviews.

      The concerns that our manuscript contains obvious internal contradictions or technical inconsistencies do not stem from flawed data but from a misinterpretation of measurement directionality.

      We also acknowledge that the legend of Figure 5 should be more explicit, and that the Methods section should describe the experimental design that gives rise to the inverse correlation between AU values and ATP abundance.

      Together, these facts led both reviewers to misinterpret the ATP measurements presented in Figure 5, specifically the directionality of the fluorescence-based ATP readout. In this assay, arbitrary units (AU) are inversely correlated with intracellular ATP abundance: higher AU values correspond to lower ATP levels. This inverse relationship was clearly described in the Results section and indicated in the figures by the “Low versus High” labels, but it appears to have been overlooked. As a result, reviewers interpreted Figure 5 as contradicting Figure 2, when in fact the two datasets are fully consistent.

      Because this misunderstanding affected interpretation of the foundational ATP data, it appears to have influenced evaluation of all downstream conclusions. For example, neither reviewer meaningfully engaged with:

      - The identification of distinct cell death trajectories.

      - The mitochondrial dependency of NTT1-associated toxicity.

      - The integration of ATP depletion with mitochondrial function.

      - The distinction between intracellular ATP manipulation and extracellular ATP sensing mechanisms.

      We fully understand that when foundational data appear contradictory, reviewers naturally deprioritize downstream conclusions. However, in this case, the foundational contradiction does not exist; it arises from a misreading of the reporter’s scale.

      From the Results section of the manuscript:

      “Our analysis of ATP abundance throughout the yeast lifespan showed that yeast cells are born with low ATP levels, which gradually increase during their lifespan. Some cells completed their lifespan without any observable reduction in ATP abundance, while others showed a drastic decrease in ATP levels during late life (Fig. 5A–D, Supplementary File S3), consistent with previous observations supporting two modes of yeast lifespan, mediated by mitochondrial and/or SIR2 function (42,46–49). Consistent with our data presented in Figure 2, we also observed significantly lower ATP abundance in NTT1-expressing cells throughout their entire lifespan compared to Wt control cells (Fig. 5A–C). Furthermore, these cells displayed significantly reduced mean and maximum replicative lifespan (RLS), directly indicating that intracellular ATP depletion shortens lifespan (Fig. 5D). Next, we assessed RLS and age-associated ATP changes under ATP supplementation. We found that exposing NTT1 cells to medium supplemented with 10 µM ATP restored intracellular ATP levels (Fig. 5A–C) and significantly (p = 4.03E-18) increased both mean and maximum RLS to levels comparable to WT cells (Fig. 5D).”

      This section explicitly explains that Figure 5 is consistent with Figure 2. LC-MS data (Figure 2) show intracellular ATP depletion in NTT1 cells under baseline conditions and restoration upon extracellular ATP supplementation. Figure 5 shows the same pattern longitudinally. The apparent contradiction raised by both reviewers stems entirely from misreading the directionality of the AU scale.

      In the public assessment, concerns are raised about:

      - “Internally inconsistent, particularly regarding intracellular ATP measurements”

      - “Mismatched ATP measurements”

      - “Conceptual model contradicted by the data”

      - “The plots in Figure 5 make it seem like exogenous ATP addition lowers intracellular ATP…”

      These statements arise directly from the reversed interpretation of the AU scale. If the inverse relationship had been recognized, these perceived inconsistencies would not exist. Unfortunately, this misunderstanding then influenced broader interpretations, including the conclusion that the fundamental NTT1 model is internally contradictory.

      Similarly, Reviewer #2 states that LC-MS and QUEEN reporter data conflict and that ATP supplementation appears to lower intracellular ATP. This again reflects the same directional misunderstanding. There is no conflict between Figure 2 and Figure 5. Both show reduced ATP in NTT1 cells and restoration upon ATP supplementation.

      A second major point concerns the bidirectional transporter hypothesis. Reviewer #1 suggests that NTT1 may be bidirectional. However, NTT1 is well-characterized in the literature as a nucleotide transporter that exchanges extracellular ATP for intracellular ADP. We clearly described this in Figure 1C and cited the appropriate primary literature. The suggestion that we failed to consider directionality appears to stem from the same misinterpretation of intracellular ATP levels. We agree that clarifying the role of ADP/AMP depletion in NTT1-expressing cells would strengthen the manuscript, and we are prepared to revise the text to more explicitly describe how intracellular nucleotide exchange dynamics contribute to ATP depletion under baseline conditions.

      We also note that several criticisms, such as:

      - “Incorrect scale bars”

      - “Figure 5C does not match 5AB”

      - “Conceptual model contradicted by the data”

      - “No apparent correlation between ATP levels and lifespan”

      are all rooted in this central misunderstanding of how ATP abundance is represented in the fluorescence measurements.

      To address this constructively during the next revision, we are willing to:

      (1) Revise all relevant figure legends to explicitly state that AU values are inversely correlated with ATP abundance, expand the Materials and Methods section to clarify this inverse correlation, and/or generate new figures to minimize confusion.

      (2) Add clarifying annotations directly onto the figures.

      (3) Include new figures for further validation of observed nucleotide changes.

      (4) Expand our RNA-seq data analyses.

      (5) Expand discussion of nucleotide exchange dynamics and transporter directionality.

      (6) Address remaining concerns with additional analyses, experiments and clarifications throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells, as well as 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, which is likely due to the limited appropriate controls and the lack of any validation experiments. I think the study requires better proteomic controls, and some validation experiments of the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison - there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex. It is going to provide a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what's different between those two repetitive elements.

      We appreciate the reviewer's point here. In fact, we selected mitochondrial DNA as a target for precisely the reason the reviewer notes. mtDNA should be spatially distinct from the nuclear targets, allowing us to determine whether we were in fact seeing spatially distinct proteins at the inter-organelle (mtDNA vs. telomeres/centromeres) and intra-organelle (telomeres vs. centromeres) levels.

      But the more realistic use case of this method would be "what is at a specific genomic element"? A purely nuclear-localized control would be needed for that. Or a genomic element that has nothing interesting at it (I do not know of one).

      We have now added two studies in Figure 4 and Figure 5 detailing the use of OMAP to investigate specific genomic elements: the Hox clusters (HOXA and HOXB), and haplotype-specific analysis of the X-chromosome inactivation centers in female murine (EY.T4) cells. The controls in these cases are more specific, in line with those suggested by the reviewer, as we (1) compare HOXA and HOXB with or without EZH2 inhibition using the same sets of probes and (2) specifically compare the region surrounding the XIC in female cells for the inactive and active X chromosomes.

      You can see this in the label-free work: non-specific, nuclear GO terms are enriched likely due to the random plus non-random labeling in the nucleus. What would a Telo vs general nucleus GSEA look like? (GSEA should be used for quantitative data, not GO). That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or Cut and Run/Tag is used.

      We performed GSEA on the enrichment scores for the label-free proteomics data from the SAINT output (Figure 1D). We also note that several of these proteins (e.g., those highlighted in Figure 2A: TERF1, CENPN, TOM70) have already been extensively validated to colocalize to these locations.

      In response to the reviewer's request for additional validation, we analyzed ChIP-seq data for several proteins to determine if they were enriched surrounding specific loci. In the case of the HoxA/B analysis, we found that HDAC3 and TCF12 were enriched at HOXB compared to HOXA, and SMARCB1 and ZC3H13 were enriched at HOXA compared to HOXB (Figure 4C). HDAC3 and TCF12 ChIP data confirmed increased peak calls at HOXB and SMARCB1 and ZC3H13 ChIP data confirmed increased peak calls at HOXA for these four selected proteins (Figure 4D).

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being looked at? Maybe compared to mitochondria, but it's hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are more seeing mito vs nucleus + Telo than the stated comparison. For example, if you have no labeling in the nucleus in the control (Figures 1C and 2C) you cannot separate background labeling from specific labeling. Same with mito vs. nuc+Telo. It is not the proper control to say what is specifically at the Telo.

      We agree with the reviewer that, in comparisons against mitochondrial targeting, some of the enrichment could reflect non-specific nuclear labeling. We note again, though, that we purposefully avoided using the word “specifically” when describing the proteomics work developed here. The reason is that we are not atlasing a large number of targets to define specificity. Instead, we highlight in Figure 2 that we did observe differences in proteins associating with telomeres and mitochondrial DNA. That may be non-specific, and in fact, this is also why we decided to include two nuclear targets to determine what might be specifically enriched. Thus, we compared centromeric and telomeric protein enrichment as determined by OMAP and observed consistent differential enrichment of shelterin proteins at telomeres (Figure 2I) and CENP-A complex members at centromeres (Figure 2J). We could have done the relative comparisons to no-oligo controls, analogous to how CASPEX compared targeted analyses to no-sgRNA controls (PMID: 29735997). However, we found that the mitochondrial targeted samples were generally better as a comparator because (1) we have clear means to validate differences and (2) the local environment around DNA is being labeled.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      Assuming the nuclear control was the same, it is unclear how this ratio-of-ratios ([Telo/Ctrl]/[Cent/Ctrl]) experiment would be inherently different from the direct comparison between Telo and Centromere, again assuming the backgrounds are derived from the same cellular samples. More than likely, adding the extra ratios could increase the artifactual variance in the estimates, reducing the power of the comparisons, as has been seen in proteomics data using ratio-of-ratio comparisons in the past (Super-SILAC).
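      The variance argument can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the proteomics pipeline used here: each abundance is treated as a single log2 intensity with independent Gaussian measurement noise, and the true abundances, noise SD, and sample count are hypothetical. On the log scale, a ratio of ratios that reuses the same control measurement cancels to the direct Telo/Cent comparison, whereas independently measured controls add two extra noise terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000      # hypothetical number of simulated replicate measurements
sigma = 0.3      # hypothetical measurement noise (SD) on the log2 scale

# Hypothetical true log2 abundances of one protein in each sample type.
true_telo, true_cent, true_ctrl = 2.0, 1.0, 0.5

telo = true_telo + rng.normal(0.0, sigma, n)
cent = true_cent + rng.normal(0.0, sigma, n)
ctrl_a = true_ctrl + rng.normal(0.0, sigma, n)   # control paired with Telo
ctrl_b = true_ctrl + rng.normal(0.0, sigma, n)   # separately measured control paired with Cent

# On the log scale, ratios become differences.
direct = telo - cent                                # log2(Telo / Cent)
shared_ctrl = (telo - ctrl_a) - (cent - ctrl_a)     # ratio of ratios, same control
separate_ctrl = (telo - ctrl_a) - (cent - ctrl_b)   # ratio of ratios, independent controls

print(f"SD direct:            {direct.std():.3f}")
print(f"SD shared control:    {shared_ctrl.std():.3f}")
print(f"SD separate controls: {separate_ctrl.std():.3f}")
```

      All three estimates target the same log2 fold change, but under this noise model the separate-control estimate has variance 4σ² rather than 2σ², which is the loss of power described above.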

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      We appreciate the reviewer's point here. To be clear, we have not made any claims about new proteins at specific loci. Instead we validated that known telomeric and centromeric associating proteins were consistently enriched by DNA OMAP (Figure 2). We also want to emphasize that while valuable, the current paper is not an atlasing paper to define the full and specific proteomes of two genomic loci. We instead show how this method can be used to observe quantitative differences in proteins enriched at certain loci (HOXA/B work, Figure 4) and even between haplotypes (Xi/Xa work, Figure 5).

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      We appreciate the reviewer's point here and have added the following text to the discussion: “Additionally, we show that this method is also able to detect DNA-DNA contacts through biotinylation of loop anchors. Our approach functions similarly to 4C[86]. However, our approach of biotin labeling of contacts does not rely on pairwise ligation events. Thus, DNA O-MAP may sample DNA-DNA contacts differently from ligation-based methods.”

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      We took the reviewer's point and have worked to scale down the DNA OMAP experiments while revising this manuscript. As noted in Figure 5, we have scaled this work down to plates with ~10x fewer cells than in our initial experiments. This reduction builds on the initial DNA OMAP work in Figures 1 and 2, as well as our additional work in Figure 4, where we used 30-60 million cells in solution, already 10x fewer cells than previous work (PMID: 29735997). Thus, the newest DNA OMAP platform uses ~100x fewer cells than previous work.

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to circumvent the fact that this method is still (apparently) largely limited to repetitive elements. Other than Glopro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So, this isn't quite the advancement you are implying. Plus, the overlap with telomeric proteins and other studies should be addressed. However, that will be challenging due to the controls used here, discussed above.

      As noted above, we have added Figures 4 and 5 to address the reviewer concerns by targeting multiple non-repetitive loci (HOXA and HOXB clusters and a 4.5 Mb region straddling the X-inactivation center on both the active and inactive X homolog). Targeting the regions around the X-inactivation center shows the potential to perform haplotype-resolved proteome analysis of chromatin interactors.

      Regarding overlap with known telomeric proteins, we addressed this specifically in Figure 1F, although we agree with the reviewer that the choice of controls dramatically changes which proteins are considered enriched. The goal of the network analysis was to show (1) that we identify proteins previously observed in telomere proteomic datasets and (2) that we gain a more complete view of proteins by capturing more known interacting proteins than many previous methods, as was noted for the RNA OMAP platform (PMID: 39468212). For example, we observed enrichment of PRPF40A in the telomeric DNA OMAP data. From the Bioplex interactome, PRPF40A was observed to interact with TERF2IP and TERF2, suggesting that through these interactions PRPF40A may colocalize at telomeres. Similarly, we observed enrichment of SF3A1, SF3B1, and SF3B2. The SF3 proteins are known regulators of telomere maintenance (PMID: 27818134), but have not previously been observed in telomeric proteomics datasets, except now in DNA OMAP.

      We have added the following text to the Results to clarify these points:

      “To benchmark DNA O-MAP, we compared the full set of telomeric proteins to proteins observed in five established telomeric datasets (PICh, C-BERST, CAPLOCUS, CAPTURE, BioID)12,14,16,35,36 (Figure 1F). DNA O-MAP captured both previously observed telomeric interacting proteins (shelterins) as well as telomere associated proteins (ribonucleoproteins). We identified multiple heterogeneous nuclear ribonucleoproteins (hnRNPs) previously annotated as telomere-associated, including HNRNPA1 and HNRNPU. HNRNPA1 has been demonstrated to displace replication protein A (RPA) and directly interact with single-stranded telomeric DNA to regulate telomerase activity37–39. HNRNPU belongs to the telomerase-associated proteome40 where it binds the telomeric G-quadruplex to prevent RPA from recognizing chromosome ends41. We mapped DNA O-MAP enriched telomeric proteins to the BioPlex protein interactome and observed that in addition to capturing proteins from previously observed telomeric datasets (Figure 1F), DNA O-MAP enriched for interactors of previously observed telomeric proteins. Previous data found RBM17 and SNRPA1 at telomeres, and in BioPlex these proteins interact with three SF3 proteins (SF3A1, SF3B1, SF3B2). Though they were not identified in previous telomeric proteome datasets, all three of these SF3 proteins were enriched in the DNA O-MAP telomeric data. Furthermore, through interactions with G-quadruplex binding factors, these SF3 proteins are regulators of telomere maintenance (PMID: 27818134). Taken together, this data supports the effectiveness of DNA O-MAP for sensitively and selectively isolating loci-specific proteomes.”

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5).

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      Our primary claim for DNA OMAP is that it requires orders of magnitude fewer cells than previous studies. Based on comments along these lines from both reviewers, we performed DNA OMAP targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5 Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5). For the X chromosome targeting, we used ~3 million cells per condition with methods that we optimized during revision. When targeting HOXA and HOXB, we identified HDAC3 and TCF12 enrichment at HOXB compared to HOXA, as well as ZC3H13 and SMARCB1 enrichment at HOXA compared to HOXB, which is consistent with ChIP-seq reads from ENCODE for these proteins (Figure 4C, D). Both the HOX and X chromosome work help to address limitations noted in the Gauchier et al. paper the reviewer cites, as both show progress towards overcoming “the major signal-to-noise ratio problem will need to be addressed before they can fully describe the specific composition of single-copy loci”.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      We analyzed ChIP-seq reads at HOXA and HOXB (Figure 4C,D), which recapitulate our findings for four of our differentially enriched proteins. We also note that, with the addition of the non-repetitive loci (Figures 4 and 5), we have performed DNA OMAP on seven different targets (telomeres, pericentromeres, mitoDNA, HOXA, HOXB, Xi, and Xa) and identified expected targets at each of these. The consistency of these data, which mirrors the consistency of the RNA implementation of OMAP (PMID: 39468212), reinforces that we can successfully enrich local proteomes at genomic loci.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      Based on this critique we have gone back through the manuscript to improve the fairness of our comparisons and expanded the limitations in our discussion section.

      To the point about fixation, Schmiedeberg et al., which the reviewer references, do describe crosslinking as requiring longer-lived interactions (~5 s). Yet, as highlighted in reviews, many additional studies have found that “it has been possible to perform ChIP on transcription factors whose interactions with chromatin are known from imaging studies to be highly transient” (Review PMID: 26354429). Subbotin and Chait report similar results in proteomics analyses, stating that the linkage of lysine-reactive fixatives like formaldehyde and “glutaraldehyde to reactive amines within the cellular milieu were sufficient to preserve even labile and transient interactions” (PMID: 25172955).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOX clusters and part of the X chromosome) in the revised manuscript (Figures 4 and 5).

      Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We have made the comparisons as well as possible. In fact, we found it difficult to find recent implementations of many of these methods. Purchasing the exact mass spectrometers or performing every version of chromatin proteomics would be well beyond the scope of this work. On the other hand, OMAP has already generated data for three manuscripts. Our claim is that, using the instrumentation and methods available to us, we were able to reduce the number of cells required to analyze a given genomic locus. We then applied TMT multiplexing to further improve throughput and perform replicate analyses. Fully validating that a protein is present at one locus and no other would require exhaustive atlasing of protein-genomic interactions, which would be well beyond the scope of this single paper. Similarly, performing ChIP for every identified target to estimate an empirical FDR would be well beyond the scope of this work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In summary, all three reviewers raised major concerns about the limitations of the method, many of which could be resolved by more precise and transparent language about these limitations. If you choose to resubmit a revised version, you should address questions like: What scale does "individual locus" refer to? At what scale can the method map protein-DNA interactions at individual targeted loci, rather than large repetitive domains? What is the estimated false discovery rate for a set of enriched proteins? The eLife assessment for this version of the manuscript is based on reviewer concerns. Note that this assessment can be updated after receiving a response to reviewer comments.

      Reviewer #1 (Recommendations for the authors):

      (1) The first couple of paragraphs make it sound like your method would exclusively benefit from sample multiplexing with MS-based proteomics. That is a bit misleading. The other stated methods use TMT. They don't use it to compare very different genomic (or compartmental) regions, but there is no reason C-BERST, GLoPro, or CasID could not.

      This is a good point, and we have updated the manuscript to reflect it. While previous methods generally did not use TMT, they could be adapted to do so and, similar to OMAP, improved by the use of more replicates in their analyses.

      (2) Please make the colors in 1F for the dataset overlap easier to read. 2 and 4+ are too similar.

      We appreciate the comment on making the colors easier to discern. Along these lines we’ve changed the color of “2” to make it easier to distinguish from “4+”.

      (3) Label as many dots as legible in your volcano plots.

      We’ve labeled a number of proteins that are relevant to the discussion in this paper as well as some additional proteins. We feel that additional labeling would detract from the points that we are trying to make in individual figure panels about groups of proteins, rather than general remodeling of all proteins.

      (4) Figure 2E needs a divergent color scheme since it crosses 0. And is it scaled, log-transformed, or both? And compared to what then?

      Figure 2E (heatmap) is z-scaled relative protein abundance measurements based on TMTpro reporter ion signal to noise (“s/n”). We have added additional information to the legend to highlight the information that the reviewer points out here. For the color, we are unsure of what is being asked for, as above 0 is red and below 0 is blue.
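      A minimal sketch of row-wise z-scaling may clarify how such heatmap values arise. It assumes that each row holds one protein's TMTpro reporter-ion s/n values across channels and that no log transform is applied first. Whether the manuscript log-transforms before scaling is not stated here, and the function name `zscore_rows` is hypothetical:

```python
import statistics


def zscore_rows(matrix):
    """Row-wise z-scaling of a protein x channel matrix.

    Each row holds one protein's reporter-ion s/n values across TMT channels.
    The row mean is subtracted and the result divided by the row's sample
    standard deviation, so every scaled row is centred on 0.
    """
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample SD; needs at least two channels
        if sd == 0:
            # Identical signal in every channel carries no relative
            # information; map the row to all zeros.
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(x - mean) / sd for x in row])
    return scaled


# Hypothetical s/n values for two proteins across three channels.
example = [[1.0, 2.0, 3.0], [50.0, 50.0, 50.0]]
print(zscore_rows(example))  # prints [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

      Because each row is centred on its own mean, the scaled values necessarily straddle 0. A two-sided colour scale (red above 0, blue below) therefore reads as "above" or "below" that protein's average across the compared samples.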

      (5) Unclear what you are implying with "...only 1-2 biological replicates." I would omit or clarify.

      Fair point; we have omitted this section from the manuscript to simplify the introduction.

      (6) H2O2 and biotin-phenol might be toxic to living organisms. But so are 4% PFA and ISH. I realize you are trying to justify your new approach, but you don't need to do it with exaggerated contrasts. O-MAP is a great approach, and people are probably more likely to adopt it because it is DNA ISH based. Plus, with crosslinking, you are likely not displacing proteins via Cas9 landing.

      We appreciate the reviewer’s comments about adoption and lack of protein displacement. We’ve scaled back on the claims and added more about limitations owing to crosslinking and ISH.

      (7) How much of the genome do the Cent regions take up? You state 500 kb for Telos.

      In the text we delineate how large of a region the PanAlpha probes target “The genome-wide binding profile of the pan-alpha probe closely overlaps with centromeres (Figure S1) and covers approximately 35 Mb of the genome according to in silico predictions.” Additionally, we’ve added Table S4 to summarize target locus sizes for all of the included targets.

      (8) You seem to be underestimating the lysine labeling. Is that after TMT labeling and analysis? If so, you're already ignoring what couldn't be seen. I don't think it's that important, but you included it, so please describe clearly why it's an issue and how much of an issue it is. How does that relate to literature values? And it's not just TMTpro; it's any lysine labeler.

      We appreciate the reviewer’s point about specifying the reasoning and the lack of clarity around overall lysine labeling. The 1.38% figure is the fraction of peptides carrying residual modifications from formaldehyde crosslinking. For overall acylation of lysines with TMT labels, we generally expect (and achieve) >97% labeling of lysines with TMT reagents, as the Kuster and Carr labs nicely demonstrated across a range of labeling conditions (PMID: 30967486).

      Decrosslinking is generally a critical step in proteomics workflows on fixed or FFPE tissues, and we therefore sought to explore whether we could achieve sufficiently low residual lysine alkylation to enable protein quantitation by TMTpro reagents (or any lysine labeler, as the reviewer notes). For TMTpro-based methods on peptides, this is generally less of a concern, as protease cleavage frees new primary amines at the peptide N-termini, which can be labeled for quantitation. However, because we are describing a proteomics method on fixed tissues, we wanted to share these data so that readers can take potential residual fixation modifications into consideration when performing this method.

      Reviewer #3 (Recommendations for the authors):

      Liu et al. describe an original locus labelling approach that enables the isolation of specific genomic regions and their associated proteins. I have mixed views on this work, which, in my opinion, remains preliminary at this stage. Establishing the proteome of a single chromatin region is one of the most complex challenges in chromatin biology, as extensively discussed in Gauchier et al. (2020). Any breakthrough towards this goal is of significant interest to the community, making this manuscript potentially compelling. Indeed, some data suggest that the method works for repetitive DNA to some extent. However, much of the data is not very convincing, and in the case of small DNA targets, it argues against the use of DNA-O-MAP.

      In contrast to existing methods, DNA-O-MAP combines locus-specific hybridisation in situ (using affordable oligonucleotides) with proximity biotinylation. A major advantage of this strategy over other locus-specific biotinylation methods is the possibility of extensively washing excess or non-specifically hybridised probes before the biotinylation reaction, theoretically limiting biotinylation to the target region and thus significantly enhancing the signal-to-noise ratio. Other methods involving proximity biotinylation, such as targeted dCas9, do not have this capacity, meaning biotinylation occurs not only at the locus where a small fraction of dCas9 molecules is targeted but also around non-bound dCas9 molecules (representing the vast majority of dCas9 expressed in a given cell). This aspect potentially represents an interesting advance.

      We thank the reviewer for their thoughts and critiques, and we hope the revisions have in part relieved concerns about the method being limited to repetitive elements. Regarding the latter points, a new specificity analysis showed labeling to be highly specific to a given probe locus (Figure S3).

      Below, I outline the significant issues:

      The manuscript implies that DNA-O-MAP has better sensitivity than earlier techniques like CAPTURE, GLOPRO, or PICh. The authors state that PICh uses one trillion cells (which I doubt is accurate), and other methods require 300 million cells, whereas DNA-O-MAP uses only 60 million cells, suggesting the latter is more feasible. However, these earlier experiments were conducted almost 15 and 6 years ago, when mass spectrometry (MS) sensitivity was considerably lower than that of current instruments. The authors cannot know whether the proteome obtained by previous methods using 60 million cells, but analysed with current MS technology, would yield results inferior to those of DNA-O-MAP. Unless the authors directly compare these methods using the same number of cells and identical MS setups, I find their argument unjustified and misleading.

      Based on the instrumentation listed, we do have a good idea of how changes in sensitivity may have affected identifications. For example, the CasPEX data were collected on an Orbitrap Fusion Lumos, while our data were collected on an Orbitrap Fusion Eclipse. From our work characterizing these two instruments during the Eclipse development (PMID: 32250601), we know that improvements to the ion optics boosted the sensitivity of the Eclipse used in our work by ~50% compared with the Lumos. This means that if GLoPro were run on an Eclipse, it would still require >200 million cells per replicate as input.
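      A rough worked version of this estimate follows. It rests on a simplifying assumption: that the required cell input scales inversely with instrument sensitivity. It starts from the ~300 million cells per replicate reported for GLoPro and treats a ~50% sensitivity gain as a factor of 1.5:

$$
N_{\text{Eclipse}} \approx \frac{N_{\text{Lumos}}}{1.5} = \frac{3\times10^{8}\ \text{cells}}{1.5} = 2\times10^{8}\ \text{cells}
$$

      Under that assumption, the requirement is roughly 200 million cells per replicate. This is still more than three times the 60 million cells cited for DNA O-MAP. In practice, a sensitivity gain need not translate linearly into reduced input.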

      It is suggested that DNA-O-MAP is capable of 'multiplexing', whereas previous methods are not. This statement is also misleading. As I understand it, the targeted regions do not originate from a common pool of cells. Instead, TMT multiplexing only occurs after each group of cells has been independently labelled (Telo, Centro, Mito, control). Therefore, previous methods could also perform multiplexing with TMT. Moreover, it is unclear how each proteome was compared: one would expect many more proteins from centromeres than from telomeres (I am unsure about the number of mitochondria in these cells) since these regions are significantly larger than telomeres (possibly 10 to 100 times larger?). Have the authors attempted to normalise their proteomics data to the size (concatenated) of each target? This is particularly relevant when comparing histone enrichment at chromatin regions of differing sizes.

      We agree with the reviewers that this was overstated. In fact, the GLoPro paper notes that its authors performed a MYC analysis with a previous generation of TMT that could multiplex 10 samples. We have amended the manuscript to be more specific in those contexts. As stated in the methods section, “Samples were column normalized for total protein concentration”, to account for the amount of protein and size of the different targets.
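      One common way to implement column normalization in TMT experiments is to rescale every channel so that its summed reporter signal matches the other channels. The sketch below assumes that scheme, rescaling every channel to the mean of the original channel sums. The exact reference value used in the manuscript is not stated, and the function name `column_normalize` is hypothetical:

```python
def column_normalize(matrix):
    """Rescale each TMT channel (column) so all channels share the same total.

    matrix: list of rows, one row per protein, one column per channel.
    Every channel is scaled to the mean of the original column sums, which
    removes differences in the total amount of labelled protein per channel.
    """
    n_channels = len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n_channels)]
    target = sum(col_sums) / n_channels
    factors = [target / s for s in col_sums]  # one scale factor per channel
    return [[row[j] * factors[j] for j in range(n_channels)] for row in matrix]


# Hypothetical example: channel 2 received twice as much total protein.
example = [[1.0, 2.0], [3.0, 6.0]]
print(column_normalize(example))  # prints [[1.5, 1.5], [4.5, 4.5]]
```

      After scaling, each channel's total signal is identical. Relative differences between proteins within a channel are preserved.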

      Figure 1C shows streptavidin dots resembling telomeres. To substantiate this claim, simultaneous immunofluorescence with a telomere-specific protein (e.g., TRF1 or TRF2) is required. It is currently unknown whether all or only a subset of telomeres are targeted by DNA-O-MAP, and it is also unclear if some streptavidin foci are non-telomeric. Quantification is needed to indicate the reproducibility of the labelling (the same comment applies to the centromere probes later in the manuscript; an immunofluorescence assay with CENPB would be informative, alongside quantifications).

      We understand the reviewer’s concern about the specificity and reproducibility of DNA-O-MAP. To address this, we have added an analysis of the efficiency and specificity of our FISH and biotin labeling for the telomere, PanAlpha, and mitochondria targeting oligos (Figure S3). Biotin deposition was highly specific to the intended targets, with an average specificity of 98% across the three probes.

      Perhaps more importantly, the authors suggest that it may be possible to enrich proteins that are not necessarily present at the target locus but are instead in spatial proximity (e.g., RNA polymerase I subunits enriched upon centromere targeting). Does this not undermine the purpose of retrieving locus-specific proteomes?

      The goal of DNA OMAP is to identify the local neighborhood of proteins around a specific genomic locus, similar to GLoPro. As we now show in Figures 4 and 5, these neighborhoods are inherently interesting for comparing the quantitative changes that occur around a genomic locus.

      Possibly related to the previous issue, when DNA-O-MAP is used to assess DNA-DNA interactions, probes covering regions of 20-25 kb are employed. Therefore, one would expect these regions to be significantly biotinylated compared to flanking regions. However, Genome Browser screenshots indicate extensive biotinylation signals spanning several megabases around the 20-25 kb targets. If the method had high resolution, the target region would be primarily enriched, with possibly discrete lower enrichment at distant interacting regions. The lack of discrete enrichment suggests poor resolution, likely due to the large scale of proximity biotinylation. This compromises the effectiveness of DNA-O-MAP, especially if it is intended to target small loci with complex sequences. Could the authors quantify the absolute number of reads from the target region compared to those from elsewhere in the genome (both megabases around the locus and other chromosomes, where many co-enriched regions seem to exist)? This would provide insights into both enrichment and specificity.

      Thanks for this suggestion; we have included a new Figure S8 showing normalized read depth as a function of distance from the genomic target. The resolution of DNA OMAP, like that of all peroxidase-mediated proximity labeling methods, does not depend on the sequence length of the DNA region but on the 30-40 nm of physical space around the HRP molecule that is targeted to the genomic locus.

      Minor Issues:

      (1) Page 3, second paragraph: It is unclear why probes producing a visible signal in situ necessarily translates to their ability to retrieve a specific proteome.

      We have revised the manuscript to de-emphasize the visible-signal aspect of probe targeting and to re-emphasize our initial point that the number of probes needed to properly target unique regions makes the use of locked nucleic acid probes cost-prohibitive. The basic point, though, is that we and others have previously shown, with RNA OMAP (PMID: 39468212) and APEX/proximity labeling strategies, that the ability to deposit and visualize biotin generally translates directly into recovery of proximally labeled proteins (PMID: 26866790).

      (2) Page 3, last paragraph: "to reach a higher degree of enrichment...": Has it been demonstrated that direct protein biotinylation provides higher enrichment of relevant proteins? Certainly, there is higher enrichment of proteins, but whether they are relevant is another matter.

      Our point here was that methods using direct protein biotinylation achieve higher levels of enrichment and thus require fewer cells than the previously mentioned PICh method, which is why we wrote: “In the case of GLoPro, APEX-based proximity labeling enhanced protein detection sensitivity, reducing the input required for each replicate analysis to ~300 million cells—a 10-fold reduction in cell input compared to PICh which used 3 billion cells.”

      Regarding whether these proteins are relevant, we show enrichment of known proteins that are critical to the function of their occupied genomic regions at telomeres and centromeres. Additionally, we have added quantitative comparisons to assess relevance in our analysis of the HOX clusters and of our targeted region of the X chromosome, through comparisons with ChIP data at these regions. The improved enrichment that we established in our initial submission, as well as in the updated version, also means that we can further scale down the number of cells required.

      (3) Figure 2B is misleading; it appears as though all three regions are targeted in the same cell, suggesting true multiplexing, which, I believe, is not the case.

      To avoid any potential confusion about how the samples were derived we’ve updated this figure panel to show three separate cells, each with a different region being targeted.

      (3) If I understand correctly, the 'no probe' control should primarily retrieve endogenously biotinylated proteins (carboxylases), which are mainly found in mitochondria. Why does the Pearson clustering in Supplementary Figure 2 not place this control proteome closer to the mitochondrial proteome?

      Assuming that the ~10 carboxylases are biotinylated at the same level in all cells, their share of all enriched proteins for a given target is markedly reduced whenever that target enriches many other proteins. As Figure S4 shows, mitochondrial DNA OMAP enriches proteins besides the carboxylases, so the carboxylases make up a smaller proportion of that enriched proteome. We believe this explains why the ‘no probe’ sample can be clearly separated along PC2 in Figure 2D.

      (4) Was CENPA enriched in the centromere DNA-O-MAP? If not, have the authors scaled up (e.g., with ten times more cells) to see if the local proteome becomes deeper and detects relevant low-abundance proteins like CENPA or HJURP? This would be very informative.

      We did not observe CENPA. We had originally contemplated the experiment the reviewer suggested, but noted that CENPA has only two tryptic peptides (>7 AA, <35 AA), both of which lie in a commonly phosphorylated region of the protein. Rather than scale up these experiments, we decided to apply DNA OMAP to non-repetitive loci.
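      A simple in silico digest illustrates the peptide-count argument. The sketch below uses a hypothetical, simplified helper with several stated assumptions. Trypsin is taken to cleave after K or R except before P. Missed cleavages and modifications are ignored. Peptides of 8 to 34 residues are kept, matching the >7 and <35 amino-acid window quoted above. The example runs on a toy sequence rather than CENPA, so it does not reproduce the two-peptide result itself:

```python
import re


def tryptic_peptides(sequence, min_len=8, max_len=34):
    """Simplified in silico trypsin digest with a peptide-length filter.

    Cleaves C-terminal to K or R unless the next residue is P, allows no
    missed cleavages, and keeps peptides of min_len..max_len residues.
    Requires Python 3.7+ (splitting on zero-width matches).
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


# Toy sequence (not CENPA): the short "GGR" peptide is filtered out, and the
# K followed by P is not cleaved.
toy = "AAAAAAAAK" + "GGR" + "LLLLLLLLLLKPLLLLK"
print(tryptic_peptides(toy))  # prints ['AAAAAAAAK', 'LLLLLLLLLLKPLLLLK']
```

      A protein whose only in-window peptides carry variable phosphorylation will yield few unmodified peptide identifications. This holds regardless of how deeply the locus is sampled.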

      (5) Using a few million cells, I do not see how the starting chromatin amount could range from 0.5 to 7 mg, as shown in Figures 2 and 3. How were these figures calculated? One diploid cell contains approximately 6 pg of DNA/chromatin, which means one billion cells represent about 6 mg of DNA/chromatin (a typical measurement for these methods).

      Thanks to the reviewer for catching this; these values should have been the total lysate amount, not chromatin mass. We have corrected Figures 2 and 3.
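      As a check on the reviewer's arithmetic, take ~6 pg of DNA per diploid cell and the 60 million-cell input cited for DNA O-MAP:

$$
m_{\text{DNA}} \approx 6\times10^{7}\ \text{cells}\times 6\ \text{pg/cell} = 3.6\times10^{8}\ \text{pg} = 0.36\ \text{mg}
$$

      This falls below the 0.5 to 7 mg range originally plotted. That is consistent with those values describing total lysate rather than chromatin. By the same arithmetic, one billion cells give about 6 mg of DNA.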

      (6) Figure S1: There is no indication of the metrics used for the shades of red.

      We have added a gradient legend to depict this.

      (7) What is the purpose of HCl in the experiment?

      HCl treatment was done to reduce autofluorescence for imaging (PMID: 39548245).

      (8) I could not find the MS dataset on the server using the provided accession number (PDX054080).

      Thank you for pointing this out. We have confirmed that the dataset is now public and added the new datasets for the Xi/Xa and Hox studies. We also note that the correct accession is “PXD054080”.

      (9) Why desthiobiotin instead of biotin?

      We have tested both; desthiobiotin was helpful to reduce adsorption to surfaces. Either biotin or desthiobiotin can be used, though, for OMAP.

  2. Mar 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore changes during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we have now implemented:

      (1) Improvements to the discussion. Following the reviewers’ recommendation, we have focused our discussion on the novel findings of the manuscript and drawn out some key points of interest that deserve more attention.

      (2) We added a new Figure 5 to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We added a new control experiment to address minor point 1 from Reviewer #3. Our experiment confirming that SynSAC relies on endogenous checkpoint proteins lacked, for comparison, the cell cycle profile of cells in which SynSAC was not induced. We have performed this experiment, and the new data are shown as part of a new Figure 2.

      (4) We included representative images of spindle morphology in Figure 1, as requested by Reviewer #1, point 2.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve a MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewer’s support of our study.

      (1) In wild type, Pds1 levels are high during MI and AI but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in the presence of high Pds1? There is even a low but significant level in MII.

      Higher levels of Pds1 in meiosis I than in meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 before separase is reactivated. We added the following sentences at Line 218: “In wild-type cells, Pds1 levels are higher in meiosis I than in meiosis II, likely because the interval between the divisions is too short to allow Pds1 reaccumulation [1,2,4]. This pattern was also observed in SynSAC strains in the absence of ABA (Figure 3A).”

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “However, ABA addition at the time of prophase release resulted in Pds1<sup>securin</sup> stabilisation throughout the time course, consistent with delays in both metaphase I and II”. (Line 225).

      We do not believe that either anaphase I or anaphase II occurs in the presence of high Pds1. Western blotting measures the amount of Pds1 across the population of cells at a given time point, and the time between meiosis I and II is very short even in the presence of ABA. For example, in Figure 2B (now Figure 3B), spindle morphology counts show that at 105 minutes, 40% of cells had anaphase I spindles (and will be Pds1 negative), while ~20% had metaphase I and ~20% metaphase II spindles (and will be Pds1 positive). In contrast, because the meiosis II arrest is more efficient, anaphase II hardly occurs at all under these conditions: anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down without undergoing anaphase extension. Taken together, the Pds1 levels on the western blot and the spindle data indicate that at the end of the time course, these cells are biochemically in metaphase II but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3,5,6]. We rewrote this section as follows (Line 222):

      “Note that Pds1 levels do not fully decline in this population-based analysis as the short duration of meiotic stages results in a mixed-stage population. For example, at the anaphase I peak (90 minutes) around 30% of cells remain in prior stages in which Pds1 levels are expected to be high. However, ABA addition at the time of prophase release resulted in Pds1<sup>securin</sup> stabilisation throughout the time course, consistent with delays in both metaphase I and metaphase II (Figure 3B). Anaphase I spindles nevertheless appeared with delayed kinetics, peaking at ~40% at 105 min. Concurrently, ~40% of cells remained in metaphase I or II and were therefore Pds1-positive, accounting for the persistent Pds1 signal on the western blot. In contrast, anaphase II spindles are observed at low frequency (maximum 10%) from 165 minutes onwards because metaphase II spindles give way to post-meiotic spindles, without undergoing anaphase II extension (Figure 1D).”

      (2) The figures with data characterizing the system are mostly graphs showing time course of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (ie. In the Figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      We have now included representative images as Figure 1D, along with a schematic in Figure 1C. These show that there are no differences in spindle morphology or nuclei (chromosomes cannot be observed at this resolution), apart from the number of cells with a particular spindle morphology at a given time. We added the following text confirming that there is no change in spindle morphology (Line 174): “We scored spindle morphology after anti-tubulin immunofluorescence to determine cell cycle stage (Figure 1C). Prophase, metaphase I, anaphase I, metaphase II, anaphase II and post-meiotic spindles appeared successively over the timecourse in both the absence and presence of ABA (Figure 1D). While SynSAC dimerisation did not alter characteristic spindle morphologies, it changed their distribution over time.”

      The number of cells scored (at least 100 cells per timepoint) is given in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ, but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that SynSAC does not come from kinetochores is a major advantage, as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation, but this is beyond the scope of this work. In terms of the state at which we are arresting cells, this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators but not checkpoint proteins in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We acknowledge, however, that we cannot completely rule out unwanted effects of the system, as with any synchronisation system, and where possible, findings with the system should be backed up with an orthogonal approach. We appreciate the reviewer’s insight in highlighting these interesting discussion points, and we have rewritten the relevant paragraph in the discussion, starting at line 545.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II and comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      Thanks for the suggestion. We agree and have moved the data for both meiosis I and meiosis II to make a new main Figure 2.

      (2) Line 197, the authors state: "...SynSAC induced a more pronounced delay in metaphase II than in metaphase I". However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"; this seems to be a mix-up.

      Thank you for pointing this out, this is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Indeed, data from mouse indicate that shugoshin-dependent cohesin deprotection is already absent in meiosis II in mouse oocytes [7], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We have included a paragraph in the discussion in the section starting line 641.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes the enrichment and extended time for sample collection is sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition since the very robust arrest achieved with the 4A-RASA mutant deemed this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not emphasize more this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment which is a large amount of work using significant resources. We highlighted the potential of using the 4A-RASA variant more strongly as follows:

      Line 312, Results:

      “These findings also indicate that spc105<sup>(1-455)</sup>-4A-RASA is the preferred SynSAC variant, particularly where metaphase I arrest is the goal.”

      Line 598, Discussion: “Finally, the stronger and more prolonged SynSAC arrest obtained using the PP1 binding site mutant spc105<sup>(1-455)</sup>-4A-RASA prompts its consideration as an alternative tool for future studies, particularly where meiosis I arrest is important. At the time of performing the kinetochore immunoprecipitations, these mutations were not yet available but, as we have demonstrated, wild type SynSAC protein fragments nevertheless yielded sufficiently enriched populations of metaphase I and II cells to allow reliable detection of stage-specific kinetochore proteins and phosphorylations. Going forward, however, we consider SynSAC-4A-RASA to be the optimal tool for inducing metaphase arrests.”

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggest that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences as to the wiring of the spindle checkpoint in meiosis and mitosis and potential for future studies, however, currently we can only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (now Figure 7A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing, along with the reduced metaphase I delay, which together point to a role of Aurora B-mediated phosphorylation also in S. cerevisiae, though previous work has not supported such a role [8].

      We have re-written and expanded the paragraph in the discussion related to the mutation of the RVSF motif starting line 564 to reflect these points.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged-samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that at first glance, normalising to no tag appears to be the most appropriate normalisation, in practice there is very low background signal in the no tag sample which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no tag) signal is indeed very low, we have provided a new figure showing volcano plots comparing kinetochore purifications at each stage with their corresponding no tag control (Figure 5).

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis [9]. In this study, we found that relatively few proteins significantly change in abundance. We added a sentence to this effect in the discussion (Line 632): “Although some variation could reflect global changes in protein abundance during meiosis, we previously found that only a few proteins undergo dynamic abundance changes during the meiotic divisions [9], so this is unlikely to fully explain the kinetochore composition differences observed.”

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and for this, we believe that a direct comparison to Dsn1, a central kinetochore protein which we immunoprecipitated is the most appropriate normalisation.

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we have re-framed the discussion to focus on the novel findings, as also raised by the other reviewers and noted above.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We added the requested data, which is now part of Figure 2. This now clearly shows that mad2 and mad3 mutants have very similar meiotic cell cycle profiles in the SynSAC background whether or not ABA is added. Please note that we removed the mad1 mutant from this analysis as technical difficulties prevented the strain from entering meiosis well.

      We have improved graphs throughout, as suggested: data lines are thinner, and axis gridlines and external tick marks are included. We added an arrow to indicate the time of ethanol/ABA addition.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis, this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore think this experiment is unnecessary.

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We speculate that the challenge of biorienting homologs, which are held together by chiasmata rather than by back-to-back sister kinetochores, results in a greater requirement for dynamic error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I. We have re-written and expanded the discussion section starting line 565 to reflect these points.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Kinvetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected and we have carefully proofread the manuscript.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      References

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Dev Cell 4, 711–726. https://doi.org/10.1016/s1534-5807(03)00130-8.

      (4) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Dev Cell 4, 711–726. https://doi.org/10.1016/s1534-5807(03)00130-8.

      (5) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (6) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (7) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (8) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (9) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays, they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A- and p53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally, the authors assess the effect of FAM53C deletion in a cortical organoid model and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      The authors have revised the manuscript, and I respond here point-by-point to indicate which parts of the revision I found compelling and which parts were less convincing. The numbering is consistent with the numbering in my first review report.

      (1) The p21 knockdowns are a valuable addition, and the claim that p53 targets other than p21 are involved in the FAM53C RNAi-mediated arrest is now much more solid. Minor detail: if S4D is a quantification of S4C, it is hard to believe that the quantification was done properly (at least for the DYRK1Ai conditions). Perhaps S4C is not the best representative example, or some error was made?

      We appreciate the concern from the Reviewer. As explained in the first round of revisions, we have mostly used an immunoassay based on capillary transfer (WES system), which is very quantitative (much more than classical immunoblot). As for the other WES assays, the panel in S4C is a representation from the signal in the capillary from one of the experiments we performed (in many ways, we should simply not show these representations but readers and reviewers expect them). We agree that this was not visually the most representative, likely because of the saturation of the signal, and we replaced it with another one.

      (2a) I appreciate the decision to remove the cyclin D1 phosphorylation data. A more nuanced model now emerges. It is not clear to me, however, why the ProteinSimple immunoassay was used for experiments with RPE1 cells but not for the cortical organoids. Even though no direct claims are made based on the phospho-cyclin D data in Figure 5E+G, showing these data suggests that FAM53C deletion increases DYRK1A-mediated cyclin D1 phosphorylation. I find it tricky to show these data, knowing now that this effect could not be shown in the RPE1 cells.

      The Reviewer raises a valid point. The data we had presented in the first version of the manuscript were strongly suggestive of changes in Cyclin D1 phosphorylation and protein stability but we followed the Reviewer’s advice to remove them from the revised manuscript because the effects were sometimes small. We decided to keep these data in the organoid model because we felt this is a question that many readers would have (how do changes in FAM53C affect Cyclin D levels?). As the Reviewer mentions, we did not draw conclusions about this but we felt and still feel it is important to connect the dots, even if imperfectly, between FAM53C and the cell cycle, and these data in Figure complement the data in Figure 3F. The experiments with RPE-1 cells were mostly performed in the Sage lab with the WES assay while the experiments with organoids were largely performed in the Pasca lab where more ‘classic’ immunoblots are routinely used. More generally, some antibodies work better with one method vs. the other and we often go back and forth between the two.

      (2b) The quantifications of the immunoassays are not convincing. In multiple experiments, the HSP90 levels vary wildly, which indicates big differences in protein loading if HSP90 is a proper loading control. This is, for example, problematic for the interpretation of Figure 3F and S3I. The cyclin D1 "bands" look extremely similar between siCtrl and siFAM53C (Fig S3I); in fact, the two series of 6 samples with different dosages of DYRK1Ai look like an identical repetition of each other. I did not have the option to overlay them, but it would be important to check whether a mistake was made here. The cyclin D1 signals aside, the change in cycD1/HSP90 ratios seems to be entirely caused by differences in HSP90 levels. Careful re-analysis of the raw data and more equal loading seem necessary. The same goes (to a lesser extent) for S3J+K.

      As mentioned above, the representation of the fluorescence signal may be important for readers who are used to seeing immunoblots (Western blots), but the quantification is performed on the values directly obtained from the WES system from ProteinSimple. In these experiments, we make sure that the numbers we obtain are in a validated range, allowing us to use the values even if the loading is sometimes a bit different between lanes. The sensitivity of the WES assay allows for high accuracy in intra-well quantification, which in turn allows accurate inter-well quantification once loading control normalization is completed.

      (2c) The new model in Fig S4L: what do the arrows on the right, from FAM53C and p53, that merge and point straight towards S-phase mean? They suggest that p53 (and FAM53C) directly promote S-phase progression, but most likely this is not what the authors intended.

      Very good point. We were trying to be inclusive of various signaling pathways that may be implicated in the regulation of the cell cycle by this group of proteins. FAM53C does promote S-phase entry (more cycling when FAM53C is overexpressed) but we removed the arrow coming from p53, which is certainly not a positive regulator of cell cycle progression. Thank you for helping us correct this mistake.

      (3) Clear; nicely addressed.

      (4) Thank you for correcting.

      (5) I appreciate that the authors are now more careful to call the IMPC analysis data preliminary. This is acceptable to me, but nevertheless, I suggest that the authors seriously consider taking this part out entirely. The risk of a chance finding and the extremely skewed group sizes (as reviewer #2 had pointed out) hamper the credibility of this statistical analysis.

      We appreciate this concern but feel that it is important for the community to be aware of these phenotypes so other investigators either study FAM53C in different genetic contexts or, for example, generate a conditional knockout allele to study more acute effects of FAM53C loss during development and in adult mice. We believe that the text is carefully written and acknowledge the caveats of small sample sizes in some statistical analyses.

      Reviewer #2 (Public review):

      The authors sought to identify new regulators of the G1/S transition by mining the Cancer Dependency Map (DepMap) co-dependency dataset. This analysis successfully identified FAM53C, a poorly characterized protein, as a candidate. The strength of the paper lies in this initial discovery and the subsequent biochemical work convincingly showing that FAM53C can directly interact with the kinase DYRK1A, a known cell cycle regulator.

      The authors then present evidence, primarily from acute siRNA knockdown in RPE-1 cells, that loss of FAM53C induces a strong G1 cell cycle arrest. Their follow-up investigation proposes a model where FAM53C normally inhibits DYRK1A, thereby protecting Cyclin D from degradation and preventing p53 activation, to allow for G1/S progression. The authors have commendably addressed some concerns from the initial review: they have now demonstrated the G1 arrest using two independent siRNAs (an improvement over the initial pool), shown the effect in several additional cancer cell lines (U2OS, A549, HCT-116), and developed a more nuanced model that incorporates p53 activation, which helps to explain some of the complex data.

      However, a central and critical weakness persists. The entire functional model is built upon the very strong G1 arrest phenotype observed in vitro following acute knockdown. This finding is in stark contrast to data from other contexts. As the authors note, the knockout of Fam53c in mice results in minimal phenotypes, and the DepMap data itself suggests the gene is largely non-essential in most cancer cell lines.

      This major discrepancy creates two competing interpretations:

      As the authors suggest, FAM53C has a critical role in the cell cycle, but its loss is rapidly masked by compensatory mechanisms in long-term knockout models (like iPSCs and mice) or in established cancer cell lines.

      The strong acute G1 arrest is an experimental artifact of the siRNA-mediated knockdown, and not a true reflection of FAM53C's primary function.

      The authors' new controls (using two individual siRNAs and showing the arrest is RB-dependent) make an off-target effect less likely, but they do not definitively rule it out. The gold-standard experiment to distinguish between these two possibilities, a rescue of the phenotype using an siRNA-resistant cDNA, has not been performed.

      Because this key control is missing, the foundation of the paper's functional claims is not as solid as it needs to be. While the study provides an interesting and valuable new candidate for the cell cycle field to investigate, readers should be cautious in accepting the strength of FAM53C's role in the G1/S transition until this central discrepancy is definitively resolved.

      We appreciate this concern from the Reviewer. Genetically, FAM53C is linked to a number of genes coding for known regulators of the G1/S transition and its loss of function would be predicted to lead to G1 arrest based on these genetic interactions. As the Reviewer nicely summarizes, we have data in several cell types, including non-cancerous immortalized cells (RPE-1) and several cancer cell lines, that FAM53C acute knock-down leads to a G1 arrest. Our data also indicate that this arrest is RB dependent and p53 independent. Furthermore, genetic knockout of FAM53C in iPSC-derived human cortical organoids results in decreased proliferation. All these elements point to a role for FAM53C in G1/S. We performed some pilot rescue experiments, as suggested by the Reviewer, but these preliminary assays could not identify the right “dose” of FAM53C. We agree that it will be important in future studies to develop better genetic systems in which FAM53C can be manipulated genetically. However, our overexpression experiments show increased proliferation, providing more support for a role of FAM53C at the G1/S transition of the cell cycle.

      Reviewer #3 (Public review):

      Summary:

      In this study, Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation-Regulated Kinase 1A (DYRK1) in the G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of the G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of the CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells, but surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C, and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro, although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. The authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1, but this was true only in cells lacking functional p53. This is quite confusing, as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild-type cells as well as in p53 knockouts, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, the authors moved towards investigating its potential (and perhaps more complex) roles in the differentiation of iPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids, but whether that reflects increased DYRK1 activity or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knockout mice. The authors did not observe any significant changes in survival or in organ development, but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism. Although some of the data might be of interest, in their current form the data are too preliminary to justify publication.

      Major comments:

      (1) The whole study is based on one siRNA against FAM53C, and its specificity was not validated. The level of knock-down was shown only in the first figure and not in the other experiments. The observed cell cycle phenotypes may be affected by variable knock-down efficiency and/or potential off-target effects.

      We fully acknowledge these limitations in our study. First, we agree that the efficiency of the knock-down can be variable across experiments; unfortunately, antibodies against FAM53C are still not optimal, and immunoassays against this protein have not always been reliable in our hands. It will be important in the future to develop better antibodies for this poorly studied factor. Second, we also agree that the siRNA pool is perhaps not optimal (note that we used a pool, not a single siRNA). We provide data in the manuscript that single siRNAs (from the pool) also arrest cells in G1. Our data also show that this arrest is observed in several cell lines (cancerous and non-cancerous), in a p53-independent but RB-dependent manner. We further note that we also provide data in cortical spheroids derived from CRISPR/Cas9 knockout iPSCs showing a similar inhibition of proliferation, validating our observations in a completely orthogonal system. Finally, overexpression studies support a role for FAM53C at the G1/S transition (i.e., FAM53C overexpression is sufficient to promote proliferation).

      (2) Experiments focusing on cell cycle progression were done in a single cell line, RPE1, which showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in iPSCs and in mice were only mild, suggesting that there might be large differences across cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types.

      As mentioned above, we have observed cell cycle arrest in several cancer cell lines (U2OS, A549, HCT-116) and in iPSC-derived organoids. We acknowledge that RPE-1 cells seem most sensitive to the knock-down and, currently, we do not understand why. In the future, it will be critical to gain a better understanding of the cellular/genetic contexts in which FAM53C plays more important roles in the G1/S transition; it will also be critical to understand what mechanisms may compensate for loss of FAM53C in cells, in culture and in vivo.

      (3) The authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203); however, this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations, at which phosphorylation of cyclin D is reduced. This rather suggests that DYRK1 is not inhibited by FAM53C but that FAM53C perhaps competes with cyclin D. Further, the authors should address whether the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this cyclin D Thr286 phosphorylation, or are other sites involved?

      We completely agree with the Reviewer that the functional interactions between FAM53C and DYRK1A will need to be explored further. Our data (and other data from mass spectrometry experiments in other contexts) support a model in which FAM53C binds to DYRK1A. Genetic analyses indicate that FAM53C is antagonistic to DYRK1A function. Our phosphorylation assays show decreased DYRK1A activity when FAM53C is present. Because our data also show that DYRK1A phosphorylates FAM53C, there may be more than one level of functional interaction between the two proteins, including effects by DYRK1A on FAM53C through its phosphorylation activity. We state in the text that our data suggest “that FAM53C may be a competitive substrate and/or an inhibitor of DYRK1A”, and we agree that we cannot provide a stronger conclusion at this point.

      We believe that genetic data from DepMap and our data support a model in which Cyclin D is downstream of FAM53C in its regulation of the G1/S progression. As discussed with Reviewer #1, it has proven challenging to investigate how FAM53C may control the phosphorylation and degradation of Cyclin D. Thr286 is certainly a critical phosphorylation site, and this residue can be phosphorylated by DYRK1A, but whether FAM53C and DYRK1A engage with other residues or domains is not known and should be the focus of future studies.

      (4) In many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, which statistical test was used in Fig 4C? The impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does the DYRK1 inhibitor prevent modification of cyclin D?

      We thank the Reviewer for this comment. In the revised version, we now report all the statistical tests used.

      (5) The specificity of the SM13797 compound for DYRK1 was not validated.

      We provided tables in Figure S3 that summarize the biochemical characterization of this DYRK1A inhibitor (performed by Biosplice Therapeutics, where this compound was developed).

      (6) The fraction of cells in G1 is a very easy readout, but it does not measure progression through G1 phase. An extension of S phase or a G2 delay would also indirectly reduce the G1 fraction. Instead, the authors could measure the dynamics of S-phase entry in cells released from a G1 block or collected by mitotic shake-off.

      This is an interesting point raised by the Reviewer. It is correct that we only performed a more in-depth characterization of cell cycle phenotypes in certain contexts (e.g., cell counting, EdU incorporation) (see Figures 1 and S1). It is possible that different cell types adapt differently to loss or overexpression of FAM53C, and assays to synchronize the cells, including by mitotic shake-off, may be useful in future experiments to further characterize the cell cycle of FAM53C mutant cells.

      Comments to the revised manuscript:

      In the revised version of the manuscript, the authors addressed most of the critical points. They now include new data on depletion of FAM53C using single siRNAs that show a small but significant enrichment of the G1 population. This G1 arrest is likely caused by the combined effects of p21 induction and decreased levels of cyclin D1. The authors observed that inhibition of DYRK1 rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1 did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1 activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase, but unfortunately these experiments were not technically possible. Knockout of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicates FAM53C in fine-tuning DYRK1 activity in cells, which may to some extent influence progression through G1 phase. In addition, FAM53C may also have DYRK1- and cell cycle-independent functions that remain to be addressed by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      All my minor points (6-11) were addressed adequately. No further comments.

      Reviewer #2 (Recommendations for the authors):

      The paper's conclusions would be substantially strengthened and the primary concern about off-target effects could be definitively resolved by performing one of the following two experiments:

      (1) Perform a rescue experiment. This would involve transfecting RPE-1 cells with an expression vector for an siRNA-resistant FAM53C cDNA (alongside a control vector) and then treating the cells with the FAM53C siRNAs. If the G1 arrest is a true on-target effect, the cells expressing the resistant cDNA should be "rescued" and continue to proliferate, while the control cells arrest. This is the most direct and standard way to validate a phenotype derived from siRNA.

      (2) Use an acute gene deletion approach that bypasses siRNAs entirely. The authors could use a lentiviral gRNA/Cas9 system to induce acute knockout of FAM53C in RPE-1 cells and assess the cell cycle phenotype at an early time point (e.g., 48-72 hours post-infection). This would provide a direct comparison to the acute siRNA knockdown, and if it recapitulates the strong G1 arrest, it would confirm the phenotype is due to FAM53C loss and not an artifact of the RNAi machinery. The current knockout models (iPSC, mice) are stable and long-term, which allows for the compensatory mechanism argument; an acute knockout would be a much stronger control. The authors could then also follow the fate of the cells and determine the nature of the suspected compensatory mechanisms.

      Addressing this central point is critical for the credibility of the proposed G1/S control element.

      As discussed above, the observation of similar phenotypes in four cell lines (RPE-1 cells and three cancer cell lines) using a pool of siRNAs, and in cortical organoids derived from iPSCs using a knockout approach, strongly supports our results. Nevertheless, we agree that our current study has limitations, including the lack of genetic re-introduction of FAM53C in knock-down or mutant cells. We also note that strong genetic evidence points to a role for FAM53C at the G1/S transition. We hope that some readers will be excited by FAM53C as an understudied factor with possible critical roles in fundamental cell biology and human diseases, and that future studies will continue to investigate its function in cells using additional approaches.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, the authors test the hypothesis that the brain processes reward outcomes for effort differently when rewards are earned for oneself versus for others.

      The strengths of this experiment include what appears to be a novel finding of opposite-signed effects of effort on the processing of reward outcomes when the recipient is the self versus others. Also, the experiment is well designed, the study seems sufficiently powered, and the data and code are publicly available.

      We thank Reviewer #1 for the affirmative appraisal of our manuscript as well as the thoughtful and insightful comments, which have enabled us to significantly improve the manuscript.

      (1) Inferences rely heavily on the results of mixed effects models which may or may not be properly specified and are not supported by complementary analyses.

      We thank Reviewer #1 for raising this critical issue of model specification. We have re-fitted our mixed-effects models and performed complementary analyses to validate the robustness of our findings. Specifically, we adopted the maximal converging random-effects structure (including random slopes for Recipient, Effort, and Magnitude where feasible) while ensuring model stability (see Responses to Reviewer #1’s Recommendations point 2). Crucially, our primary findings, including the Recipient × Effort and Recipient × Effort × Magnitude interactions, remained robust. Furthermore, additional analyses confirmed that these results were not confounded by factors such as response speed and subjective effort rating (see Responses to Reviewer #1’s Recommendations point 5).

      (2) Also, not all results hang together in a sensible way. For example, participants report feeling less subjective effort, but also more disliking of tasks, when they were earning rewards for others versus self. Given that participants took longer to complete tasks when earning rewards for others, it is conceivable that participants might have been working less hard for others than for themselves, and this may complicate the interpretation of results.

      We thank Reviewer #1 for this insightful point (which also relates to Reviewer #3’s point 5). In our study, participants were asked to rate three specific dimensions: Effort (“How much effort did you exert to complete each effort condition when earning rewards for yourself [or the other person]?”), Difficulty (“How much difficulty did you perceive in each effort condition when earning rewards for yourself [or the other person]?”), and Liking (“How much did you like each effort condition when earning rewards for yourself [or the other person]?”).

      We acknowledge Reviewer #1’s concern that the lower subjective effort ratings for others seem contradictory to the greater disliking and longer completion times. We propose that in this paradigm, subjective effort ratings are susceptible to demand characteristics and likely captured motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. To disentangle these factors, we included a measure of perceived task difficulty, which is anchored in task properties and is less prone to social desirability biases (Harmon-Jones et al., 2020; Wright et al., 1990). We found no differences in perceived difficulty between self- and other-benefiting trials (Figure 2D), suggesting that the task demands were perceived as equivalent across conditions. To examine this interpretation more directly, we analyzed correlations among participants’ ratings of difficulty, effort, and liking. As illustrated in Figure S1, we found no correlation between difficulty and effort ratings. Crucially, liking ratings were negatively correlated with difficulty ratings.

      More importantly, our performance data contradict the interpretation that participants “worked less hard” for others in terms of task completion. While participants took longer to complete tasks for others, they maintained comparable, near-ceiling success rates for self (97%) and other (96%) recipients (b = -0.46, p = 0.632; Supplementary Table S1). This dissociation suggests that although participants were less motivated (e.g., lower subjective ratings, longer completion times, and greater disliking) to work for others, they ultimately exerted the necessary physical effort to achieve successful outcomes. Thus, the results consistently point to a decrease in prosocial motivation (consistent with prosocial apathy) rather than a failure of effort exertion.

      Harmon-Jones, E., Willoughby, C., Paul, K., & Harmon-Jones, C. (2020). The effect of perceived effort and perceived control on reward valuation: Using the reward positivity to test a dissonance theory prediction. Biological Psychology, 107910. https://doi.org/10.1016/j.biopsycho.2020.107910

      Wright, R. A., Shaw, L. L., & Jones, C. R. (1990). Task demand and cardiovascular response magnitude: Further evidence of the mediating role of success importance. Journal of Personality and Social Psychology, 59(6), 1250-1260. https://doi.org/10.1037/0022-3514.59.6.1250

      Reviewer #2 (Public review):

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences the processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      An important strength of the study is that the amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We appreciate Reviewer #2’s positive appraisal of our manuscript. We are fortunate to receive your thoughtful and insightful suggestions and have revised the manuscript accordingly.

      (1) Although the obtained results are highly plausible, I am concerned whether the reward positivity (RewP) and P3 were adequately measured. The RewP and P3 were defined as the average voltage values in the time intervals 300-400 ms and 300-440 ms after feedback onset, respectively. So they largely overlapped in time. Although the RewP measure was based on frontocentral electrodes (FC3, FCz, and FC4) and the P3 on posterior electrodes (P3, Pz, and P4), the scalp topographies in Figure 3 show that the RewP effects were larger at the posterior electrodes used for the P3 than at frontocentral electrodes. So there is a concern that the RewP and P3 were not independently measured. This type of problem can often be resolved using a spatiotemporal principal component analysis. My faith in the conclusions drawn would be further strengthened if the researchers extracted separate principal components for the RewP and P3 and performed their statistical analyses on the corresponding factor scores.

      We thank Reviewer #2 for raising this issue. We would like to clarify that these two components were time-locked to different types of feedback and therefore reflect neural responses to distinct stages of the prosocial effort task. Specifically, the P3 was time-locked to performance feedback (the effort-completion cue; e.g., the tick shown in Figure 1B), whereas the RewP was time-locked to reward feedback (e.g., the display of “+0.6”). Thus, despite the numerical similarity in the post-stimulus windows, the components capture neural activity evoked by independent events separated in time, corresponding to the performance monitoring versus reward evaluation stages of the task. To avoid misunderstanding, we have made this distinction more explicit in the revised manuscript, which now reads, “Single-trial RewP amplitude was measured as mean voltage from 300 to 400 ms relative to reward feedback onset (i.e., reward delivery) over frontocentral channels (FC3, FCz, FC4). We also measured the parietal P3 (300–440 ms; averaged across P3, Pz, and P4) in response to performance feedback (i.e., effort completion), given its relationship with motivational salience (Bowyer et al., 2021; Ma et al., 2014)” (page 27, para. 1, lines 2–6).
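      To make the window-based measurement concrete, the sketch below shows how single-trial mean amplitudes of this kind are commonly computed with NumPy. It is an illustrative sketch, not the authors' published pipeline: the helper name window_mean_amplitude, the trials × channels × samples array layout, the 500 Hz sampling rate, and the random placeholder data are all assumptions. Because the RewP and P3 are locked to different events, the same helper is applied to two separate epoch arrays, one segmented around reward feedback and one around performance feedback.

```python
import numpy as np


def window_mean_amplitude(epochs, ch_names, picks, times, tmin, tmax):
    """Single-trial mean voltage in [tmin, tmax] (s), averaged over a channel cluster.

    Hypothetical helper. Assumed inputs:
      epochs   -- array of shape (n_trials, n_channels, n_samples)
      ch_names -- channel labels matching axis 1 of `epochs`
      picks    -- labels of the cluster to average, e.g. ["FC3", "FCz", "FC4"]
      times    -- sample times in seconds relative to the time-locking event
    """
    ch_idx = [ch_names.index(ch) for ch in picks]    # positions of cluster channels
    in_window = (times >= tmin) & (times <= tmax)    # boolean mask over samples
    cluster = epochs[:, ch_idx, :][:, :, in_window]  # (trials, cluster chans, window samples)
    return cluster.mean(axis=(1, 2))                 # one amplitude per trial


if __name__ == "__main__":
    sfreq = 500                                      # assumed sampling rate (Hz)
    times = np.arange(-100, 400) / sfreq             # -0.2 s to 0.798 s
    ch_names = ["FC3", "FCz", "FC4", "P3", "Pz", "P4"]
    rng = np.random.default_rng(0)
    # Placeholder data: two separate epoch sets, locked to different events
    reward_epochs = rng.normal(size=(10, len(ch_names), times.size))
    performance_epochs = rng.normal(size=(10, len(ch_names), times.size))

    rewp = window_mean_amplitude(reward_epochs, ch_names,
                                 ["FC3", "FCz", "FC4"], times, 0.300, 0.400)
    p3 = window_mean_amplitude(performance_epochs, ch_names,
                               ["P3", "Pz", "P4"], times, 0.300, 0.440)
    print(rewp.shape, p3.shape)
```

      Averaging over both the time window and the electrode cluster yields one value per trial, which is the form of input that single-trial mixed-effects models take.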

      Reviewer #3 (Public review):

      This study investigates how effort influences reward evaluation during prosocial behaviour using EEG and experimental tasks manipulating effort and rewards for self and others. Results reveal a dissociable effect: for self-benefitting effort, rewards are evaluated more positively as effort increases, while for other-benefitting effort, rewards are evaluated less positively with higher effort. This dissociation, driven by reward system activation and independent of performance, provides new insights into the neural mechanisms of effort and reward in prosocial contexts.

      This work makes a valuable contribution to the prosocial behaviour literature by addressing areas that previous research has largely overlooked. It highlights the paradoxical effect of effort on reward evaluation and opens new avenues for investigating the mechanisms underlying this phenomenon. The study employs well-established tasks with robust replication in the literature and innovatively incorporates ERPs to examine effort-based prosocial decision-making, an area insufficiently explored in prior work. Moreover, the analyses are rigorous and grounded in established methodologies, further enhancing the study's credibility. These elements collectively underscore the study's significance in advancing our understanding of effort-based decision-making.

      We thank Reviewer #3 for the positive assessment. We are particularly encouraged by the reviewer’s recognition of our novel integration of ERPs to uncover the distinct effects of effort on reward evaluation for self versus others. We have carefully addressed the specific recommendations raised in the subsequent comments to further strengthen the rigor and clarity of the manuscript.

      (1) Incomplete EEG Reporting: The methods indicate that EEG activity was recorded for both tasks; however, the manuscript reports EEG results only for the first task, omitting the decision-making task. If the authors claim a paradoxical effect of effort on self versus other rewards, as revealed by the RewP component, this should also be confirmed with results from the decision-making task. Omitting these findings weakens the overall argument.

      We thank Reviewer #3 for giving us the opportunity to clarify the specific roles of our two tasks. The primary aim of our study is to elucidate the neural after-effects of effort exertion on subsequent reward evaluation during prosocial acts. The prosocial effort task was specifically designed for this purpose, as it involves actual effort expenditure followed by reward outcomes. Furthermore, this task uses preset effort-reward combinations, ensuring balanced trial counts and adequate signal-to-noise ratios across conditions, a critical requirement for robust ERP analysis. In contrast, the prosocial decision-making task was included specifically to quantify behavioral preference (i.e., prosocial effort discounting) rather than neural reward processing. Specifically, this task involves choices without immediate effort execution and reward feedback, making it impossible to examine the neural after-effects of effort exertion. However, the decision-making task remains indispensable for our study structure: it provides an independent behavioral measure of prosocial apathy, which allowed us to link individual differences in behavioral motivation to the neural dissociations observed in the prosocial effort task (as detailed in our Responses to Reviewer #3’s point 2). Thus, the two tasks provide complementary, rather than redundant, insights into the behavioral and neural mechanisms of prosocial effort.

      (2) Neural and Behavioural Integration: The neural results should be contrasted with behavioural data both within and between tasks. Specifically, the manuscript could examine whether neural responses predict performance within each task and whether neural and behavioural signals correlate across tasks. This integration would provide a more comprehensive understanding of the mechanisms at play.

      We thank Reviewer #3 for this insightful and helpful suggestion. We agree that linking neural signatures with behavioral patterns is crucial for establishing the functional significance of our ERP findings. Regarding within-task associations, it is important to note that the prosocial effort task was designed to require participants to exert fixed, preset levels of physical effort to earn uncertain rewards. This experimental control was necessary to standardize effort exertion across self-benefiting and other-benefiting trials, thereby minimizing confounds such as differences in physical or perceived effort prior to the feedback phase. Indeed, the neural after-effects remained after controlling for these behavioral measures (i.e., response speed and self-reported effort; as detailed in our Responses to Reviewer #1’s Recommendations point 5). Furthermore, unlike the prosocial effort task, the decision-making task inherently precludes the examination of the neural after-effects of effort; therefore, a within-task association in this task was not possible.

      Given these considerations, we focused on the cross-task association. We examined whether the neural after-effects of effort (indexed by the RewP) in the prosocial effort task were modulated by individual differences in effort discounting. We used the K value estimated from the prosocial decision-making task as the index of effort discounting. We entered the K value (log-transformed and z-scored) as a continuous predictor into the mixed-effects models of RewP amplitudes. The full regression estimates for the model are presented in Table S1 (left).

      We observed a significant four-way interaction among recipient, effort, magnitude, and K value (b = 0.58, p = 0.013). To decompose this complex interaction, we performed simple slopes analyses separately for self- and other-benefiting trials at high and low levels of reward magnitude and discounting rate (±1 SD). As shown in Figure S2, for self-benefiting trials, the effort-enhancement effect on the RewP was significant only for participants with high discounting rates at low reward magnitude (b = 1.02, 95% CI = [0.22, 1.82], p = 0.012). In contrast, participants with low discounting rates exhibited no significant effort effect (b = -0.37, 95% CI = [-0.89, 0.15], p = 0.159). At high reward magnitude, simple slopes analyses detected no significant effort effects for either high (b = 0.35, 95% CI = [-0.44, 1.14], p = 0.383) or low (b = 0.45, 95% CI = [-0.07, 0.97], p = 0.093) discounting individuals. These findings strongly support the cognitive dissonance account (Aronson & Mills, 1959): those who find effort most aversive are most compelled to inflate the value of small rewards to justify their exertion. For these individuals, the completion of a costly action for a small reward may trigger a stronger internal justification effect, resulting in an amplified neural reward response.

      For other-benefiting trials, participants with low discounting rates exhibited a significant effort-discounting effect at high reward magnitude (b = -0.97, 95% CI = [-1.74, -0.20], p = 0.014). In contrast, no significant effort effects were observed for participants with high discounting rates at either high (b = -0.45, 95% CI = [-0.97, 0.08], p = 0.098) or low (b = -0.16, 95% CI = [-0.69, 0.38], p = 0.564) reward magnitudes, nor for participants with low discounting rates at low reward magnitude (b = 0.14, 95% CI = [-0.64, 0.92], p = 0.729). These results suggest that the justification mechanism observed for self-benefiting effort appears absent for other-benefiting effort. Instead, we observed a persistent effort discounting before, during, and after effort expenditure, which was most pronounced in individuals with low effort sensitivity (low K) when reward magnitude was high. This seemingly paradoxical pattern might be interpreted through the lens of disadvantageous inequity aversion (Fehr & Schmidt, 1999). Specifically, the combination of high personal effort and high monetary reward for another person creates a salient disparity between the participant’s incurred cost and the recipient’s gain. Although low-K individuals are behaviorally willing to tolerate this cost, their neural valuation system may nonetheless track the “unfairness” of this asymmetry, thereby attenuating the neural reward signal (Tricomi et al., 2010). These insights suggest that facilitating prosocial behavior may require not just lowering costs, but potentially framing outcomes to trigger the effort justification mechanisms that drive the effort paradox observed in self-benefiting acts (Inzlicht & Campbell, 2022).
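      The simple-slopes decomposition described above follows a standard recipe: in a model containing a Recipient × Effort × Magnitude × K interaction, the effort slope at fixed moderator values is a weighted sum of every fixed effect that involves effort, and its standard error follows from the coefficient covariance matrix. The NumPy sketch below illustrates that arithmetic under stated assumptions; it is not the authors' code. The term ordering in EFFORT_TERMS, the recipient coding (for example, self = 0 and other = 1, or an effect coding), the assumption that magnitude and K are z-scored so that ±1 SD corresponds to ±1, and the normal-approximation 95% CI are all illustrative choices; CIs from the fitted mixed models may use a different degrees-of-freedom approximation.

```python
import numpy as np

# Hypothetical labels for the effort-related fixed effects of a model such as
# RewP ~ recipient * effort * magnitude * k (ordering is an assumption).
EFFORT_TERMS = [
    ("effort",),
    ("effort", "recipient"),
    ("effort", "magnitude"),
    ("effort", "k"),
    ("effort", "recipient", "magnitude"),
    ("effort", "recipient", "k"),
    ("effort", "magnitude", "k"),
    ("effort", "recipient", "magnitude", "k"),
]


def simple_slope_of_effort(coefs, cov, recipient, magnitude, k):
    """Effort slope and Wald 95% CI at fixed moderator values.

    coefs -- 1-D array of effort-related coefficients, ordered as EFFORT_TERMS
    cov   -- covariance matrix of those same coefficients
    recipient, magnitude, k -- moderator values (e.g. magnitude=+1, k=-1 for
        +1 SD magnitude and -1 SD discounting when both are z-scored)
    """
    values = {"effort": 1.0, "recipient": recipient, "magnitude": magnitude, "k": k}
    # Each weight is the product of the moderator values appearing in that term
    w = np.array([np.prod([values[v] for v in term]) for term in EFFORT_TERMS])
    slope = w @ coefs                 # linear combination of coefficients
    se = np.sqrt(w @ cov @ w)         # standard error of that combination
    return slope, (slope - 1.96 * se, slope + 1.96 * se)
```

      For example, simple_slope_of_effort(coefs, cov, recipient=1, magnitude=1, k=-1), with coefs and cov taken from a fitted model, would give the effort slope for the recipient coded 1 at high magnitude and low discounting. Building an explicit weight vector keeps the slope and its standard error consistent, because both come from the same linear combination.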

      To confirm this four-way interaction, we also replaced the K value with the proportion of high-effort choices in the decision-making task and observed a similar four-way interaction among recipient, effort, magnitude, and high-effort choice proportion (b = -0.58, p = 0.014; see Table S1 for detailed regression estimates). Together, this cross-task analysis not only provides a more comprehensive understanding of the mechanisms at play but also justifies the inclusion of the prosocial decision-making task. We sincerely thank Reviewer #3 for this valuable suggestion, which has significantly strengthened our manuscript. We have included this analysis (page 16, para. 2; page 17, paras. 1–2) and discussed the results (page 20, para. 2, lines 10–15; page 20, para. 3; page 21, para. 1, lines 1–8) in the revised manuscript.

      Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. The Journal of Abnormal and Social Psychology, 59(2), 177-181. https://doi.org/10.1037/h0047195

      Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114(3), 817-868. http://www.jstor.org/stable/2586885

      Tricomi, E., Rangel, A., Camerer, C. F., & O'Doherty, J. P. (2010). Neural evidence for inequality-averse social preferences. Nature, 463(7284), 1089-1091. https://doi.org/10.1038/nature08785

      (3) Success Rate and Model Structure: The manuscript does not clearly report the success rate in the prosocial effort task. If success rates are low, risk aversion could confound the results. Additionally, it is unclear whether the models accounted for successful versus unsuccessful trials or whether success was included as a covariate. If this information is present, it needs to be explicitly clarified. The exclusion criteria for unsuccessful trials in both tasks should also be detailed. Moreover, the decision to exclude electrodes as independent variables in the models warrants an explanation.

      We appreciate the opportunity to clarify these points. In the revised manuscript, we have now explicitly reported the descriptive statistics and the results of a mixed-effects logistic model on response success (page 8, para. 1, lines 2–4; Supplementary Table S1). Participants achieved similarly high success rates in both self (M = 97%) and other trials (M = 96%; Figure S3). As shown in Table S2, success rates decreased as effort increased (b = -4.77, p < 0.001). However, no other effects reached significance (ps > 0.245). These near-ceiling success rates indicate strong task engagement and effectively rule out risk aversion as a potential confound.

      Regarding model structure, we excluded unsuccessful trials from statistical analyses because they were rare and distributed equally across conditions. Given the near-ceiling performance, we did not include success rate as a covariate, as it offers limited variance.

      Finally, we did not include electrodes as an independent variable because our hypotheses focused on condition effects rather than topographic differences. Following established research (e.g., Krigolson, 2018; Proudfit, 2015), we averaged RewP amplitudes across a frontocentral cluster (FC3, FCz, and FC4) and P3 amplitudes across a parietal cluster (P3, Pz, and P4), where activity is typically maximal. Averaging across these theoretically grounded clusters improves the signal-to-noise ratio and provides more reliable estimates of the underlying components. We have explicitly included this rationale in the revised manuscript, which reads, “Data were averaged across the selected electrode clusters to improve signal-to-noise ratio and reliability” (page 27, para. 1, lines 9–10).
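
The cluster-averaging step described above can be sketched as follows. The sketch assumes that per-channel, condition-average waveforms are already available. Only the channel names (FC3, FCz, FC4 for the RewP; P3, Pz, P4 for the P3) come from the response. The 200–300 ms window, the coarse 100-ms sampling, and the amplitude values are hypothetical placeholders, not the manuscript's measurement parameters.

```python
from statistics import mean

# Channel clusters named in the response; all other values below are illustrative.
REWP_CLUSTER = ("FC3", "FCz", "FC4")
P3_CLUSTER = ("P3", "Pz", "P4")


def window_mean(waveform, times_ms, start_ms, end_ms):
    """Mean amplitude of one channel's waveform within [start_ms, end_ms]."""
    samples = [v for v, t in zip(waveform, times_ms) if start_ms <= t <= end_ms]
    if not samples:
        raise ValueError("time window contains no samples")
    return mean(samples)


def cluster_amplitude(erp_by_channel, times_ms, cluster, start_ms, end_ms):
    """Average the per-channel window means across all channels of a cluster."""
    return mean(
        window_mean(erp_by_channel[ch], times_ms, start_ms, end_ms) for ch in cluster
    )


# Toy condition-average ERP in microvolts, sampled every 100 ms (made-up values).
times = [0, 100, 200, 300, 400]
erp = {
    "FC3": [0.0, 1.0, 4.0, 6.0, 2.0],
    "FCz": [0.0, 2.0, 5.0, 7.0, 3.0],
    "FC4": [0.0, 1.5, 4.5, 6.5, 2.5],
}
print(cluster_amplitude(erp, times, REWP_CLUSTER, 200, 300))  # 5.5
```

Averaging the channels yields one value per participant and condition. That single value is what enters the mixed-effects models, which is why electrode does not appear as a predictor.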

      Krigolson, O. E. (2018). Event-related brain potentials and the study of reward processing: Methodological considerations. International Journal of Psychophysiology, 132(Pt B), 175-183. https://doi.org/10.1016/j.ijpsycho.2017.11.007

      Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449-459. https://doi.org/10.1111/psyp.12370

      (4) Prosocial Decision Computational Modelling: The prosocial decision task largely replicates prior behavioural findings but misses the opportunity to directly test the hypotheses derived from neural data in the prosocial effort task. If the authors propose a paradoxical effect of effort on self-rewards and an inverse effect for prosocial effort, this could be formalised in a computational model. A model comparison could evaluate the proposed mechanism against alternative theories, incorporating the complex interplay of effort and reward for self and others. Furthermore, these parameters should be correlated with neural signals, adding a critical layer of evidence to the claims. As it is, the inclusion of the prosocial decision task seems irrelevant.

      We thank Reviewer #3 for this thoughtful suggestion regarding the value of computational modelling. We fully agree that formalizing mechanisms is crucial, but we would like to clarify why a computational model of decision-making cannot directly capture the paradoxical after-effects observed in our neural data. The paradoxical after-effect of effort exertion we report refers to experienced utility (i.e., how prior costs modulate the hedonic consumption of a reward), whereas the decision task measures decision utility (i.e., how prospective costs and benefits are integrated to guide choice). We included the prosocial decision task to establish a behavioral baseline and replicate the well-documented phenomenon of prosocial apathy. Consistent with prior work (e.g., Lockwood et al., 2017; Lockwood et al., 2022), our data show that at the decision stage (ex-ante), effort functions as a universal cost: participants discounted rewards for both self and others, differing only quantitatively (steeper discounting for others). It is only after effort is exerted (ex-post) that the pattern reverses: effort is valued for self but remains costly for others, representing a qualitative shift. Crucially, incorporating a "paradoxical valuation" parameter (i.e., effort as a reward) into our decision model would mathematically contradict the behavioral reality. Since participants actively avoided high-effort options, a model assuming effort adds value might fail to fit the choice data. The theoretical novelty of our study lies precisely in this temporal dissociation: whereas self-benefiting effort paradoxically enhances reward valuation, other-benefiting effort induces a persistent reward devaluation.

      To address the reviewer’s interest in bridging these two domains, we examined whether these distinct stages are linked at the level of individual differences. We hypothesized that an individual’s sensitivity to prospective effort cost (discounting rate K) might modulate their susceptibility to the retrospective neural after-effect. As detailed in our Responses to Reviewer #3’s point 2, we found that for self-benefiting trials, high-discounting individuals showed an effort-enhancement effect on the RewP at low reward magnitude, while for other-benefiting trials, low-discounting individuals exhibited effort-discounting effects at high reward magnitude. We sincerely thank Reviewer #3 for this valuable suggestion, which allowed us to link the two tasks and deepened our understanding of the mechanisms at play.
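
The decision-utility side of this argument can be made concrete with a short sketch. The response to Reviewer #2 (point 5) names a parabolic discounting model. The sketch assumes the common parabolic form SV = R − K·E² and a logistic choice rule against a fixed, effort-free baseline offer. The baseline value, the inverse temperature `beta`, and the offer below are hypothetical; none is taken from the manuscript.

```python
import math


def subjective_value(reward, effort, k):
    """Parabolic effort discounting: SV = R - k * E**2 (k >= 0)."""
    return reward - k * effort ** 2


def p_choose_effortful(reward, effort, k, beta, baseline_value):
    """Two-option softmax (logistic) probability of accepting the effortful
    offer over a fixed, effort-free baseline option."""
    dv = subjective_value(reward, effort, k) - baseline_value
    return 1.0 / (1.0 + math.exp(-beta * dv))


# One hypothetical offer evaluated by a low- and a high-discounting chooser.
offer = dict(reward=1.0, effort=0.7, beta=5.0, baseline_value=0.2)
p_low_k = p_choose_effortful(k=0.5, **offer)
p_high_k = p_choose_effortful(k=2.0, **offer)
```

In this framing, effort can only subtract value from an offer whenever K ≥ 0. As argued above, a prospective choice model of this kind therefore cannot by itself express an effort-as-reward after-effect. K is best treated as a between-person moderator of the neural effect.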

      Lockwood, P. L., Hamonet, M., Zhang, S. H., Ratnavel, A., Salmony, F. U., Husain, M., & Apps, M. A. J. (2017). Prosocial apathy for helping others when effort is required. Nature Human Behaviour, 1(7), 0131. https://doi.org/10.1038/s41562-017-0131

      Lockwood, P. L., Wittmann, M. K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., & Apps, M. A. J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology, 32(19), 4172-4185.e7. https://doi.org/10.1016/j.cub.2022.08.010

      (5) Contradiction Between Effort Perception and Neural Results: Participants reported effort as less effortful in the prosocial condition compared to the self condition, which seems contradictory to the neural findings and the authors' interpretation. If effort has a discounting effect on rewards for others, one might expect it to feel more effortful. How do the authors reconcile these results? Additionally, the relationship between behavioural data and neural responses should be examined to clarify these inconsistencies.

      This point aligns with the issues raised in Reviewer #1’s point 2. We acknowledge the apparent discrepancy between lower reported effort in the prosocial condition and the neural discounting effect. As detailed in our Responses to Reviewer #1’s point 2, we reconcile this by proposing that subjective effort ratings in this paradigm likely reflect motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. Under this interpretation, the lower effort ratings for others reflect a withdrawal of engagement (consistent with prosocial apathy), which conceptually aligns with, rather than contradicts, the neural discounting effect. To validate this, we contrasted effort ratings with difficulty ratings (a more reliable index of objective demand). Our correlational analysis revealed no significant relationship between difficulty and effort ratings (r = -0.21, p = 0.196), suggesting that they capture distinct constructs. Furthermore, liking ratings were negatively correlated with difficulty ratings (r = -0.43, p = 0.011) but not with effort ratings (r = 0.32, p = 0.061), further dissociating the two measures. Crucially, as detailed in our Responses to Reviewer #1’s Recommendations point 5, our RewP effects remained significant even after controlling for individual effort ratings. This demonstrates that the neural effort-discounting effect for others is a physiological signature that operates independently of the subjective report bias.
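
The correlational checks above are ordinary Pearson correlations. Below is a dependency-free sketch. It assumes one mean rating per participant for each scale, because the response does not state the level of aggregation. The t statistic uses n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

The p-value would then come from a t distribution with n − 2 degrees of freedom, for example via a statistics library.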

      (6) Necessary Revisions to Manuscript: If the authors address the issues above, corresponding updates to the introduction and discussion sections could strengthen the narrative and align the manuscript with the additional analyses.

      We thank Reviewer #3 for the above insightful and helpful comments. We have carefully addressed these issues and updated the manuscript accordingly, including the Abstract, Introduction, Results, and Discussion sections.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The two biggest concerns I have are

      - Whether the mixed-effect models are properly specified, and

      - Whether the main interaction between the Recipient and effort on the reward positivity (RewP) reflects different levels of effort exertion when working for self versus others.

      We thank Reviewer #1 for identifying these two critical issues. We have carefully considered these points and conducted additional analyses to address them. Below, we provide a detailed response to each concern, explaining how we have improved the model specification and ruled out alternative interpretations regarding effort exertion.

      (2) On the first point, I noticed that the authors selectively excluded random effects for Effort and Magnitude when regressing RewP on Effort, Magnitude, Recipient, and Valence. This is important because the key result in the paper is a fixed effect two-way interaction between Recipient and Effort and a three-way interaction between Recipient, Effort, and Magnitude. It is not clear that these results will remain significant when Effort and Magnitude are included as random effects in the model. Thus the authors should justify their exclusion as random effects, and/or show that the results don't depend on including those random effects in the model. The same logic applies to the specification of other mixed effects models (e.g. the effect of Magnitude in the model predicting RTs).

      We thank Reviewer #1 for raising this important methodological point. We fully agree that including random slopes wherever possible reduces Type 1 error rates and yields more conservative tests of fixed effects. In our analyses, we determined the random effects structure for each model using singular value decomposition (SVD). Specifically, we began with a maximal model that included by-participant random slopes for all main effects and interactions as well as a participant-level random intercept. When the model failed to converge or yielded a singular fit, we applied SVD to identify redundant dimensions (i.e., components explaining zero variance) and iteratively removed these terms until convergence was achieved. This procedure allowed us to retain the maximal converging random-effects structure while ensuring model stability. We have clarified this procedure in the revised manuscript as follows, “For each model, we fitted the maximal random-effects structure and, when the model was overparameterized, used singular value decomposition to simplify the random-effects structure until the model converged” (page 28, para. 1, lines 5–8).

      Regarding the RewP model, including all variables (i.e., Recipient, Effort, Magnitude, and Valence) in the random-effects structure resulted in a boundary (singular) fit. Examination of the variance-covariance structure of the random effects revealed that the random slopes for Valence and Magnitude were perfectly negatively correlated (r = -1.00), indicating severe overparameterization. In our original submission, we removed the random slopes for Effort and Magnitude because the SVD analysis indicated redundant dimensions in the model structure.
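
The singularity check described above amounts to a principal-components decomposition of the random-effects covariance matrix. Dimensions that carry (near-)zero variance signal an overparameterized structure. In R, lme4's rePCA() performs a related check on fitted models. The Python sketch below instead works on a hypothetical covariance matrix in which the Valence and Magnitude slopes are constructed to be perfectly negatively correlated, as reported for the full RewP model. The standard deviations, the remaining correlations, and the 1e-8 tolerance are illustrative assumptions.

```python
import numpy as np


def random_effect_pca(cov, tol=1e-8):
    """Share of random-effect variance carried by each principal direction.

    Directions whose share falls below `tol` are redundant (singular fit).
    """
    cov = np.asarray(cov, dtype=float)
    singular_values = np.linalg.svd(cov, compute_uv=False)
    proportions = singular_values / singular_values.sum()
    n_redundant = int(np.sum(proportions < tol))
    return proportions, n_redundant


# Hypothetical by-participant SDs for (intercept, Valence slope, Magnitude slope).
sd = np.array([2.0, 1.0, 0.5])
# Valence and Magnitude slopes perfectly negatively correlated (r = -1).
corr = np.array([
    [1.0, 0.2, -0.2],
    [0.2, 1.0, -1.0],
    [-0.2, -1.0, 1.0],
])
cov = corr * np.outer(sd, sd)
props, n_redundant = random_effect_pca(cov)
print(n_redundant)  # 1
```

Because one direction carries no variance, one of the two collinear slopes can be dropped without losing information. This is the motivation for removing the Valence slope rather than the slopes involved in the key interactions.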

      However, we agree with the Reviewer that retaining slopes for variables involved in key interactions is crucial. Therefore, we re-evaluated the model strategy: instead of removing Effort and Magnitude, we removed the random slope for Valence (which was the primary source of the perfect correlation). This modification successfully resolved the singularity while allowing us to retain the random slopes for the critical variables (i.e., Effort and Magnitude).

      Critically, this updated model yielded the same pattern of results as our original submission: the two-way interaction between Recipient and Effort and the three-way interaction between Recipient, Effort, and Magnitude remained significant (see Table S3). As expected, including the random slopes for Effort and Magnitude yielded a more conservative test of the fixed effects. While the critical three-way interaction remained significant (p = 0.019), the simple slope for the Self condition at high reward magnitude shifted slightly from significant (p = 0.041) to marginally significant (p = 0.056). However, the effect size remained largely unchanged (b = 0.42 vs. original b = 0.43), and the dissociation pattern, where self-benefiting trials show a positive trend while other-benefiting trials show a significant negative slope, remains robust and is statistically supported by the significant interaction. We have adopted this updated model in the revised manuscript and updated the relevant sections accordingly. Finally, note that we have removed the RewP table from the Supplementary Materials because the RewP model results are now presented as a figure in the main text (as suggested by Reviewer #1’s Recommendations point 3).

      We have also carefully verified the random effects structures for other mixed-effects models, including the RT and Performance-P3 models in the prosocial effort task, as well as the decision time and decision choice models in the prosocial decision-making task. The updated information is detailed as follows:

      Regarding the RT model, we replaced it with a more reasonable model of response speed (button presses per second), as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 4 for details).

      Regarding the Performance-P3 model, the random-effects structure could support only a random slope for Effort, as in our original submission; thus, the results remain unchanged.

      Regarding the decision time model, we have updated our results to include the quadratic effort term, as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 6 for details).

      Regarding the decision choice model, we included Recipient, Effort, and Magnitude in the random-effects structure. As shown in Table S4, the results remain largely consistent with the original model, except for a newly significant interaction between effort and magnitude. Follow-up simple slopes analyses revealed that the discounting effect of effort was more pronounced at low reward magnitude (M − 1SD: b = -2.69, 95% CI = [-3.09, -2.29], p < 0.001) than at high reward magnitude (M + 1SD: b = -2.38, 95% CI = [-2.82, -1.94], p < 0.001).
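
Reward magnitude enters the models as a z-scored continuous regressor (see the response to Reviewer #2's point 4). The simple slope of effort at M ± 1 SD is therefore the effort coefficient minus or plus the effort × magnitude coefficient. The minimal sketch below covers a two-way effort × magnitude term and assumes that the five reward levels are equally frequent across trials. The coefficients are hypothetical. A model that also includes Recipient would add the corresponding recipient terms at the chosen recipient level.

```python
from statistics import mean, stdev

# The five reward levels from the task (yuan); equal trial counts assumed.
rewards = [0.2, 0.4, 0.6, 0.8, 1.0]
m, s = mean(rewards), stdev(rewards)  # sample SD (n - 1 denominator)
rewards_z = [(r - m) / s for r in rewards]


def simple_slope(b_effort, b_effort_x_magnitude, magnitude_z):
    """Slope of effort at a given z-scored reward magnitude."""
    return b_effort + b_effort_x_magnitude * magnitude_z


# Hypothetical fixed-effect estimates, not values from the manuscript.
b_effort, b_interaction = -1.0, 0.25
slope_low = simple_slope(b_effort, b_interaction, -1.0)   # M - 1 SD
slope_high = simple_slope(b_effort, b_interaction, 1.0)   # M + 1 SD
```

With a positive interaction term, the negative effort slope is steeper at low magnitude than at high magnitude, which matches the direction of the pattern reported above.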

      In summary, we have improved the model specification following Reviewer #1’s suggestion. Crucially, the results remain qualitatively consistent with our original findings. We have updated the Results section, figures (Figures 2, 4, and 5), and OSF documents (including a new R Markdown file and an HTML output file detailing the final results) to reflect these analyses. Additionally, we have explicitly stated the method used for calculating p-values in the mixed-effects models (page 28, para. 1, lines 8–10), which was omitted in the original submission.

      (3) Regarding the mixed models, it would also be good to show a graphical depiction summarizing key effects (e.g. the Recipient by Effort interaction on RewP) rather than just showing the predictions of the fitted mixed effects models.

      This point is well-taken. Please see Figure S4, which visualizes the key effects and has now been included in the revised manuscript as Figure 4A.

      (4) Finally, regarding the mixed effect models of RTs - given the common finding that RTs are not normally distributed, the Authors might be better off regressing 1/RT (interpreted as speed rather than latency) since 1/RT will often make distributions less asymmetric and heavy-tailed.

      We thank Reviewer #1 for this helpful suggestion regarding data distribution. In our original analysis, the dependent variable was “completion time” (i.e., the latency to complete the required button presses within the 6-s window). We agree that these raw latency data exhibited characteristic non-normality (see Figure S5, Left). Based on Reviewer #1’s suggestion, we adopted “response speed” (calculated as button presses per second) as the dependent variable. As expected, this transformation substantially improved the normality of the distribution (see Figure S5, Right). We have refitted the mixed-effects model using this speed metric. Critically, the results largely replicated the patterns observed in our original model, with the exception that the main effect of reward magnitude did not reach significance in the speed model (see Table S5). Given the superior distributional properties of the speed metric, we have replaced the original latency analysis with the response speed model in the revised manuscript. We have updated the Results section (page 8, para. 1, lines 4–9) and Figures 2B–C accordingly.
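
The latency-to-speed transformation can be sketched as follows. The sketch assumes that speed is the number of required presses divided by the completion time in seconds; the response says only "button presses per second". The trials are made up.

```python
def response_speed(n_presses, completion_time_ms):
    """Button presses per second on one trial (a rescaled 1/latency)."""
    if completion_time_ms <= 0:
        raise ValueError("completion time must be positive")
    return n_presses / (completion_time_ms / 1000.0)


# Hypothetical trials: (required presses, completion time in ms within the 6-s window).
trials = [(20, 2000), (20, 2500), (20, 4000), (20, 5000)]
speeds = [response_speed(n, t) for n, t in trials]
```

For a fixed number of required presses, speed is proportional to 1/latency, which is the transformation the reviewer suggested. The reciprocal compresses the long right tail of slow trials, and this compression is what makes the distribution less skewed.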

      (5) Regarding the level of effort exerted, there are two reasons to suspect that participants exerted less for others versus themselves. The first is that they were slower to complete the button pressing for others versus themselves. The second is that they reported paradoxically less subjective effort for others versus self (paradoxical because they also reported liking the task less for others versus self). The explanation for both may be that they exerted less effort for others versus self, and this has important implications for interpreting the main effects. If they exerted less effort for others, this may partly account for the key Recipient:Effort and Recipient:Effort:Magnitude interactions in the mixed effects regression of RewP. Do either median effort durations or self-reported effort predict the magnitude of the Recipient:Effort and Recipient:Effort:Magnitude interactions (if these were included as random effects)? If so, that would provide evidence supporting this story. Alternatively, if median durations or self-reported effort were included as covariates, do these interactions still obtain? In any case, the Authors should include caveats regarding this potential explanation of the self-versus-other interactions with effort and magnitude on the RewP (or explain why this cannot explain the interactions).

      We thank Reviewer #1 for raising this important interpretational issue. We acknowledge the concern that differences in physical exertion or perceived effort could potentially confound the neural findings. However, we argue that the observed RewP effects are not driven by these factors for several reasons.

      First, the prosocial effort task enforced fixed effort thresholds (10%–90% of their maximum effort level) across self-benefiting and other-benefiting trials. Importantly, participants achieved ceiling-level success rates that were highly comparable between self-benefiting (97%) and other-benefiting (96%) trials, indicating that they successfully exerted the required effort across conditions.

      Second, regarding the slower response speed for others (we used response speed instead of completion time, as the former is more suitable for statistical analysis; see details in Responses to Reviewer #1’s Recommendations point 4), we interpret this as a reduction in motivation rather than a reduction in the amount of effort exerted. Similarly, as detailed in our Responses to Reviewer #1’s point 2, subjective effort ratings in this paradigm appear to be influenced by demand characteristics and do not reliably track physical exertion. For instance, liking ratings were associated with difficulty (r = -0.43, p = 0.011) rather than effort (r = 0.32, p = 0.061) ratings.

      To empirically rule out the possibility that these behavioral differences account for the neural effect, we followed the reviewer’s suggestion and re-ran the mixed-effects model predicting RewP amplitudes with trial-by-trial response speed and subjective effort rating included as covariates. These control analyses revealed that neither response speed (b = -0.07, p = 0.614) nor self-reported effort (b = 0.10, p = 0.186) significantly predicted RewP amplitudes (see Table S6). Most importantly, the key interactions of interest (Recipient × Effort and Recipient × Effort × Magnitude) remained significant and virtually unchanged. These findings suggest that the observed neural after-effects of prosocial effort are not driven by variations in motor execution or perceived effort.

      Minor comments:

      (6) In Figure 5A a quadratic effect (not a linear effect) seems fairly obvious in decision times as a function of effort level. This makes sense given that participants are close to indifference, on average, around the 50-70% effort level. I recommend fitting a model that has a quadratic predictor and not just a linear predictor when regressing decision times on effort levels.

      We thank Reviewer #1 for this insightful suggestion. We agree that decision times likely track decision conflict, which typically peaks near indifference points (e.g., moderate effort levels). Accordingly, we reanalyzed the decision time data using a mixed-effects model that included both linear and quadratic terms for effort. As detailed in Table S7, this analysis revealed a significant quadratic main effect of effort, which was further qualified by a significant interaction between the quadratic effort term and reward magnitude. Decomposition of this interaction (Figure S6) revealed that the quadratic effort effect was more pronounced at low reward magnitude (M − 1SD: b = -160.10, 95% CI = [-218.30, -101.90], p < 0.001) than at high reward magnitude (M + 1SD: b = -99.50, 95% CI = [-157.60, -41.40], p = 0.001). However, we found no significant interactions involving the quadratic effort term and recipient. We have updated the Results section (page 13, para. 2; page 14, para. 1) and Figures 5A–B (right panel) to reflect these findings.
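
The linear-plus-quadratic specification can be illustrated on hypothetical group-mean decision times. For brevity the sketch uses ordinary least squares via numpy, whereas the reported analysis is a mixed-effects model with random effects and a magnitude interaction. The effort levels (five steps from 10% to 90%) and the decision times are made up.

```python
import numpy as np

# Hypothetical effort levels (proportion of maximum) and mean decision times (ms).
effort = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
decision_time = np.array([900.0, 1100.0, 1180.0, 1120.0, 950.0])

# Center effort before squaring so the linear and quadratic terms are less collinear.
effort_c = effort - effort.mean()
X = np.column_stack([np.ones_like(effort_c), effort_c, effort_c ** 2])
coef, *_ = np.linalg.lstsq(X, decision_time, rcond=None)
intercept, b_linear, b_quadratic = coef

# Vertex of the fitted parabola, mapped back to the original effort scale.
peak_effort = effort.mean() - b_linear / (2 * b_quadratic)
```

A negative quadratic coefficient describes an inverted U, with decision times peaking near the effort level at which participants are closest to indifference. A magnitude interaction on this term would shift or flatten that peak.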

      (7) The distinction between the effort and decision-making tasks wasn't super clear from the main text. A sentence early on in the results section could be useful for readers' understanding.

      This point is well taken. In the revised manuscript, we have clarified this distinction at the beginning of the Results section (page 6, para. 2, lines 1–10). In addition, we have explicitly indicated the corresponding task within each subsection heading in the Results:

      “2.1 Investing effort for others is less motivating than for self in the prosocial effort task” (page 7)

      “2.2 Effort adds reward value for self but discounts reward value for others in the prosocial effort task” (page 9)

      “2.3 Reward is devalued by effort to a higher degree for others than for self in the prosocial decision-making task” (page 13)

      (8) What does "three trials" refer to on lines 143-144?

      Thank you for raising this point. Participants completed three trials in which they were asked to press a button as rapidly as possible with their non-dominant pinky finger for 6000 ms. The maximum effort level was operationalized as the average button-press count across the three trials. To improve clarity, we have also provided more detailed description in the Results section, which reads: “The mean maximum effort level (i.e., the average button-press count across three 6000-ms trials; see Procedure for details) ….” (page 7, para. 1, lines 1–2).
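
The calibration step can be sketched as follows. The averaging over three 6000-ms trials comes from the response. Three choices are assumptions for illustration: the five threshold proportions within the 10%–90% range mentioned in the response to point 5, rounding up to whole presses, and the press counts.

```python
import math
from statistics import mean


def max_effort_level(calibration_counts):
    """Average button-press count across the calibration trials."""
    return mean(calibration_counts)


def required_presses(max_level, proportions):
    """Per-level press thresholds as proportions of the maximum, rounded up."""
    return [math.ceil(p * max_level) for p in proportions]


counts = [42, 45, 39]             # three 6000-ms calibration trials (made-up)
level = max_effort_level(counts)  # 42
thresholds = required_presses(level, [0.1, 0.3, 0.5, 0.7, 0.9])
print(thresholds)                 # [5, 13, 21, 30, 38]
```

Scaling thresholds to each participant's own maximum keeps the effort levels comparable across individuals who differ in raw tapping speed.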

      (9) It is unclear how the authors select their time windows for ERP analyses.

      We thank Reviewer #1 for this comment. Measurement parameters (i.e., time windows and channel sites) were determined based on the grand-averaged ERP waveforms and topographic maps collapsed across all conditions. This procedure is orthogonal to the conditions of interest and prevents bias in the selection of measurement windows and channels, consistent with the “orthogonal selection approach” (Luck & Gaspelin, 2017). We have clarified this point in the revised manuscript, which now reads, “Measurement parameters (time windows and channel sites) were determined from the grand-averaged ERP waveforms and topographic maps collapsed across all conditions, which was thus orthogonal to the conditions of interest (Luck & Gaspelin, 2017)” (page 27, para. 1, lines 6–9).
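
The orthogonal selection approach can be sketched for the time window as follows. The window is centred on the peak of the grand average collapsed across conditions, so condition differences play no role in its placement. Channel selection from collapsed topographic maps would follow the same logic. The condition names, the search range, and the ±50 ms half-width are hypothetical.

```python
def grand_average(waveforms_by_condition):
    """Sample-by-sample mean across condition averages (equal weight per condition)."""
    conditions = list(waveforms_by_condition.values())
    return [sum(samples) / len(conditions) for samples in zip(*conditions)]


def peak_window(grand, times_ms, search_start, search_end, half_width_ms):
    """Window of +/- half_width_ms around the largest grand-average value in range."""
    candidates = [(v, t) for v, t in zip(grand, times_ms) if search_start <= t <= search_end]
    if not candidates:
        raise ValueError("search range contains no samples")
    _, peak_t = max(candidates, key=lambda c: c[0])
    return peak_t - half_width_ms, peak_t + half_width_ms


times = [0, 100, 200, 300, 400]
by_condition = {
    "self": [0.0, 1.0, 3.0, 5.0, 2.0],
    "other": [0.0, 2.0, 5.0, 4.0, 1.0],
}
grand = grand_average(by_condition)
window = peak_window(grand, times, 100, 400, 50)  # (250, 350)
```

Because each condition contributes equally to the grand average, the chosen window does not favour any single condition.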

      Luck, S., & Gaspelin, N. (2017). How to get statistically significant effects in any ERP experiment (and why you shouldn't). Psychophysiology, 54(1), 146-157.

      (10) There are a few typos throughout. For example, Line 124 should read "other half benefitted...", Line 127 should read "interest at each effort level...", "following" on Line 369, and Supplemental table titles incorrectly spell the word "Results".

      We thank Reviewer #1 for catching these errors. We have corrected all the specific typos noted (page 6, para. 2, lines 11 and 15; page 22, para. 3, line 2; Supplementary Table S2). Furthermore, we have conducted a thorough proofreading of the entire text and supplementary materials to ensure linguistic accuracy and consistency throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Lines 84-86. "The RewP ... has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002) and ventral striatum (Foti et al., 2011)." This is a better reference for the ACC source: https://pubmed.ncbi.nlm.nih.gov/23973408/. And perhaps remove the reference to the ventral striatum; most people would agree that activity in the ventral striatum cannot be measured with scalp EEG.

      We thank Reviewer #2 for providing the updated reference, which has been cited in the revised manuscript. We agree that activity in the ventral striatum (VS) cannot be reliably measured with scalp EEG and have therefore removed the reference to the VS. The revised sentence now reads, “… has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002; Hauser et al., 2014)” (page 4, para. 2, lines 12–13).

      (2) Lines 152-153. What exactly is shown in Figure 2A? How did the authors average across subjects?

      We thank Reviewer #2 for raising this issue. Figure 2A depicts the distribution of the maximum effort level, defined as the average button-press count across three 6000-ms trials completed before the prosocial effort task. In these trials, participants were instructed to press the button as rapidly as possible with their non-dominant pinky fingers. To improve clarity, we have revised the figure caption as: “(A) Distribution of the maximum effort level (i.e., the average button-press count across three 6000-ms trials) across participants” (Figure 2).

      (3) Lines 160-164. "As expected (Figure 2D), participants perceived increased effort as more difficult ... and more disliking (b = -0.62, p < 0.001) when the beneficiary was others than themselves." Does this sentence describe the main effect of the beneficiary or the interaction between beneficiary and effort level, as the start of the sentence ("increased effort") suggests?

      We thank Reviewer #2 for pointing out this ambiguity. The sentence describes the main effect of beneficiary rather than the interaction between beneficiary and effort level. In the revised manuscript, we have rephrased the sentence as: “They felt less effort (b = -0.32, p = 0.019) and more disliking (b = -0.62, p = 0.001) for other-benefiting trials compared to self-benefiting trials” (page 9, para. 1, lines 4–6).

      (4) Lines 195-196. "..., we conducted post-hoc simple slopes analyses at -1 SD ("Low") and + SD ("High") reward magnitude." I did not understand what the authors meant with these reward magnitudes, given that the actual potential rewards were ¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0.

      In our analyses, the actual reward magnitudes (¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0) were z-scored and entered as a continuous regressor in the mixed-effects models. Post-hoc simple slopes analyses were then conducted at ±1 SD from the mean of the z-scored reward magnitude. To clarify, we have revised the sentence as “… we conducted post-hoc simple slopes analyses at 1 standard deviation (SD) below (“Low”) and above (“High”) the mean reward magnitude” (page 11, para. 2, lines 8–9). This standard method for testing simple effects of continuous predictors is recommended by Aiken and West (1991).

      Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Sage.

      (5) Lines 253 and 275. I would not call this a computational model. The authors fit a curve to data, there is no model of the computations involved.

      This point is well taken. We have replaced “computational model” with “discounting” (Figure 5) and “parabolic discounting model” (page 15, para. 1, line 15).

      (6) Line 710. Figure S1 does not show topographic maps of the P3, as the figure caption suggests.

      We thank Reviewer #2 for identifying this oversight. We have now included topographic maps of the P3 in Figure S1.

      (7) Please check language in lines 33 (effect between), 38 (shape), 49 (highest cost form?), 74 (tunning), 90 (omit following), 127 (interest on at each effort level), 135 (press buttons >> rapidly press a button?), 142 (motivated), 219 (should low be high?), 265-266 (missing word), 275 (confirmed by following), 292 (an action can be effortful, a feeling cannot), 315 (when it comes into), 330-331 (data is plural; the aftereffect of prosocial effect), 387 (interest on at each effort level), 405 (should quickly be often?).

      We thank Reviewer #2 for the careful review and feedback about these language issues. We have revised all the phrasing you identified. The corrections are as follows:

      Line 33: “effect between” has been changed to “effects for” (page 2, para. 1, line 6).

      Line 38: “shape” has been updated to “shapes” (page 2, para. 1, line 13).

      Line 49: “highest cost form?” has been revised to “the most common cost type” (page 3, para. 1, lines 7–8).

      Line 74: “tunning” has been corrected to “tuning” (page 4, para. 2, line 1).

      Line 90: omit following. Done (page 5, para. 1, line 2).

      Line 127: “interest on at each effort level” has been corrected to “liking for each effort level” (page 6, para. 2, line 15).

      Line 135: “press buttons” has been updated to “rapidly press a button” (the caption of Figure 1).

      Line 142: “motivated” has been revised to “motivating” (page 7).

      Line 219: should low be high? Yes, we have corrected this (the caption of Figure 4).

      Lines 265–266: The missing word “with” has been inserted (page 15, para. 1, line 2).

      Line 275: “confirmed by following” has been revised as “corroborated by a parabolic …” (page 15, para. 1, line 15).

      Line 292: an action can be effortful, a feeling cannot. We have changed the word “effortful” to “effort” (page 18, para. 2, line 3).

      Line 315: “when it comes into” has been revised to “when it came to” (page 19, para. 1, line 10).

      Lines 330–331: These two expressions have been revised to “our data establish …” and “the after-effect of prosocial effort” (page 20, para. 1, lines 2–3).

      Line 387: “interest on at each effort level” has been corrected to “interest at each effort level” (page 23, para. 2, line 5).

      Line 405: should quickly be often? We agree that “quickly” might imply latency or speed of a single press, whereas the task required maximizing the frequency of presses within the time window. To capture this meaning accurately, we have revised the phrase to “pressed a button as rapidly as possible” (implying repetition rate) in the revised manuscript (page 24, para. 2, lines 3–4).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity.

      (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in the blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development.

      (2) Their analyses of host-sibling gene expression (admittedly based on a smaller sample) showed that the local environment dominates over genotype.

      All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individuals of origin). Here, you could model the sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      We appreciate this good suggestion. We performed an analysis along these lines, and found that it supported the conclusion of a lack of strong relationship between microglial and macrophage chimerism. In particular (and as we now have added to the Methods):

      “To perform an analysis of Fig. 2D that takes into account the uncertainty in the estimate of the chimeric cell proportion, we performed a binomial generalized linear mixed-effects model analysis in R using the command glmer(y ~ (1|indiv) + chimerism_micro, family = binomial), where y is a vector (of length 1,333) containing the genomic identity of each macrophage (either host or twin), (1|indiv) models a random effect for the identity of each animal, and chimerism_micro is the microglia chimerism fraction of the animal’s brain. The fixed-effect P-value for chimerism_micro was 0.795, indicating that microglial chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction. The estimate for the intercept was -0.8115 and the estimate for chimerism_micro was 0.3106, which indicates that the probability that a cell is a macrophage given the microglia chimerism fraction was only 0.57 (plogis(-0.8115+0.3106)).”
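      The last step above converts fixed-effect estimates from the logit scale into a probability with R's plogis(). As a hedged illustration (not the authors' code), the inverse-logit link can be sketched in Python as follows. predicted_probability is a hypothetical helper that ignores the per-animal random intercept, so it returns the prediction for an animal whose random effect is zero rather than a population-averaged probability.

```python
import math


def plogis(x: float) -> float:
    """Inverse logit (logistic CDF), the Python analogue of R's plogis()."""
    # Two branches keep exp() from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predicted_probability(intercept: float, slope: float, chimerism_micro: float) -> float:
    """Fixed-effects-only prediction from logit(p) = intercept + slope * chimerism_micro.

    Hypothetical helper: the random intercept (1|indiv) is set to zero.
    """
    return plogis(intercept + slope * chimerism_micro)
```

      For example, predicted_probability(-0.8115, 0.3106, x) evaluates the fitted fixed-effects curve at a microglia chimerism fraction x between 0 and 1.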

      We have added the following in the main text:

      “We investigated further by performing a statistical test that takes into account the uncertainty in the estimates of the chimeric cell proportion using a binomial framework (Methods); in this analysis, microglia chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction (Methods). This suggests that in addition to the cell’s genome, other factors such as local host environment play a role in differential recruitment, proliferation or survival of the sibling cells. (We note that macrophages often transit the fluid-filled perivascular space, with a substantially different migration history and arrival dynamics than microglia.)”

      Given this new analysis, and our original observation that the Pearson correlation was only 0.31, we believe that other factors in addition to the cell’s genome play a role in differential recruitment or survival of sibling cells.

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Figure 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      We performed the analysis along these lines and added the following in the Methods section:

      “We used the same framework to further analyze Fig. 4. We included brain region as a covariate in the binomial framework: glmer(y ~ (1|indiv) + brain_reg + assay, family = binomial), where y is a vector (of length 48,439) containing the genomic identity of each microglial cell, and assay is either “Drop-seq” or “10X”. The brain regions assayed in Fig. 4 are the cortex, hippocampus, hypothalamus, striatum, thalamus, and basal forebrain. All these brain regions were statistically significant predictors for microglia chimerism fraction (all P-values < 2 × 10⁻¹⁶), supporting the conclusion that chimerism varies across brain regions. We also re-analyzed Supplementary Fig. 4 (Fig. 4B in original manuscript) using the same framework and found that 18 out of 27 brain substructures were statistically significant predictors for microglia chimerism fraction.”
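      In a formula such as brain_reg + assay, R expands each categorical covariate into indicator columns using treatment contrasts, so each region coefficient is a contrast against a reference region. A minimal plain-Python sketch of that expansion follows. The function names and the choice of reference levels are hypothetical. R by default takes the first factor level as the reference.

```python
from typing import List, Sequence, Tuple


def treatment_columns(values: Sequence[str], reference: str) -> Tuple[List[str], List[List[int]]]:
    """Dummy-code a categorical variable, dropping the reference level."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not present")
    kept = [lvl for lvl in levels if lvl != reference]
    # One 0/1 indicator per non-reference level, row by row.
    cols = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, cols


def design_matrix(brain_reg: Sequence[str], assay: Sequence[str],
                  ref_region: str, ref_assay: str) -> Tuple[List[str], List[List[int]]]:
    """Fixed-effects design for y ~ brain_reg + assay: intercept plus contrasts."""
    if len(brain_reg) != len(assay):
        raise ValueError("brain_reg and assay must have the same length")
    reg_names, reg_cols = treatment_columns(brain_reg, ref_region)
    assay_names, assay_cols = treatment_columns(assay, ref_assay)
    names = (["(Intercept)"]
             + [f"brain_reg{n}" for n in reg_names]
             + [f"assay{n}" for n in assay_names])
    rows = [[1] + r + a for r, a in zip(reg_cols, assay_cols)]
    return names, rows
```

      With six regions and an intercept, this encoding produces five region columns. Each column's coefficient tests whether that region differs from the reference region.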

      We have added the following sentences in the main text:

      “We used the binomial generalized linear mixed-model framework and found that all brain regions were statistically significant predictors for microglia chimerism fraction, supporting the conclusion that chimerism varies across brain regions (Methods).

      Analysis of finer brain substructures showed a similar result (Supplementary Fig. 4; the binomial generalized linear mixed-model framework determined that 18 out of 27 brain substructures were statistically significant as predictors for microglia chimerism fraction, Methods).”

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      We like this idea, but our study is underpowered for eQTL analysis since we only have 14 data points in the correlation analysis (eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses).

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Figure 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      We agree; to help clarify this for readers, we added the following sentence at the end of the paragraph discussing Fig. 5A-C.

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, was modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, e.g., to genetic differences between siblings. We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      And in the caption of Fig. 5A-C, we have included the statistical threshold for identifying DEGs:

      “In (A) to (C), each point represents a gene; its location on the plot represents the level of expression of that gene among microglia with two different genomes in the same animal. x- and y-axes: normalized gene expression levels (number of transcripts per 100,000 transcripts). FC: fold-change of gene expression, female/male for XIST. Fold-change and P-values were calculated using the binomTest method from the edgeR package (Robinson et al., 2010). Differentially expressed genes (black dots) were defined as: FDR Q-value<0.05 and fold-change>1.5 (in either direction) and the gene must be expressed in at least 10% of at least one of the two sets of microglia being compared.”
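      The DEG rule in this caption combines three conditions. The sketch below applies them to per-gene summaries. It assumes the FDR Q-values come from a Benjamini–Hochberg adjustment of the binomTest P-values, because the caption does not name the procedure. The function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted Q-values, following the logic of R's p.adjust(method = "BH")."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0  # also caps Q-values at 1
    # Walk from the largest P-value down, keeping a cumulative minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


def is_deg(qvalue: float, fold_change: float, frac_expr_a: float, frac_expr_b: float,
           q_cut: float = 0.05, fc_cut: float = 1.5, min_frac: float = 0.10) -> bool:
    """Q < 0.05, fold change > 1.5 in either direction, expressed in >= 10% of one group."""
    big_change = fold_change > fc_cut or fold_change < 1.0 / fc_cut
    expressed = max(frac_expr_a, frac_expr_b) >= min_frac
    return qvalue < q_cut and big_change and expressed
```

      The “either direction” criterion is implemented as FC > 1.5 or FC < 1/1.5. This is the same as requiring |log FC| > log 1.5.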

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide strong evidence that all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages, and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for the brain, liver, and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual, and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to the chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study. Finally, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Weaknesses:

      I find no major weaknesses, but several minor ones. First, the main text of the manuscript provides no information about the specific animals used in this study, other than sex. Some basic information about the sources of animals and their ages at the time of study would be useful within the main paper, even though more information will be available in the supplementary material.

      We moved the table containing animal information (age at time of study, sex, source, tissues analyzed) from Supplementary Table 1 into the main text as Table 1. We also added the following sentences starting on line 140:

      “Brain snRNA-seq was performed on 11 animals (6 adults, 3 neonates, and 1 six-month-old; Table 1). All were unrelated except for CJ006 and CJ007 which are birth siblings, and CJ025 and CJ026 which are (non-birth) siblings. All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization in Massachusetts. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single cell atlas of the marmoset brain. The three neonates had died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      Second, it is not clear why only 14 pairs of animals were used for estimating the correlation of chimerism levels in microglia and macrophages. Is this lower than the total number of pairwise comparisons possible in order to avoid using non-independent samples? Some explanation would be helpful.

      Only birth siblings (twins and triplets) can be meaningfully included in this analysis. The 14 pairs of animals we used to estimate the correlation of chimerism levels in microglia and macrophages included all pairs that we could use for this analysis: eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses.

      Finally, I think more analysis of the consistency and variability of gene expression in microglia across different regions of the brain would be valuable. Are there genetic pathways expressed similarly in host and sibling microglia, regardless of region of the brain? Are there pathways that are consistently expressed differently in host vs sibling microglia regardless of brain region?

      For brain-region differences in microglial gene expression, we are under-powered and would only be scratching the surface of a question (interesting but beyond the focus and scope of this paper) that needs deeper experimental sampling.

      For the questions about sibling-sibling differences (regardless of which sibling is host) and recurring host-sibling differences, we can do a stronger analysis, because these analyses have similar power to each other. We describe this analysis in the revised manuscript as follows:

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, was modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, e.g., to genetic differences between siblings.”

      We also, as suggested, tried to get beyond single-gene analyses to expression of programs/pathways, by performing latent factor analysis on the single-cell gene expression measurements. 

      “Following the method of Ling et al. (2024), we performed latent factor analysis using probabilistic estimation of expression residuals (PEER; Stegle et al., 2010) on the gene-by-donor expression matrix of microglia. We started by creating a gene-by-cell matrix of microglia gene expression from all animals, and we normalized the matrix using SCTransform version 2 (Choudhary and Satija, 2022) with 3000 variable features. We obtained the Pearson residuals from SCT normalization and summed the residuals across cells with the same genome to obtain a gene-by-donor matrix of microglia expression measurements. We used this matrix as input to PEER and ran the tool with the number of factors set to values from 9 to 12. For each gene-expression latent factor, to evaluate whether host/sibling identity had a consistent effect on expression levels, we performed a linear regression on host/sibling identity using glm(peer_factor_k ~ host_or_twin). For all factors, the P-values for the effect of host_or_twin were non-significant (P > 0.1), indicating that no PEER factor was associated with host-vs-twin identity. Thus, our results found no large-scale gene expression program that was consistently expressed differently between hosts and twins.”
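      The aggregation step and the per-factor test described above can be sketched as follows. PEER itself is not reimplemented here. pseudobulk_sum and binary_regression are hypothetical helpers. The regression is ordinary least squares with a single binary covariate, which is what a Gaussian glm(peer_factor_k ~ host_or_twin) fits in R.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pseudobulk_sum(residuals: Sequence[Sequence[float]],
                   genome_labels: Sequence[str]) -> Dict[str, List[float]]:
    """Sum per-cell residual vectors (cells x genes) over cells sharing a genome."""
    out: Dict[str, List[float]] = {}
    for row, label in zip(residuals, genome_labels):
        acc = out.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return out


def binary_regression(y: Sequence[float], is_host: Sequence[bool]) -> Tuple[float, float]:
    """OLS of y ~ intercept + is_host; returns (slope, t statistic).

    With one binary covariate, the slope is the host-minus-sibling difference
    in means and the standard error uses the pooled residual variance.
    """
    host = [v for v, h in zip(y, is_host) if h]
    sib = [v for v, h in zip(y, is_host) if not h]
    n1, n0 = len(host), len(sib)
    if n1 == 0 or n0 == 0 or n1 + n0 < 3:
        raise ValueError("need both groups and at least 3 observations")
    m1, m0 = sum(host) / n1, sum(sib) / n0
    slope = m1 - m0
    rss = sum((v - m1) ** 2 for v in host) + sum((v - m0) ** 2 for v in sib)
    se = math.sqrt(rss / (n1 + n0 - 2) * (1.0 / n1 + 1.0 / n0))
    if se == 0.0:
        raise ValueError("zero residual variance")
    return slope, slope / se
```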

      We have added the text above to the Methods section, and we added the following at the end of the section on Gene-expression comparisons of host- to sibling-derived microglia (lines 264-267):

      “We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      Gene-expression pathways/factors did (within some animals) show host-twin differences in expression levels, but without a consistent direction of effect shared across the many host-twin comparisons. In particular, we used the PEER analysis described above and calculated the host-sibling expression level difference for each latent factor. Many factors differed in expression in individual cases, though none did so in all cases or with a consistent sign:

      Author response image 1.

      Difference between host and sibling expression of gene-expression latent factors for each of the 12 factors computed (using PEER) from the single-cell dataset. For a given factor, the factor expression value of the sibling-genome cells is subtracted from that of the host-genome cells and the difference is divided by the maximum of the absolute value of all elements in that factor.
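      The normalization in this caption can be written out as below. The sketch assumes each factor is stored as one value per donor genome and that host and sibling genomes are supplied as pairs. The donor IDs and the function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def normalized_differences(factor: Dict[str, float],
                           pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """(host - sibling) / max |value| over all elements of one latent factor."""
    scale = max(abs(v) for v in factor.values())
    if scale == 0.0:
        raise ValueError("factor is identically zero")
    return [(factor[host] - factor[sib]) / scale for host, sib in pairs]
```

      Because each factor is scaled by its own maximum absolute value, every normalized difference falls between -2 and 2. This keeps the 12 factors comparable on one plot.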

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the introduction (line 62), the authors mention that chimerism might have shaped behavior in marmosets (and perhaps been selected for). It would be helpful to see this revisited in the discussion. Is it possible that additional genetic variation in immune cells (resident and circulating) provides adaptive benefits and/or disease resistance? In the case of microglia, could the proportion of sibling cells be related (either positively or negatively) to local/regional pathology?

      We liked this suggestion and have added the following in the Discussion:

      “Chimerism could also enable interesting future analyses of whether there are adaptive benefits of chimerism in marmoset immune cells, among whom chimerism could in principle allow presentation of a wider variety of antigens for adaptive immunity. In a recent outbreak of yellow fever in Brazil in 2016-2018, marmosets were found to be less susceptible than other primates that lack immune system chimerism, including the howler monkeys (Alouatta), robust capuchins (Sapajus), and titi monkeys (Callicebus) (de Azevedo Fernandes et al., 2021). In studying future outbreaks in marmosets, one could use single-cell RNA-seq and the methods described here to study how genetically distinct immune cells (in the same animal) have differentially migrated to affected tissues and/or assumed "activated" immune cell states. Recent innovations in spatial transcriptomics with sequencing readouts (that detect SNP alleles) may also make it possible to identify any differential recruitment of genetically distinct immune cells to focal infection sites.”

      Minor comments:

      L300: delete “temporal.”

      We have revised the text accordingly.

      L305: "more-restricted" should not be hyphenated.

      We have revised the text accordingly.

      L309: “from the non-cell”: delete “the.”

      We have revised the text accordingly.

      L367: Louvain, not Louvaine.

      We have revised the text accordingly.

      Figure 2B can be removed - it does not add much information and takes up a lot of space.

      We have moved Figure 2B to panel J of Supplementary Fig. 1 (it is now displayed together with all other animals).

      The same can be said for Figure 4B, which is too tiny. There might be more effective ways to show this variation across animals.

      We have moved Figure 4B to Supplementary Fig. 4 and we have increased the font sizes to make the text in the figures more readable.

      Reviewer #2 (Recommendations for the authors):

      I would suggest providing some basic information about the sources of study animals within the main text. At a minimum, it would be useful to state which colonies are represented in the data, and if there is anything significant about the individual animal histories (e.g. prior exposure to surgical intervention or infectious disease). I believe this basic information should be in the main text, despite the inclusion of a broader range of information in the supplements.

      We appreciate this suggestion and revised lines 143 to 149 of the main text as follows:

      “All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain (Krienen et al., 2020; Krienen et al., 2023). The three neonates died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      I would include the species name (Callithrix jacchus) in line 48.

      On lines 47–48, we now indicate the name of the genus: “Chimerism is common, however, in the Callitrichidae family that consists of the marmosets (Callithrix) and their close relatives the tamarins (Saguinus)...”

      Then on line 65, we now indicate the species name: “Here, we analyze chimerism in the common marmoset (Callithrix jacchus) brain, liver, kidney and blood,...”

      The word “organisms” in line 59 should be “organs.”

      We have modified the text accordingly.

      Lines 100-101: I would suggest this would be clearer to readers if it read: "The relative likelihoods of the original source of each cell could be strongly...".

      We have modified the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      Weaknesses:

      The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.

      We thank the reviewer for this comment and would like to clarify that there has been a misunderstanding: our EEG analyses were time-locked to actual tone onsets, not to expected beat positions. For both music and shuffled conditions, ERPs were computed by epoching around all real auditory events present in each stimulus. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli. We have now clarified this further in the revised manuscript (p. 9).
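      To make the time-locking concrete, here is a minimal single-channel sketch of epoching around actual tone onsets and averaging. It assumes that onsets are given as sample indices, that each epoch is corrected to its pre-onset baseline, and that the epoch window is illustrative. It omits filtering and artifact rejection and is not the authors' pipeline. The function names and default window are hypothetical.

```python
import numpy as np


def erp_from_onsets(eeg, onset_samples, sfreq, tmin=-0.1, tmax=0.5):
    """Baseline-corrected average of epochs time-locked to real tone onsets.

    eeg: 1-D signal of one channel; onset_samples: sample index of each actual
    auditory event; tmin/tmax: epoch window in seconds (illustrative defaults).
    """
    signal = np.asarray(eeg, dtype=float)
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq))
    epochs = []
    for onset in onset_samples:
        a, b = onset + start, onset + stop
        if a < 0 or b > len(signal):
            continue  # skip events whose window runs off the recording
        epoch = signal[a:b]
        # Subtract the mean of the pre-onset samples (if the window has any).
        baseline = epoch[:-start].mean() if start < 0 else 0.0
        epochs.append(epoch - baseline)
    if not epochs:
        raise ValueError("no complete epochs")
    times = np.arange(start, stop) / sfreq
    return times, np.mean(epochs, axis=0)


def mean_amplitude(times, erp, t0, t1):
    """Mean ERP amplitude inside [t0, t1] seconds, e.g. a P1 window."""
    mask = (times >= t0) & (times <= t1)
    return float(erp[mask].mean())
```

      Because epochs are cut only around events that exist in each stimulus, the same function applies unchanged to the music and shuffled conditions. Only the onset list differs.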

      Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.

      Weaknesses:

      I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.

      Thank you very much for your thoughtful suggestion regarding the title. To ensure clarity and to unambiguously indicate that our study focuses on the period after birth, we agree that specifying “first postnatal year” in the title is appropriate. We have revised the title accordingly.

      The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.

      We agree that due to the high variability and limited differentiation of infant motor responses at this age, it is important to consider an overall measure of movement in addition to specific PMs. To address exactly this, we had included an analysis in which we combined all 10 PMs into a single global QoM metric. This ‘All PMs’ measure reflects the overall motor response to the different auditory stimuli. For clarity, this result is presented in Figure 5, where we show the denoised global QoM signal and highlight the observed Condition × Age interaction (which averaged QoM for all PMs and is therefore equivalent to QoM without PM distinction). We now emphasize this analysis more clearly in the Results section (p. 16).
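      Here is a hedged sketch of an “All PMs” measure. It assumes that QoM is the absolute velocity of each PM score. Averaging over PMs at each frame gives a time-resolved global signal, and averaging that signal over frames gives one value per trial. The function names are hypothetical, and the sketch does not reproduce the denoising step mentioned above.

```python
from typing import List, Sequence


def global_qom_signal(pm_velocity: Sequence[Sequence[float]]) -> List[float]:
    """Frame-wise mean absolute velocity across principal movements (rows = PMs)."""
    n_pm = len(pm_velocity)
    n_frames = len(pm_velocity[0])
    if any(len(series) != n_frames for series in pm_velocity):
        raise ValueError("all PM series must have the same number of frames")
    return [sum(abs(series[t]) for series in pm_velocity) / n_pm for t in range(n_frames)]


def global_qom(pm_velocity: Sequence[Sequence[float]]) -> float:
    """Single per-trial value: the global QoM signal averaged over frames."""
    signal = global_qom_signal(pm_velocity)
    return sum(signal) / len(signal)
```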

      Reviewer #3 (Public review):

      Summary:

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first showed that infants at 3, 6, and 12 months of age, as well as adults, had stronger auditory responses to music than to shuffled music. Six-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, whereas the other groups did not. In addition, infants' whole-body spontaneous movements were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than to shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to changes in music intensity, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and their low movement periodicity was not coordinated with the music.

      Strengths:

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      In addition, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.

      Weaknesses:

      (1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.

      We agree with the reviewer that the differences in neural responses to high-pitch versus low-pitch stimuli between 6-month-olds and other infants are difficult to interpret. We have offered several possible explanations for these findings, including developmental changes in auditory plasticity, social interaction effects, maturation of the auditory system, and arousal or exposure differences. If the reviewer has additional perspectives or alternative explanations, we would be very pleased to incorporate them into the revised manuscript.

      (2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.

      We appreciate the suggestion that exploring links between neural and movement responses would be valuable, especially given the study's goals. We were initially cautious about interpreting potential Granger-causal relations between neural and motor activity, as temporal scale differences between the two measures can easily bias directionality estimates. Neural responses typically occur on the scale of milliseconds, whereas movement unfolds over seconds. As a result, an apparent directional relation might emerge simply due to these intrinsic timescale differences rather than reflecting genuine causal influence.

      Nevertheless, we agree that this relationship warrants further investigation, and we have added the following exploratory analyses to the supplements (p. 9). We examined whether ERP amplitudes correlated with movement measures by computing correlations between neural and movement responses on participant-averaged data (not single trials). For neural measures, we extracted mean ERP amplitudes in the post-tone-onset time window encompassing the P1 component, as derived from the cluster-based analyses. For movement measures, we used (1) total movement quantity (mean velocity across the entire trial) and (2) Granger causality F-values reflecting music-to-movement coupling strength. These analyses included comparisons between the music and shuffled music conditions, as well as between the high- and low-pitch conditions. We therefore ran two linear mixed-effects models, with ERP amplitudes as response variables and either QoM or Granger causality F-values as fixed effects. Infants were modelled as random intercepts. We found no significant correlations between ERP amplitudes and movement quantity, irrespective of condition (p>.124). This held when comparing music vs shuffled music (p>.111) and when comparing high vs low pitch (p>.071), across all age groups. Likewise, we found no significant correlations between ERP amplitudes and Granger causality F-values, irrespective of condition (p>.164), or when comparing music vs shuffled music (p>.494) or high vs low pitch (p>.175), across all age groups. The absence of robust correlations suggests that neural sensitivity to musical structure (as indexed by ERPs) and motor responsiveness to music (as indexed by movement quantity or coupling strength) develop somewhat independently during the first year of life. This dissociation aligns with broader developmental theories proposing that perceptual sensitivity often precedes and enables later motor coordination, rather than the two developing together.

      (3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?

      We agree this is important. Infants in each age group were within a narrow age range (3 months: M=113.04 days, SD=5.68 days, Range=98-120 days; 6 months: M=195.88 days, SD=9.46 days, Range=182-211 days; 12-13 months: M=380.44 days, SD=14.93 days, Range=361-413 days), as detailed in the sample description on p. 37. In addition, we asked parents to report on infants' major motor milestones, specifically their ability to sit and/or walk. At 6 months, 25% of infants were able to sit (N = 20), and at 12 months, 50% of infants were able to walk (N = 18). Given the relatively small group sizes for these milestones, we are concerned that detailed analyses could yield unstable or misleading results that may not generalize beyond our sample. Therefore, we chose to focus on broader analyses that are more robust given our current dataset. We fully support the suggestion that future studies with larger samples and more comprehensive motor assessments will better clarify these developmental trajectories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the analysis and findings on auditory-evoked spontaneous movement are highly interesting, the results from the neural data raise questions about the genuine role of music in the observed evoked and induced responses.

      General comments on the findings related to neural data

      (1) The main neural finding is a larger response in the Music condition compared to the Shuffled Music condition. To address their hypothesis, the authors computed the AEP to tones at the beat position and compared responses between the Music and Shuffled Music conditions, aligning the onset to the expected beat position. However, given that inter-onset intervals were permuted in the Shuffled condition, an AEP time-locked to the expected beat position is not meaningful, as no tone is expected at that time. Therefore, it is expected to have a relatively flat AEP in response to the shuffled condition. Furthermore, given the reduced regularity in the Shuffled condition, the observed difference in ASSR at the beat frequency is expected. Similar results could be obtained using an isochronous sequence of pure tones and a shuffled version of the same sequence. Therefore, these two analyses do not strongly support the conclusion of infants' enhanced neural responses to music.

      The authors could consider comparing AEPs by aligning onsets in the Shuffled condition to the actual tone positions, potentially focusing only on tones with sufficiently long preceding and following IOIs to avoid confounds from short intervals. The two conditions could then be compared with correction for the number of tones. Any differences in this case would suggest an effect beyond the auditory evoked responses.

      We agree that ASSR analyses at the beat frequency alone are not sufficient to demonstrate enhanced neural responses to music. However, we would like to clarify that, for the AEP analyses, the EEG data were epoched to all actual tone onsets rather than to expected beat positions, so the AEPs complement the ASSR analysis. Thus, for the shuffled music condition, the EEG was aligned with the real tone onsets present in that sequence, not with hypothetical beat positions derived from a regular rhythm. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli.

      We further clarify this in the Results section on p. 9:

      “Figure 2 shows the average ERPs to the bassline notes in the auditory stimuli, with EEG data time-locked to actual tone onsets (see Methods for details).”

      Finally, following the reviewer’s suggestion, we carried out three control analyses: 1) including only epochs corresponding to bassline tones whose prior inter-onset interval (IOI) exceeded the median IOI duration, 2) including only epochs corresponding to bassline tones whose subsequent IOI exceeded the median IOI duration, and 3) including only epochs corresponding to both melody and bassline tones whose prior and subsequent IOI exceeded the median IOI duration. These analyses yielded event-related potentials in the shuffled music condition that were highly similar to those obtained when all epochs were included (see Figure S1). Therefore, the greater neural response to music compared with shuffled music likely reflects an effect of predictability in the musical condition or, more generally, infants’ disengagement with the shuffled stimuli.
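      The epoch-selection rules behind these control analyses can be sketched as follows. The sketch works on a single sorted list of tone onsets, in seconds. For the third control, the assumption is that melody and bassline onsets would be pooled into that list first. "Exceeded the median" is read as strictly greater. The function name and toy onsets are hypothetical.

```python
import numpy as np


def long_ioi_epochs(onsets, rule="both"):
    """Indices of tones whose neighbouring inter-onset interval(s) exceed the median IOI.

    rule: "prior"      -> the IOI before the tone must exceed the median
          "subsequent" -> the IOI after the tone must exceed the median
          "both"       -> both IOIs must exceed the median
    The first tone has no prior IOI and the last tone has no subsequent IOI,
    so each is excluded whenever that IOI is required.
    """
    onsets = np.sort(np.asarray(onsets, dtype=float))
    iois = np.diff(onsets)
    median_ioi = np.median(iois)
    n = onsets.size
    prior_long = np.zeros(n, dtype=bool)
    prior_long[1:] = iois > median_ioi
    next_long = np.zeros(n, dtype=bool)
    next_long[:-1] = iois > median_ioi
    masks = {"prior": prior_long, "subsequent": next_long, "both": prior_long & next_long}
    if rule not in masks:
        raise ValueError(f"unknown rule: {rule!r}")
    return np.flatnonzero(masks[rule])


onsets = [0.0, 0.2, 1.0, 2.0, 2.1, 2.3]  # toy onsets in seconds
for rule in ("prior", "subsequent", "both"):
    print(rule, long_ioi_epochs(onsets, rule))  # [2 3], [1 2], [2]
```

      The selected indices would then pick out the epochs to average in the shuffled condition.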

      It would also be helpful to see whether the authors explored other approaches for evaluating neural responses across conditions, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), and whether these yielded comparable results.

      Thank you for this question. We have not explored these approaches, but we agree that alternative methods for evaluating neural responses, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), could offer complementary insights. Given the scope and focus of the present work, and the already extensive set of neural and behavioral measures reported, we chose to prioritize analyses most directly relevant to our initial research questions. Incorporating further methods might risk complicating the narrative and obscuring the key findings. We appreciate the value of these additional methods and consider them promising avenues for future investigations.

      (2) Another important finding concerns the difference in AEPs between the High Pitch and Low Pitch conditions in 6-month-old infants, a pattern not observed in the younger (3-month) or older (12-month and adult) groups. The authors interpret this as heightened sensitivity to high-pitch sounds, typical of infant-directed speech. However, the absence of this effect at 12 months raises questions. It would be helpful to consider whether this pattern may be influenced by data quality differences across age groups. Additionally, the authors could discuss this observation in relation to studies showing stronger neural tracking of rhythms in infants, particularly for low-frequency sounds (e.g., Lenc et al., Developmental Science, 2022).

      This is an interesting consideration that we investigated further. Regarding data quality differences, we considered different measures and now report these in the methods section (p. 30) and supplements (p. 1).

      “We conducted two analyses to compare the EEG data quality across age groups. First, we compared the number of trials that were included in the final analysis per age group. The trial number did not differ significantly across age groups (p > .361). Second, we calculated the SNR by dividing the EEG power at the frequency of interest (i.e., 2.25 Hz, matching the musical beat) by the background noise in surrounding bins (3rd to 5th bin, see ASSR methodology for further details; cf. Christodoulou et al., 2018; Cirelli et al., 2014). This division yields a signal-to-noise ratio that can be averaged across conditions and compared across age groups to assess variations in signal quality (especially when focusing on the pitch conditions with the same beat frequency). Here, we find that all three age groups show SNRs considerably above 1 (3m: M = 2.569, SD = 1.104; 6m: M = 2.743, SD = 1.001; 12m: M = 1.907, SD = 0.749), with no statistically significant differences (three t-tests, FDR-corrected, p > .134). Importantly, our key comparison of High vs. Low Pitch was performed within each age group, thus controlling for any overall differences in signal quality across groups. Together, these two analyses indicate that signal quality was comparable across age groups.”
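      The SNR described in this passage can be sketched for one EEG signal. The sketch makes several assumptions that the text does not state:

      - Power is the squared FFT magnitude of the whole signal.
      - "3rd to 5th bin" means the bins three to five steps away on both sides of the target bin.
      - The toy sampling rate and duration are hypothetical. They were chosen so that 2.25 Hz falls exactly on a frequency bin.

```python
import numpy as np


def snr_at_frequency(eeg, fs, f_target, noise_offsets=(3, 4, 5)):
    """Power at the bin nearest f_target divided by mean power in flanking bins.

    noise_offsets: bins on each side used as the noise estimate (here the
    3rd to 5th neighbours, skipping the two closest bins).
    """
    eeg = np.asarray(eeg, dtype=float)
    power = np.abs(np.fft.rfft(eeg)) ** 2
    freqs = np.fft.rfftfreq(eeg.size, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_target)))
    neighbours = [k + sign * off for off in noise_offsets for sign in (-1, 1)]
    if min(neighbours) < 0 or max(neighbours) >= power.size:
        raise ValueError("noise bins fall outside the spectrum")
    return power[k] / power[neighbours].mean()


# Toy example: 40 s at 250 Hz gives 0.025 Hz resolution, so 2.25 Hz is an exact bin.
fs, dur = 250.0, 40.0
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 2.25 * t) + rng.normal(0.0, 1.0, t.size)
print(snr_at_frequency(signal, fs, 2.25))  # far above 1 for a beat-locked response
```

      Pure noise gives values near 1, which is why an SNR above 1 is read as a measurable response.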

      Overall, these control analyses seem to support the observed high-pitch sensitivity in the neural response of 6-month-olds specifically. This is in line with previous research investigating this age range (Trainor & Zacharias, 1998; Fernald & Kuhl, 1987). Moreover, there may be specific changes towards the end of the first year that mark infants’ widening of their attention towards others (beyond their primary caregivers) and objects in their environment (Cooper et al., 1997; Newman & Hussain, 2006). These changes may be accompanied by a decrease in exposure to face-to-face interactions with their primary caregivers (Jayaraman et al., 2015). Taken together, research shows that infants' preference for infant-directed speech decreases significantly between 4.5 and 9 months, coinciding with developmental changes in attentional systems and social interaction patterns. This might explain the absence of high-pitch sensitivity in 12-month-olds. However, further research is needed to determine whether, and in which contexts, high-pitch sensitivity to music changes throughout infancy.

      We also edited the discussion (p. 23) to compare our results with those of Lenc et al. (2023): “It should also be noted that our musical stimuli comprised polyphonic (two-voice) music, carrying sound frequencies falling within the typical range of infant-directed song (~200-400 Hz, Cirelli et al., 2020; Nguyen, Reisner, et al., 2023b; Trainor & Zacharias, 1998). As such, our results might specifically speak to infants’ ability to separate (and prioritize among) simultaneous communicative auditory streams (Marie & Trainor, 2013; Trainor, 2015). Indeed, other studies presenting one-voice pure tone sequences (single isochronous and isotonous tones) with high vs. low pitch (notably at frequencies outside our range, 130 vs. 1237 Hz) have reported stronger neural responses to relatively low frequencies (Lenc et al., 2023). Together, these contrasting observations suggest that pitch prioritization changes not only throughout development but also depends on the polyphonic complexity and spectral characteristics of the perceived stimuli. Future research might investigate this issue in more detail.”

      (3) It would also be helpful if the authors provided more detailed information on the stimuli, including both temporal/rhythmic and spectral content, for the original music, high-pitch and low-pitch variations, and shuffled versions.

      Absolutely. We agree that this is important to report. We have added a Table to the Results (Table 1) and a Table S1 with M, SD and range of the envelope to further describe the temporal and spectral features of the Stimuli.

      General comments on the findings related to body kinematics

      (4) Quantification of movement based on the PMs did not lead to any differences between the High Pitch and Low Pitch conditions. However, Granger causality showed high prediction strength for the High Pitch condition. In the discussion, the authors proposed that high-pitch music might have led to higher arousal. If this were the case, one might expect to observe increased movement in the High Pitch condition relative to the Low Pitch condition in the PM analyses. I propose that the authors revise the discussion to address the misalignment between different findings.

      We thank the reviewer for highlighting this important point and welcome the suggestion to clarify the relationship between movement quantification based on principal movements (PMs) and the Granger causality results. We agree that the apparent discrepancy between these measures merits further clarification. We note that the discrepancy suggests that Granger causality may capture subtler temporal coordination between movements and the music, rather than gross movement magnitude. We have incorporated this reasoning into the revised discussion paragraph (pages 23-24), which now reads as follows:

      “If increased arousal were to result in greater overall movement, we would expect higher movement levels in the high pitch condition; however, this was not observed. QoM analyses based on the PMs did not reveal significant differences between the high pitch and low pitch conditions. This discrepancy may arise because Granger causality captures subtler temporal coordination between movement and music rather than gross movement quantity. Thus, high-pitch music may modulate the timing and coordination of motor responses without necessarily increasing the overall amount of movement. In line with prior work (e.g., Bigand et al., 2024), this interpretation emphasizes that musical coordination often involves changes in coupling strength rather than movement quantity per se.”
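      The difference between coupling and movement quantity can be made concrete with a standard bivariate Granger-causality F statistic fitted by ordinary least squares. The test asks whether p past values of a predictor (here, a music-intensity series) reduce the error in predicting movement velocity beyond p past values of velocity itself. The lag order, variable names and toy data are hypothetical. The authors' model order and preprocessing are not specified here. Because the statistic compares residual errors, rescaling the movement amplitude leaves it unchanged, consistent with the distinction drawn above.

```python
import numpy as np


def _rss(design: np.ndarray, target: np.ndarray) -> float:
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(x, y, p=2):
    """F statistic for 'x Granger-causes y' with p lags (bivariate, OLS)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = y.size
    target = y[p:]
    # Column k-1 holds the series lagged by k samples, aligned with target.
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    const = np.ones((T - p, 1))
    rss_restricted = _rss(np.hstack([const, lags_y]), target)
    rss_full = _rss(np.hstack([const, lags_y, lags_x]), target)
    df_den = (T - p) - (2 * p + 1)
    return ((rss_restricted - rss_full) / p) / (rss_full / df_den)


rng = np.random.default_rng(0)
music = rng.normal(size=500)  # toy "music intensity"
velocity = np.zeros(500)
velocity[1:] = 0.8 * music[:-1] + 0.5 * rng.normal(size=499)  # follows music by 1 sample
print(granger_f(music, velocity))       # large: music helps predict movement
print(granger_f(velocity, music))       # small: movement does not predict music
print(granger_f(music, 10 * velocity))  # same as the first: amplitude does not matter
```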

      (5) The authors report a lack of periodicity and phase-locked movement in infants. Considering the developmental stage, I assume that spontaneous movements to music emerge only over short periods within each exposure period. To further investigate movement periodicity, which has been suggested previously, the authors could first automatically extract periods of periodic movement. They could then evaluate the tempo/frequency of, and synchronization with, the stimulus during these specific periods.

      We thank the reviewer for this thoughtful suggestion. We conducted similar analyses prior to submission, using methods comparable to previous studies (Fujii et al., 2014). These analyses did not yield additional insights beyond those already presented in the manuscript, so we opted not to include them initially. For completeness, we briefly mention these results on p. 19:

      “Robustness analyses based on thresholding of variation in the time series to identify movement burst epochs (similar to Fujii et al., 2014) yielded consistent results. No significant movement-to-music synchronization was found across age groups (all ps > .563).”

      It is important to clarify that while movement periodicity in infants listening to music has been previously suggested, the evidence for actual synchronization to musical beats remains limited and has been frequently misinterpreted in the literature. The seminal study by Zentner and Eerola (2010) is often cited as evidence for infant rhythmic entrainment, but their findings actually demonstrated tempo flexibility rather than synchronization, i.e., infants moved faster when the music was faster. Similarly, Fujii et al. (2014) found that while individual infants showed some movement-to-music coordination, this occurred in only 2 out of 11 tested infants (18%), and the authors emphasized that "movement-to-music synchronization is rare in infants and observed at an individual level".
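      The robustness analysis above identified movement-burst epochs by thresholding variation in the movement time series. The exact criterion of Fujii et al. (2014) is not restated in this response, so the sketch below uses a hypothetical rule. It splits the velocity signal into fixed windows and flags windows whose standard deviation exceeds `k` times the median window SD. Consecutive flagged windows are then merged into bursts. Tempo and synchronization would then be estimated within each burst only. The window length and `k` are assumptions.

```python
import numpy as np


def movement_bursts(velocity, fs, win_s=1.0, k=3.0):
    """Return (start, stop) sample indices of high-variation movement bursts.

    Hypothetical criterion: a window is 'active' when its standard deviation
    exceeds k times the median window SD; adjacent active windows are merged.
    Samples after the last complete window are ignored.
    """
    v = np.asarray(velocity, dtype=float)
    win = int(round(win_s * fs))
    n_win = v.size // win
    sds = v[: n_win * win].reshape(n_win, win).std(axis=1)
    active = sds > k * np.median(sds)
    bursts, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            bursts.append((start * win, i * win))
            start = None
    if start is not None:
        bursts.append((start * win, n_win * win))
    return bursts


fs = 100.0
rng = np.random.default_rng(3)
velocity = rng.normal(0.0, 0.01, 2000)  # 20 s of near-stillness
t = np.arange(300) / fs
velocity[500:800] += np.sin(2 * np.pi * 2.0 * t)  # 3 s bout of 2 Hz movement
print(movement_bursts(velocity, fs))  # [(500, 800)]
```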

      (6) A final general comment: the authors try to explain the findings of the current study by providing hypotheses, for instance, on why the neural response differs between high and low pitch only at 6 months. It would be helpful if the authors also considered where their results diverge from previous findings.

      We thank the reviewer for this comment and acknowledge the importance of placing our findings in the context of prior research on infant pitch perception, including some apparent inconsistencies such as those noted for Lenc et al. (2023), which we have addressed in our response to comment 2. We agree that results inevitably vary across studies due to differences in methods, stimuli, and participant samples—all factors that contribute to some variability in developmental trajectories observed in the literature.

      Importantly, our observation of a transient difference in neural responses to high versus low pitch emerging at 6 months aligns with existing evidence indicating significant neural reorganization occurring around this age (Carr et al., 2022) and continuing toward 12 months (Kuhl et al., 2014). This may reflect a sensitive developmental window during which infants show heightened sensitivity to prosodic features important for early social and communicative interactions. After this window, attentional and auditory processing priorities shift, which could explain the subsequent decline in pitch sensitivity.

      We emphasize that these interpretations are preliminary, and further systematic investigations—preferably longitudinal studies incorporating diverse pitch ranges and multimodal attentional and neural measures—are needed to delineate the developmental course of pitch sensitivity comprehensively.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the opportunity to read this interesting work.

      Thank you for the constructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) I would suggest replacing "first year of life" with "first post-natal year".

      Thank you for the suggestion. In line with yours and Reviewer #2’s comments, we have revised the title to “first postnatal year”.

      (2) Specifying the music paradigm and the nature and timing of the stimuli at the beginning of the Results section would be useful.

      We agree and have added two tables (Table 1 and Table S1 for continued information on the envelope) for further information about the paradigm and stimuli to the beginning of the results section (p.8).

      In addition, the stimuli are also shared on a repository: https://doi.org/10.48557/DCSCFO.

      (3) Since the infants moved during the experiment, EEG data might show movement artefacts. Was the approach used to correct these artefacts satisfactory, even in 12-month-olds who moved more?

      We appreciate the reviewer’s important question regarding artifact correction in infant EEG data, especially given increased movement in older infants. We recognize that movement-related artifacts are an inherent challenge in EEG recordings with infants, and complete elimination of such artifacts is technically difficult (if not impossible). However, several points support the robustness of our ERP findings despite spontaneous movement:

      First, we used a two-stage pipeline to maximize artifact removal without bias. In the first stage, Artifact Subspace Reconstruction (ASR) repaired brief, high-variance artifacts by reconstructing contaminated channels from clean data. In the second stage, Independent Component Analysis (ICA) decomposed the ASR-cleaned EEG into independent components, which were classified with ICLabel. This allowed us to remove residual non-neural artifacts (e.g., eye movements) based on their spatial and spectral features. Both ASR and ICA were applied automatically and independently of condition or age group, without subjective decisions, ensuring unbiased cleaning and reliable ERP comparisons.

      Second, as noted in the response to R1 Comment (2), we compared the EEG data quality across age groups and conditions. The number of trials did not differ significantly across age groups (p > .361). The SNR, calculated by dividing the EEG power at the frequency of interest by the background noise in surrounding frequency bins, showed no statistically significant differences across age groups (three t-tests, FDR-corrected, p > .134). Together, these two analyses indicate that signal quality was comparable across age groups.

      Finally, infant movements during the session were sporadic and, most importantly, not time-locked to tone onsets (see Fig. S2). Because artifact rejection (namely, Artifact Subspace Reconstruction and Independent Component Analysis) discarded only those epochs containing large, transient artifacts irrespective of condition, residual movement-related noise would not systematically inflate ERPs.
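      The age-group SNR comparisons above are reported as "three t-tests, FDR-corrected" without naming the correction procedure. The sketch below assumes the most common choice, the Benjamini-Hochberg step-up procedure. It converts raw p-values into adjusted values that can be compared with the nominal alpha. The input p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three pairwise age-group comparisons.
print(bh_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.04, 0.04]
```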

      (4) The timing of the P200 response peak could be specified in adults as for infants.

      The timing of the P200 in adults is mentioned on page 9: “[…] a second positivity peaking at 158 ms post-stimulus (so-called “P200”, here reaching an amplitude of 0.85 µV).” The timing of the infant P2 is specified on pp. 10 and 11: “The P2 ranged between 307 and 325 ms post-stimulus and peaked at 316 ms, reaching an average amplitude of 1.026 µV.”

      (5) In infants, the mention of "peaking at 212ms" is not completely clear. Does this timing correspond to the P1 peak at 3 months of age, or to the time at which the response to music was enhanced compared with shuffled music?

      Thank you for highlighting the need for greater clarity regarding the timing of the P1 peak and its relation to the observed enhancement. We have revised the text to explicitly state that 212 ms corresponds to the P1 peak in 3-month-old infants within the window where the response to music was significantly enhanced compared to shuffled music.

      p.9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Cluster-based permutation (nPerm=1000) testing revealed that 3-month-old infants’ P1 amplitude was enhanced between 177 and 305 ms post-stimulus (cluster-t=1111.90, p=.002). Within this window, the P1 peaked at 212 ms and reached an amplitude of 1.8 µV.”
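      The cluster-based permutation test reported above can be sketched for a paired contrast (music minus shuffled music) over time points at one channel. Several settings are assumptions chosen for illustration: the cluster-forming threshold `t_thresh`, the restriction to positive clusters, and the use of summed t-values as cluster mass. Apart from nPerm=1000, the authors' settings are not given here. A full analysis would typically also cluster over electrodes, for example with FieldTrip or MNE-Python.

```python
import numpy as np


def paired_t(diff: np.ndarray) -> np.ndarray:
    """One-sample t-values across subjects (rows) for each time point (column)."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def positive_clusters(tvals: np.ndarray, t_thresh: float):
    """Contiguous runs with t > t_thresh as (start, stop, summed t) tuples."""
    clusters, start = [], None
    for i, t in enumerate(tvals):
        if t > t_thresh and start is None:
            start = i
        elif t <= t_thresh and start is not None:
            clusters.append((start, i, float(tvals[start:i].sum())))
            start = None
    if start is not None:
        clusters.append((start, tvals.size, float(tvals[start:].sum())))
    return clusters


def cluster_permutation_test(diff, t_thresh=2.0, n_perm=1000, seed=0):
    """Sign-flip permutation test on cluster mass; returns (start, stop, mass, p)."""
    rng = np.random.default_rng(seed)
    observed = positive_clusters(paired_t(diff), t_thresh)
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        # Under H0 each subject's difference wave is equally likely to flip sign.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        perm = positive_clusters(paired_t(diff * signs), t_thresh)
        null_max[b] = max((mass for _, _, mass in perm), default=0.0)
    return [(s, e, m, (np.sum(null_max >= m) + 1) / (n_perm + 1))
            for s, e, m in observed]


rng = np.random.default_rng(42)
diff = rng.normal(0.0, 1.0, size=(20, 100))  # 20 toy infants x 100 time samples
diff[:, 40:60] += 1.5                        # simulated P1 enhancement
for start, stop, mass, p in cluster_permutation_test(diff, n_perm=500):
    print(start, stop, round(mass, 1), round(p, 3))
```

      Only the simulated window should yield a small p-value. Short chance clusters elsewhere receive large p-values.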

      (6) It might be useful to put the results of this study into perspective with other studies of infant motor development (e.g., Hinnekens et al, eLife 2023).

      Thank you for pointing out this study. We have integrated the Hinnekens et al. (2023) findings into our discussion of infant motor development toward dance-like behaviors. p.22 “Taking a broader perspective on infants’ motor development, our findings align with research on locomotion across the first 14 months of life, which shows that as the number of motor primitives increases, their intrinsic variability decreases (Hinnekens et al., 2023). Viewed together, these patterns point toward a gradual refinement of motor control: the human motor system first develops the capacity to control individual muscles, and gradually to integrate them into motor synergies that support complex, coordinated behaviours, such as locomotion, musical synchronization, and dance.”

      (7) Regarding the progressive maturation of the auditory/linguistic pathways during infancy, the authors might also refer to (Dubois et al, Cerebral Cortex 2016).

      Thank you for the suggestion. We added the study to the discussion on page 22: “This developmental trajectory aligns with neuroimaging evidence showing that while the ventral linguistic pathway (connecting temporal and frontal regions via the extreme capsule) is well-established at birth, the dorsal pathway (particularly the arcuate fasciculus connecting temporal regions to inferior frontal areas) continues maturing throughout the first postnatal months, with different maturational timelines for dorsal versus ventral connections (Dubois et al., 2016).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue - the fragility of meta-analytic findings - by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Weaknesses:

      (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

      (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²). Clarification in the Results/Discussion is needed, and ideally, a simulation or illustrative example could be added.

      (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

      (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.
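      A side-by-side comparison starts from the conventional trial-level Fragility Index, sketched below for reference. Starting from a statistically significant 2x2 trial, non-events in the arm with fewer events are recoded as events, one at a time. Recoding stops when the two-sided Fisher exact p-value reaches alpha. This is neither EOI, ROAR nor EOIMETA, which the manuscript derives analytically. The function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables that are no more probable than the observed one.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Trial-level FI: recodings needed in the arm with fewer events to reach p >= alpha.

    Returns 0 if the trial is not significant to begin with, and None if the
    arm runs out of non-events before significance is lost.
    """
    def p_value(ea, eb):
        return fisher_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return 0
    recode_a = events_a <= events_b
    fi = 0
    while True:
        if recode_a:
            if events_a == n_a:
                return None
            events_a += 1
        else:
            if events_b == n_b:
                return None
            events_b += 1
        fi += 1
        if p_value(events_a, events_b) >= alpha:
            return fi


print(fisher_two_sided(3, 1, 1, 3))    # Fisher's tea-tasting table, about 0.486
print(fragility_index(0, 20, 10, 20))  # recodings needed in the 0/20 arm
```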

      Reviewer #2 (Public review):

      Summary:

      The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.

      Strengths:

      The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimate. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses (identifying fragility early and adapting thresholds accordingly) could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

      Weaknesses:

      It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

      I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

      Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting, as metrics for evaluating the fragility and robustness of meta-analyses. The author applies these metrics to three meta-analyses of vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think this extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The author mentioned that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how between-study heterogeneity would influence the proposed measures? Moreover, between-study heterogeneity is high in Zhang et al.'s 2022 study. This would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would help prevent misinterpretation by future users, as I am concerned that readers may confuse these concepts. A small effect may be related to a fragile meta-analysis result, but a fragile meta-analysis does not necessarily mean wrong or untrustworthy results. A fragile but precise estimate can still reflect a true effect; whether an effect of that size is clinically meaningful is a separate question. Clarifying the distinctions between effect magnitude, fragility, and reliability in the discussion would be helpful.
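      Comment (2) asks how between-study heterogeneity enters the analysis. The standard inverse-variance quantities can be sketched as follows: Cochran's Q, Higgins' I², and the DerSimonian-Laird estimate of τ². These are textbook formulas. This document does not state whether EOIMETA uses them in this exact form, so treat the sketch as an assumption-laden illustration. The function name is hypothetical.

```python
def heterogeneity_summary(estimates, variances):
    """Fixed-effect pooled estimate, Cochran's Q, I^2 (%), DerSimonian-Laird tau^2,
    and random-effects weights for study-level effects (e.g. log odds ratios)
    with known within-study variances."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimate of the between-study variance
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights: heterogeneity adds tau^2 to every study's variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    return {"pooled": pooled, "Q": q, "I2": i2, "tau2": tau2, "re_weights": re_weights}
```

      As τ² grows, the random-effects weights 1/(vᵢ + τ²) become more nearly equal, so large trials dominate less. This is one concrete route by which heterogeneity could shift a weighted fragility estimate. The sketch does not by itself imply any particular I² cutoff.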

      I am very appreciative of the insightful comments you all shared, and in light of them have made several clarifications and revisions. Thank you again; I am grateful to have received such considered feedback, and I hope I have addressed any outstanding issues. I have replied to each reviewer’s recommendations in this document sequentially for ease of scanning, and am most grateful for the summary strengths and weaknesses, which I have also incorporated into these replies. Thank you again!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript makes the important argument that many meta-analyses are inherently fragile, which aligns with prior work (e.g., PMID: 40999337). Please add the reference to the statements.

      Excellent point, thank you – I’ve expanded the discussion of fragility analysis, and its application to meta-analysis, including this reference.

      (2) The rationale and mathematical underpinnings of the proposed EOI and ROAR methods are not sufficiently explained. While the authors cite Grimes (2022, 2024b), readers are expected to rely heavily on these external sources without adequate exposition in the current paper. This limits the ability to fully evaluate the reasonableness of the methods or to reproduce the approach. I strongly recommend expanding the description of EOI and ROAR within the manuscript.

      I agree fully – I was a little remiss in this regard, as I was worried about overwhelming the reader. However, I was too sparse with detail, and have now extended the text to describe the methods as intuitively as possible (see Discussion, subsection “Ellipse of Insignificance and Region of Attainable Redaction”).

      (3) In the Methods, the authors note that EOIMETA is applicable when between-study heterogeneity is low. However, the manuscript provides little guidance on how to interpret results when heterogeneity is high (e.g., larger I² values). I recommend clarifying this issue in the Results or Discussion sections, emphasizing the limitations of EOIMETA under high heterogeneity. Ideally, the authors could include either a small simulation study or an illustrative example to demonstrate the performance of the method in such settings.

      This is an excellent question, and I was remiss in not considering it more carefully in the manuscript. Originally, the simple idea was just to pool the results for EOI, in which case heterogeneity would be an issue. However, I subsequently added inverse-variance weighting methods to account for situations with increased heterogeneity, so my initial comment was not strictly correct. I have changed the text in several places, notably in the Methods and in the Discussion (see reply to point 5).

      (4) While EOIMETA is introduced as a generalizable fragility metric for meta-analyses, the illustrative examples would benefit from clearer comparisons with the traditional Fragility Index (FI). Because FI is well established in the RCT literature and familiar to many readers, presenting side-by-side results (e.g., FI at the trial level versus EOIMETA at the meta-analytic level) would provide important context. Such comparisons would also highlight the added value of EOIMETA, underscoring that even when individual trials appear robust under FI, the pooled meta-analysis may remain fragile.

      This is an excellent idea! The new table is given below. Note that the traditional FI is not defined for non-significant results, and EOI is ambiguous for counts <2.

      (5) The Discussion currently states that the Fragility Index (FI) applies only to binary outcomes. This is not entirely accurate. While the original FI was indeed developed for dichotomous endpoints, subsequent methodological work has extended the concept to other data types, including continuous outcomes (continuous fragility index, CFI). The manuscript should acknowledge this distinction: EOIMETA presently focuses on binary outcomes at the meta-analytic level, but FI more broadly is not restricted to binary data. Adding this clarification, with appropriate citations, would improve accuracy and place EOIMETA more clearly within the broader fragility literature.

      Thank you for this catch – clarified now in the discussion:

      Reviewer #2 (Recommendations for the authors):

      (1) Typos/inconsistencies/writing clarifications: All table and figure legends and titles are missing a period at the end of each sentence. In the sentence "to be estimated by bootstrap methods. Initially, we ran...", there should be a space between "methods" and "Initially" (line 113).

      Apologies, these are now remedied.

      (2) In Table 2, the total number of patients in the meta-analysis of all 12 studies is reported as 133,262, whereas the text states 133,475 patients. Based on my calculations from Figure 2, the total appears to be 133,262. Could you please clarify this discrepancy?

      Certainly – your calculations are correct. The figure in the text was a typo from a very early draft in which the summation function was not run correctly and double-counted some cases. This was fixed for the figure but not the text. The text should now match; thank you for spotting this. There are some issues with Figure 2, which I address in the next few points.

      (3) Regarding this point, the meta-analysis by Zhang et al. (2019) shows some inconsistencies in the reported number of patients in the paper. According to the data provided on GitHub the total number of patients is 37671. However, Table 1 of the paper lists 38538 patients, and the main text states "5 RCTs involving 39168 patients." Similarly, for Guo et al. (2023), the main text reports that the meta-analysis included 11 RCTs with 112165 patients, whereas the table lists 111952, which appears consistent with the data available on GitHub. There is also a discrepancy in Zhang et al. (2022), which cites 61853 patients in the introduction but 61223 patients in Table 1. These inconsistencies should be clarified, as even small discrepancies in reported sample sizes can undermine the credibility of the analyses presented.

      Well-spotted – the incorrect figures are artefacts of an early draft with a double-counting summation function, and I should have spotted them and removed them prior to submission. To clarify, the correct figures from each study (which agree with github data) are given in the corrected table 1.

      Thus, there are 38,538 subjects in the Zhang et al. 2019 analysis, which matches the first sheet of the GitHub listing. The confusion arises from sheet 2, which was included only for this meta-analysis: it breaks the totals down into events and non-events (hence the non-event total of 37,671) but kept the old labels. This was needlessly confusing, and accordingly I have re-uploaded the data with correct headers for sheet 2. This summation problem also affected the total in Figure 2, which has now been replaced with a corrected version. Thank you for spotting this!

      (4) In line 158, who does "He" refer to? Please clarify this in more detail.

      Apologies, this was a typo and should have read “the” – now corrected.

      (5) The discrepant results of the RCT by Scragg et al. (2018) between the meta-analysis by Zhang et al. and that by Guo et al. could be presented in a table. This could be included as supplementary material or, preferably, in the main text (Results section).

      To avoid confusion, I will add a version of this to the github files for interested users to explore.

      (6) In the legend of Figure 2, a period is missing at the end of the sentence. Additionally, although it is generally understood, it would be helpful to specify that the numbers in parentheses represent the confidence intervals. Please confirm whether these are 95%, 89%, or 99% confidence intervals.

      Apologies, these are 95% CIs. Clarified now in updated legends.

      (7) Regarding the statement "The more recent and robust methods for fragility analysis (EOI) and redaction (ROAR) have potential applications beyond fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies, as stated by the author" in line 163: could you please provide references supporting these claims? I believe the relevant references may be included in the EOI paper, but it would be helpful to cite them here as well.

      This approach has recently been used in a new analysis, now cited in the introduction along with a fuller description of the method for context. Please see the response to Reviewer 1, point 2.

      (8) Since the study was previously published as a preprint (https://www.medrxiv.org/content/10.1101/2025.08.15.25333793v1.full-text), this should be mentioned in the manuscript.

      Added as a note now.

      (9) It would also be valuable to include a figure illustrating ROAR for the same meta-analyses presented in Figure 1 for EOI, possibly as supplementary material.

      See reply to point 10.

      (10) Finally, it would be interesting to provide plots of both EOI and ROAR for the meta-analyses of all 12 included studies. These graphs could be replicated using the code examples provided by the author in the original EOI and ROAR publications.

      These have now been added to the github repository as supplementary material.

      (11a) Replications of EOI fragility: eoicfunc.R (github): - In the code provided on GitHub, an error occurred in the "EllipseFromEquation" function within eoifunc. This was due to the PlaneGeometry package not being available for the latest version of R. I attempted several installation methods (using devtools, remotes, and GitHub, as well as direct installation from a URL). However, after adjusting the code, I was able to run the analyses. For the full cohort, including all 12 studies using the EOI approach, I obtained a Minimal Experimental Arm only recoding (xi) = 14 and a Minimal Control Arm only recoding (yi) = 15, whereas the authors reported that 5 recodings were sufficient. It appears that differences in code versions or functions might have slightly affected the results. After downgrading R and running the eoic function with PlaneGeometry successfully installed, the fragility index for the EOI approach was 15 rather than 5.

      Apologies for the issue with PlaneGeometry; I will try to fix this in future iterations. The difference you see is an artefact of running EOIFUNC on pooled data rather than using the dedicated EOIMETA function, the chief difference being that EOIFUNC does not apply weighted inverse-variance (WIV) correction. If we simply pool events, this is the output:

      Author response image 1.

      If the reviewer uses the EOIMETA function, which employs inverse-variance weighting, then each trial is defined by a vector of events and non-events in each arm. For all 12 studies, this would be (in R syntax, or imported from the GitHub file):

      Author response image 2.

      Then they will obtain:

      Author response image 3.

      If the reviewer runs a simple pooled analysis with inverse-variance weighting turned off, they should obtain an answer similar to that of a simple eoifunc call, apart from the difference in zero-count correction. EOIMETA, however, weights the studies, and it is this output that is reported in the main paper.
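      Points (11a) to (15) turn on the difference between analysing counts collapsed into one 2x2 table and weighting each trial separately. A toy example shows how far the two can diverge. The sketch below is not the EOIFUNC or EOIMETA code, and the function names and trial counts are hypothetical. The 0.5 zero-cell continuity correction is also an assumption: the reply mentions a zero-count correction difference but does not specify it. The sketch compares a naive pooled odds ratio with a fixed-effect inverse-variance pooled odds ratio.

```python
import math


def log_odds_ratio(a, b, c, d, cc=0.5):
    """Log odds ratio and its variance for one trial.
    a, b: experimental events / non-events; c, d: control events / non-events.
    Adds cc to every cell when any cell is zero (an assumed correction)."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def naive_pooled_or(tables):
    """Collapse all trials into one 2x2 table, ignoring trial membership."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return (a * d) / (b * c)


def iv_pooled_or(tables):
    """Fixed-effect inverse-variance pooled odds ratio and the SE of its log."""
    results = [log_odds_ratio(*t) for t in tables]
    weights = [1 / var for _, var in results]
    pooled_log = sum(w * y for w, (y, _) in zip(weights, results)) / sum(weights)
    return math.exp(pooled_log), math.sqrt(1 / sum(weights))
```

      With the hypothetical trials `[(1, 9, 10, 90), (90, 10, 9, 1)]`, each trial has an odds ratio of exactly 1, and `iv_pooled_or` returns 1. `naive_pooled_or` instead returns 8281/361, about 23, because arm sizes differ between trials and collapsing the tables confounds trial with arm. Fragility computed on collapsed counts can therefore differ substantially from fragility computed on weighted study-level data.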

      (12) I recalculated the eoic function for Zhang et al. (2019) and found a fragility index (dmin) of 1. FECKUP Vector Length: 0.5722. Minimal Experimental Arm Recoding (xi): 0.7738. Minimal Control Arm Recoding (yi): 0.8499.

      This again appears to be an artefact of using eoifunc rather than eoimeta; with eoimeta, which uses WIV to adjust the studies for heterogeneity effects, this is the reported output:

      Author response image 4.

      (13) Using the previous code (before downgrading R and loading PlaneGeometry), I recalculated the EOI for Zhang et al. (2022) and found Minimal Experimental Arm only recoding (xi) = 55 and Minimal Control Arm only recoding (yi) = 59, results slightly closer to those reported by the authors. After properly loading PlaneGeometry, I recalculated and obtained for Zhang et al. (2022): Fragility index (dmin) = 57; FECKUP Vector Length = 39.948; Minimal Experimental Arm Recoding (xi) = 54.5436; Minimal Control Arm Recoding (yi) = 58.635.

      Again, this appears to stem from calling eoifunc rather than eoimeta; I can replicate this result using EOIFUNC:

      Author response image 5.

      But adjusting for study weighting with eoimeta:

      Author response image 6.

      (14) For Guo et al. (2022), the EOI fragility index was 17 [dmin = 17]. FECKUP Vector Length: 11.3721. Minimal Experimental Arm Recoding (xi): -15.6825. Minimal Control Arm Recoding (yi): -16.5167. However, the authors report an EOI fragility of 38. Since I was able to load PlaneGeometry properly and run eoicfunc.R (from GitHub) without errors, the discrepancies likely reflect minor coding or version inconsistencies rather than software limitations.

      These again stem from using eoifunc on simple pooled data versus eoimeta, which adjusts by study.

      (15) Replications of ROAR fragility: roarfunc.R (github): - For Guo et al. (2022), the ROAR fragility calculated using roarfunc.R was 16 [rmin (Redaction Fragility Index) = 16]. FOCK Vector Length: 15.942. Minimal Experimental Arm Redaction (xc): 15.9442. Minimal Control Arm Redaction (yc): 978.8906. In the main text, the author reports a redaction fragility of 37. What might explain these discrepancies?

      Again, this stems from EOIMETA versus EOIFUNC (and roarfunc calls without weighted adjustment). As the reviewer has observed, fragility increases when there is no study-level adjustment, a point we have now added to the Discussion.

      (16) In generic_run.R, line 6 contains a bug - it is missing a forward slash (/) between the directory path and the filename. The correct line of code should be: pathload = paste0(pathname, "/", filename, exname). The same issue occurs in generalcode.R.

      Apologies, I will correct this in the upload!

      (17) Theoretical framework: Is there any other method available for comparison besides the one proposed by Atal et al.? Could you include a brief literature review describing alternative approaches?

      To my knowledge, there is not – Xing et al. (now referenced) covered this earlier in the year, and I have included an expanded background for this purpose. Please see the reply to Reviewer 1, point 1.

      (18a) There appears to be no heterogeneity in the meta-analysis in terms of effect sizes and I², likely because most values are quite large, yet the included studies address very different populations (e.g., patients with COPD, NSCLC survivors, older adults, women, and GI cancer survivors). This could have been explained more clearly, including how such diverse literature might influence fragility indices or whether there is a logical rationale for combining these studies. Could you perform a sensitivity analysis or provide a conceptual explanation of how the heterogeneity - or lack thereof - across these trials may affect the fragility indices? Although I² values are small, the conceptual heterogeneity among studies suggests that the pooled results may be comparing fundamentally different clinical contexts, which requires clarification.

      I think this is a very pertinent point. I am unsure why these authors combined such diverse populations without any consideration of whether they were comparable, but this is a common problem in meta-analysis. I have added the following to the Discussion to address this problem:

      “The use of vitamin D meta-analyses in this work was chosen as illustrative rather than specific, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). The three studies cited in this work report relatively low heterogeneity in their meta-analyses in both effect sizes and I² values, but it is worth noting that the included studies addressed very different populations, including patients with chronic obstructive pulmonary disease, non-small cell lung cancer survivors, women-only cohorts, older adults, and gastrointestinal cancer survivors. These groups presumably have different risk factors for cancer death, and why the authors of these studies combined cohorts with fundamentally different clinical contexts is unclear. Why the heterogeneity appeared so low across such different groups is also a curious feature. This goes beyond the scope of the current work, but serves as an example of the reality that meta-analysis is only as strong as its underlying data and its methodological rigor in comparing like with like, and the conclusions drawn from it must always be seen in context.”

      Reviewer #3 (Recommendations for the authors):

      (1) Line 156, acronym FI not defined.

      Apologies, this is now defined at the outset as “fragility index”.

      (2) Line 158, typo "He"?

      Apologies again, this was a typo and was supposed to read “the”, fixed now.

      (3) Across the manuscript, I think the "re-coding" phrasing may confuse clinical readers. Maybe rephrasing to "flipping event classification" or "flipping group" would be better.

      Excellent point – this has now been modified at the outset.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

      We now demonstrate in Fig. S9 efficient degron-mediated depletion of both NUF2 and SPC24 in cell-cycle arrested cells by Western blotting. We show similar data for siRNA knockdowns. Our siRNA knockdown experiments include a “siDEATH” control that induces cytotoxicity by targeting several essential genes. In Fig. S6a we now show that siDEATH transfection results in strong cytotoxicity and cell death in cycling as well as in cell-cycle arrested G1/S and G2/M populations, indicating efficient protein depletion. Additionally, in Fig. S6b we now show depletion of NCAPH2 protein by siRNA knockdown in cycling as well as cell-cycle arrested populations by Western blot analysis. We mention these results on pages 11 and 13.

      Reviewer #2 (Public review):

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation, such as components of the condensin and cohesin complexes, are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implications of this selection criterion should be clearly discussed, as they fundamentally shape the interpretation of the study's findings.

      We discussed our hit selection criteria on page 8 and in the Methods section. Some of the concerns regarding a bias towards non-essential genes are alleviated by the fact that our screen is limited to a relatively short duration of 72 hours, rather than the longer time points generally used to assess essentiality in pooled CRISPR-KO screens, which allows us to identify genes that may be essential if eliminated permanently. In support of this notion, we identify subunits of the essential condensin and cohesin complexes as hits with only limited effects on cell viability. For example, the Z-score for change in cell number upon NCAPH2 knockout was -0.26, indicating only a mild reduction compared to the average cell number across all targets.

      Other confounding effects on hit selection due to micronuclei formation, cell cycle effects etc. are minimized as we closely monitor micronuclei formation and cell viability in our screen. Finally, aneuploidy is similarly not a confounding factor in hit identification since, as we previously demonstrated, the Ripley’s K-based clustering score is robust to changes in spot number (Keikhosravi, A., et al. 2025).
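      The robustness claim rests on how Ripley's K is normalized. The sketch below is a naive estimator without edge correction, and both function names are hypothetical. The clustering score actually used in the screen (Keikhosravi et al., 2025) may differ in edge handling and summary statistic. The sketch shows that the pair count is divided by n(n-1) and scaled by area. Under complete spatial randomness, the expected value is therefore about πr² whatever the number of spots.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K(r) for 2D points in a region of known area (no edge correction).

    K(r) = area / (n * (n - 1)) * (number of ordered pairs i != j within distance r).
    The n * (n - 1) normalization keeps the expected value near pi * r**2 under
    complete spatial randomness, independent of the number of points.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * close_pairs / (n * (n - 1))


def besag_l_minus_r(points, r, area):
    """L(r) - r with L(r) = sqrt(K(r) / pi); positive values suggest clustering
    at scale r relative to complete spatial randomness."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r
```

      Because the normalization uses the observed point count, a nucleus with a few extra centromere spots does not shift the expected K(r) under randomness. Only departures from randomness at scale r do.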

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering, and by extension interphase genome organisation, is not biologically significant?

      We appreciate these points. Given the presence of one centromere on each chromosome, we used centromeres as surrogate landmarks of higher-order nuclear genome organization and considered centromere patterns as a general indicator of overall genome organization. While the relationship of centromere patterns to other genome features is poorly understood in mammalian cells, a link is suggested by observations in other organisms. For example, in yeast, the clustering of centromeres reflects the overall Rabl configuration of chromosomes. Having said that, we agree that our extrapolation to overall genome organization is somewhat speculative, and we have toned down these conclusions throughout the manuscript.

      We agree that one of the most interesting questions emerging from our study is whether centromere clustering has a functional role. In follow-up studies we will use some of the key regulators identified in these screens to perturb the native centromere distribution and assay various cellular responses, including changes in gene expression and genome integrity. These studies will be the subject of future publications.

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2 is a subunit of the condensin II complex that promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide more mechanistic insight.

      We thank the reviewer for this comment. We now clarify the relationship of these proteins to centromeres in more detail on page 12. While they all have some relationship to centromeres, as would be expected if they contributed to centromere clustering, they represent multiple distinct pathways and processes.

      The observed effects on clustering are unlikely to be due to aneuploidy, as only very limited aneuploidy is observed in our cells and because the Ripley’s K measurement of centromere clustering is robust to changes in chromosome copy number. Follow-up studies using live-cell imaging approaches are currently in progress to address some of these mechanistic questions.

      Finally, the additive effects observed may reflect mild mis-segregation effects that are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

      We rephrased the text on page 14 based on the reviewer’s recommendations.

      Reviewer #3 (Public review):

      Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      We now include in Fig. S11 examples of disordered mitotic nuclei observed in the absence of NUF2 or SPC24.

      I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

      We have modified this description on page 4.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Proper characterisation of the cell lines used in the manuscript. Tagging is known to alter protein levels relative to the parental cell line in some cases; whether or not this is the case here needs to be shown transparently in the manuscript.

      The cell lines to conditionally deplete NCAPH2 and KI67 have previously been published, and they have been characterized to show normal expression levels of the tagged protein (Takagi et al., 2018). We also show quantification of Western blots to compare protein level of tagged SPC24 and NUF2 to that of the untagged proteins in the parental cell line (Fig. S8e-f) and discuss these results on page 11 and page 12.

      (2) Demonstration of protein depletion in the degron cell lines.

      We showed efficient protein depletion in the degron cell lines (Fig. S8c and S8d). In addition, we now show in Fig. S9 depletion of SPC24 and NUF2 in cells arrested at G1/S and G2/M.

      (3) The study examines centromere clustering, but not genome architecture. While it is understood that a complete investigation of genome architecture is beyond the scope of the current study, the interpretation does not match the data. The authors are suggested to pay attention to this point throughout the manuscript and consider their findings in terms of centromere clustering rather than genome architecture, including changing the title accordingly.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and a link to overall genome organization has been suggested in some organisms such as yeast, we have retained the wording in a few select instances, including the title. We also make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      Reviewer #1 (Recommendations for the authors):

      (1) Controls of depletion by western blot in synchronized cells (siRNAs and degrons) are lacking.

      We now show Western blots demonstrating efficient depletion of the target proteins in degron (Fig. S9) and siRNA treated cell-cycle arrested cells (Fig. S6b).

      It would have been very nice to discuss the implications of these findings further. For example, do changes in centromere clustering alter the expression or repression of pericentromeric heterochromatin? Is centromere clustering associated with specific diseases? How does global chromatin organization affect gene expression and genome stability? Although some of these aspects are unknown, a discussion of them would have been nice.

      We appreciate these interesting points. These questions are the subject of our ongoing follow up studies. We now discuss possible consequences of centromere re-organization on gene expression and genome stability on page 18.

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) Clarify Scope and Avoid Overinterpretation

      (a) The study exclusively investigates centromere positioning, without addressing broader aspects of genome architecture.

      (b) There is no established link presented between centromere positioning and higher-order genome organisation.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and observations in yeast suggest such a link, we have retained the wording in a few select instances. We make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      (c) The exclusion criteria used in the screen should be clearly explained, including the implications of selecting only non-essential or redundant genes.

      We discuss on page 8 and in the Methods section the exclusion criteria used in the screen, including the implications for identifying essential genes.

      (d) The authors should discuss why the identified proteins significantly affect centromere clustering but do not impact cell cycle progression.

      We now discuss this topic briefly on page 9. While some hits are expected to affect both cell-cycle progression and centromere clustering (Fig. S4c), it is not a priori expected that all hits would affect both.

      (2) Supplementary Figure 1

      This figure appears unnecessary. The co-localisation between CENP-C and CENP-A is well established in the literature, and the scoring provided does not add essential new information.

      The data were included in response to repeated questions from a centromere expert. We prefer to retain these data for completeness.

      (3) Differential Hits between Cell Lines 

      For hits that behave differently across cell lines, expression data should be provided. Are the genes equally expressed in both cell types? What is the level of depletion achieved?

      It is possible that cell-type-specific hits arise from differences in expression. Cell-type-specific hits may also arise for multiple other reasons, including cancer vs. non-cancer origin, hTERT immortalization, cell growth properties, variation in the underlying DNA sequences of the Cas9 target loci, and the initial state of centromere clustering, to name a few. Each of these possibilities requires additional experiments to identify the exact reason for the cell-type specificity of a given factor. A full analysis of the reasons for cell-type specificity is, however, beyond the scope of the current study.

      (4) Efficiency of Cell Cycle-Specific Degradation

      Degradation efficiency likely varies across cell cycle stages. The authors should provide Western blots showing the extent of protein depletion at each cell cycle block.

      We provide Western blot data in Fig. S9 to demonstrate efficient knockdown of proteins in G1/S and G2/M arrested cells.

      (5) Figure S6 - Validation of New Cell Lines

      Genotyping data for the newly generated cell lines should be included, along with Western blots using protein-specific antibodies (not just the tag), compared to the parental cell line.

      We provide in Fig. S7c-d genotyping data and in Fig. S8e-f Western blot data to compare levels of tagged and untagged proteins.

      (6) Figure S7 - G2/M Block Efficiency

      The G2/M block appears suboptimal after 20 hours in RO-3306, with only ~50% of cells in G2/M and just 21-27% for Ki-67, where most cells remain in S phase. This raises concerns about the interpretation of mitotic depletion effects. It is possible that cells never progressed from G1 or completed S phase without Ki-67. Prior studies (van Schaik et al., 2022; Stamatiou et al., 2024) have shown delayed and uneven replication of centromeric/pericentromeric regions upon Ki-67 depletion during S phase, which could affect the readout. Live-cell imaging would be a more robust approach to confirm mitotic status.

      For KI67 after RO-3306 treatment, 73% and 67% of cells were arrested at the G2/M boundary in the presence or absence of KI67, respectively (Fig. S10a-b). Upon release from G2/M arrest, the proportion of G1 cells increased from 6-13% to 28-60% for all four factors tested (Fig. S10b and d). Please note that our results are not directly dependent on release efficiency, since we use single-cell staging (Fig. 3b) and selectively analyze only G1 populations (Fig. 5c).

      We are currently working towards live cell imaging, but this requires development and characterization of additional cell lines which is beyond the scope of this study.

      Statistical analyses of cell cycle phase distributions should also be included.

      We include statistical analyses of cell cycle phase distributions in Fig. S4c and Fig. S10c-d, using t-tests with FDR correction to compare the percentage of cells in G1, S or G2 in the presence and absence of each factor tested.

      (7) Aneuploidy Assessment

      Aneuploidy scores for the four key proteins should be provided, ideally using centromere-specific FISH probes.

      While an aneuploidy score for each hit would be an interesting piece of information, we showed in a previous publication that the Ripley’s K-based Clustering Score method used here is robust to aneuploidy (Keikhosravi et al., 2025); aneuploidy would thus not lead to spurious identification of these proteins in our screen.

      (8) Add-Back Experiment (Page 14)

      While the add-back experiment is conceptually strong, its execution could be improved. It should be performed on synchronised cells: deplete the protein in G2/M, arrest in thymidine, then release into G1 without the protein to observe the unclustering phenotype.

      Re-expression should occur during the block, followed by release and analysis in the next G1 phase. This would better demonstrate whether clustering defects from the previous division can be rescued.

      We have attempted these types of long-term depletion experiments in cell-cycle-arrested cells, but observed significant viability defects, making the results uninterpretable.

      (9) Statistical Analyses

      Several figures lack statistical analysis, which is essential for data interpretation:

      (a) Figure 1B-E

      (b) Figure 3I

      (c) Figure 4B

      (d) Figure 5B, C, G

      (e) Supplementary Figures S4B and S7

      Statistical analyses were performed for a) Fig. 1b-e, b) Fig. 3i, c) Fig. 4b, d) Fig. 5b-c and the details of the test are mentioned in the corresponding figure legends. We also include statistical tests for Fig. 5g, S5b and S7c-d.

      Minor Comments:

      (1) Page 9: "Reassuringly, in line with known centromere-nucleoli association (Bury, Moodie et al. 2020, van Schaik, Manzo et al. 2022)..."

      The citation "van Schaik, Manzo et al. 2022" is incorrect and should be revised.

      We have removed this reference.

      (2) Page 10:

      "...were grouped into six categories: regulators of chromatin structure, kinetochore proteins, nucleolar proteins, nuclear pore complex components..."

      The authors should note that NUP160, listed as a nuclear pore complex hit, is also a kinetochore component during mitosis and may be linked to mitotic defects.

      We now mention this on page 10.

      (3) Page 12:

      "Progression through S phase was equally efficient in the presence or absence of KI67."

      While bulk S phase progression may appear unaffected, refined analyses (e.g., Repli-seq, EdU patterning) have shown delayed replication of centromeric/pericentromeric regions upon Ki-67 depletion. This should be acknowledged, especially given the study's focus on centromeres (see Schaik et al., 2022; Stamatiou et al., 2024).

      Our statement was meant to describe the results we observed in this study. We indicate that overall progression is not affected, but subtle effects may persist, and we cite the relevant references on page 13.

      (4) Page 12:

      "KI67 is a well-known marker of cell proliferation..."

      The first study demonstrating the dependency of chromosome periphery on Ki-67 was Booth et al., 2014, which should be cited.

      This citation has been added.

      Reviewer #3 (Recommendations for the authors):

      (1) On page 14, paragraph 1, the authors suggest that NCAPH2 and SPC24 act independently on centromere clustering. I'm not convinced that this is the right interpretation of the data. Rather, the lack of an additive phenotype following NCAPH2 and SPC24 dual depletion suggests to me that these two proteins are acting in the same pathway.

      We show that knockdown of NCAPH2 and SPC24 results in opposite effects on centromere clustering. However, knockdown of SPC24 in NCAPH2-AID cells produces an intermediate level of clustering compared to NCAPH2 depletion or SPC24 knockdown alone. This indicates additive effects. We have modified our description of these results on p. 14.

      (2) The analysis and experimental design in Figure 5g could be improved. For one, I would add statistical comparisons like the other figure panels. Second, the authors would ideally perform AID depletion in a synchronized G2 population before washout during the subsequent G1. This design might make some of the more subtle changes (e.g., KI67-AID) more obvious.

      We now include statistical analysis in Fig. 5g. We have attempted long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (3) In the discussion, the authors allude to centromere clustering data from the NDC80 complex, HMGA1, and other HMGs but fail to direct the reader to where they may find the data. If these data are in Tables S4 and S5, perhaps the authors could make these tables more reader-friendly?

      For each target, the mean Z-score of two biological replicates based on the Clustering Score is listed in column H of Tables S4 and S5.

      (4) In my opinion, the term 'clustering score' comes across a bit ambiguous. In most cases, this term appears to refer to the distance between centromeric foci but is used occasionally to refer to the number of centromeric spots. For example, on page 9, paragraph 1, line 3, cluster/clustering is used three times but with slightly different meanings. Perhaps the authors can consider using the word 'clustering' to indicate the number of spots, 'dispersion' to indicate distance between centromeres, and 'radial distribution' to indicate distance from the nuclear center? Or other ways to improve the consistency of the descriptive terms.

      We apologize for not being clear. The Clustering Score is a very specific parameter derived from a Ripley’s K clustering algorithm, as described in Materials and Methods. We now ensure that the term is used correctly throughout and that the other terms are also used consistently.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic, but do not extend those findings substantially.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary - resulting directly from SuM neuronal activity - or secondary, reflecting a stress-like state induced by prolonged activation of SuM and related circuits. This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      We appreciate the reviewer's concern regarding the timing of CNO administration relative to behavioral testing. The 30-minute interval was selected based on previous studies [1, 2]. This window ensures stable and specific neuronal manipulation while minimizing off-target effects, and it was applied consistently across all experiments. We acknowledge that a shorter interval (~15 min) can be sufficient to produce biological effects in vivo [3, 4]. We repeated the chemogenetic tests 2-3 times to ensure reliable data for statistical analysis. However, we cannot exclude potential side effects of prolonged chemogenetic activation of the SuM, given the poor temporal resolution of this approach compared to optogenetic manipulation. We agree that employing techniques with higher temporal resolution, such as optogenetics, in future studies would provide an excellent complement to these findings.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      We agree that the use of multiple stressors (foot shock and CSDS) adds complexity to the interpretation. Our rationale was to test the generality of the SuM response and the role of SANs across different stress modalities (acute vs. chronic). The key finding is that while both vSub and dSub projections to the SuM were activated by the acute stressor of foot shock (Figure 5N-R), only the vSub-SuM pathway showed a significant increase in calcium activity specifically during the anxiety-provoking transition from the closed to the open arms of the EZM (Figure 5I-M). This dissociation suggests a selective role for the vSub-SuM circuit in encoding anxiety-related information, beyond a general response to stress.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      We thank the reviewer for highlighting the important studies linking the SuM to locomotion. We acknowledge this known function and carefully considered it in our analyses. Non-selective activation of the entire SuM did not affect total distance traveled in the open field (OF) or elevated zero maze (EZM) (Supplemental Figure 2B-C). Although locomotion in the OF and EZM was affected when SANs were targeted, we also compared the distance traveled in the central area of the OF (Figure 4I) to partially control for the influence of locomotion when estimating anxiety-related avoidance of the central area. We agree that future work delineating the specific subpopulations within the SuM that regulate locomotion versus anxiety would be highly valuable.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      We acknowledge that the primary focus on OF and EZM tests is a limitation in fully characterizing the behavioral profile of SAN manipulation. These tests were selected because they are well-validated, standard assays for anxiety-like behavior in rodents [5–10]. However, we also included the reward-seeking test, in which activation of SANs significantly suppressed sucrose consumption (Figure 4L), suggesting a broader impact on motivational state that is often linked to anxiety. We fully agree with the reviewer that employing a richer behavioral battery, such as tests for social avoidance, conditioned place aversion, or Pavlovian fear conditioning, in future studies will be essential to comprehensively define the functional scope of SuM SANs and to conclusively dissociate their role from that of fear memory engrams.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connections of changes induced by a chronic stressor to Fos induction following acute foot shock are unclear and do not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      Thank you for this important comment. We agree that directly linking acute foot shock-induced cFos expression with chronic social defeat stress (CSDS) electrophysiological changes may create an interpretive gap. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We did not intend to imply that the same neuronal population responds identically to both stressors.

      To address this, we have clarified in the text that the purpose of Figure 1 is to show that SuM is responsive to diverse stressors, rather than to establish a direct mechanistic link between acute and chronic activation patterns. The baseline for SAN studies in Figure 3 is established through the TRAP2 tagging protocol following foot shock, independent of the CSDS model. We acknowledge that future studies should compare SAN recruitment across acute vs. chronic stressors to better define their functional overlap.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data or images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      Please see Supplemental Figure 2 for the infection area of AAV.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination.

      We thank the reviewer for pointing out the axis scaling issue. We have modified the y-axis to start from 0. The SuM has been reported to play a role in wakefulness in rodents, so some basal neuronal activation after i.p. injection of 4-OHT is expected.

      Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      Thank you for the question. We have replaced the reactivation-chance graph with a new graph showing the percentage of SANs reactivated by subsequent sucrose reward or stress. Our rationale for using social stress rather than foot shock was to test the generality of foot-shock-tagged neurons. The lower cFos expression after sucrose exposure suggests, first, that the SuM may not be involved in reward regulation, as the reviewer notes; and second, that these SANs are more likely to modulate anxiety-like behavior than reward.

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      We agree that we did not provide sufficient data to conclude that SuM SANs have no regulatory effect on fear memory. The relevant statement has been removed to avoid misunderstanding.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      Thank you for your kind reminder. Corticosterone/cortisol, the primary stress hormone, is a well-established biomarker whose levels are elevated in response to stress and in anxiety states [11, 12]. Some studies have also reported that corticosterone administration can produce anxiety-like behaviors in rodents [13–16]. We collected the blood samples at the same time point in Figure 4C-D. We agree that the statement in line 236 is an overstatement, and it has been modified.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      We thank the reviewer for this important point. To address the concern regarding the in vivo behavioral encoding specificity of the vSub-SuM pathway, we further analyzed the in vivo fiber photometry data. The new analysis revealed that calcium activity in vSub-SuM projection neurons exhibited bidirectional, instantaneous, and specific changes during transitions between the open and closed arms of the elevated plus maze: their activity significantly and immediately decreased when mice moved from the open arm to the closed arm (new results shown in Supplemental Figure 5), and conversely, significantly and immediately increased upon transitioning from the closed to the open arm. By contrast, during the same behavioral events, dSub-SuM projection neurons showed no significant change in activity. We hope this finding strengthens the evidence for a role of the vSub-SuM pathway in encoding anxiety-like behavior.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      Thank you for the suggestion. We have revised line 190 to the more cautious statement: “In this study, we combined multiple methods to determine whether the SuM is a brain region involved in modulating anxiety.”

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      We have addressed the DREADD approach in the first part of our response.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but is functionally untested in presented experiments. The data do indicate activation of the SuM neuron by foot shock, which has been previously reported (Escobedo et al eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      Thank you for your valuable comments. Foot-shock-activated neurons may play a role in modulating anxiety-like behaviors as well as emotional memory (fear memory). We recognize that we did not test all aspects of anxiety and memory, which resulted in some overstatements in the manuscript. Given the reduced open-arm exploration in the EZM/EPM, it is more appropriate to focus on “anxiety-related avoidance.”

      Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

      We thank the reviewer for raising these important points. We agree that further clarification is warranted. In our study, we compared SAN reactivation across different stimuli: foot shock (acute physical stress), social stress (chronic psychosocial stress), and sucrose reward (non-aversive positive stimulus). As shown in Figure 3, SANs in the supramammillary nucleus (SuM) were significantly reactivated by social stress but not by sucrose reward. Moreover, the c-Fos response in SuM was markedly higher after foot shock compared to home cage controls (Figure 1). While we did not test all possible aversive states (e.g., pain, sickness), our data support that SuM SANs are preferentially recruited by stressors rather than by reward or neutral conditions. We acknowledge that the overlap across stress modalities is not complete, which may reflect differences in stress intensity, duration, or circuit engagement. Future work will systematically compare SAN recruitment across diverse aversive and non-aversive states to further define their selectivity.

      The term “stress-activated neurons” (SANs) here refers to neurons that are reliably activated by at least one type of stressor and can be reactivated by subsequent stress exposure. The partial overlap across stressors likely reflects the diversity of stress responses and the possibility that distinct subpopulations within SuM may encode different aspects of aversive experience. Importantly, chemogenetic activation of SANs was sufficient to induce anxiety-like behavior and elevate corticosterone (Figure 4), supporting their functional role in stress-related behavioral and physiological outputs. We have revised the manuscript to clarify that SANs represent a stress-responsive ensemble rather than a uniform population activated identically by all stressors.

      We appreciate the reviewer’s conceptual caution. In the revised manuscript, we intentionally avoided using the term “engram” to describe SANs. Our focus is on a stress-activated neuronal ensemble that drives anxiety-like behavior, not on memory recall per se. We refer to SANs as an “ensemble” or “population” rather than an engram, consistent with the TRAP-based labeling approach used to capture neurons activated during a specific experience. We agree that “engram” is best reserved for memory-encoding cells and will ensure this distinction remains clear throughout the text.

      Reviewer #3 (Public review):

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

      We appreciate the reviewer’s thoughtful critique regarding the specificity of SuM’s role in anxiety and the interpretation of our findings. We acknowledge that SuM has broad functions, including regulating exploration and hippocampal theta. However, our data show that general SuM activation increases anxiety-like measures (reduced open-arm time in EZM, decreased center exploration in OF) without altering total locomotion (Fig. 2, Suppl. Fig. 2). The locomotor reduction in SAN activation experiments (Suppl. Fig. 2F–G) was observed alongside clear anxiety-like behavioral changes (e.g., suppressed reward seeking), suggesting that the effects are not solely due to motor suppression. We agree that the methods we used to estimate anxiety-like behaviors rely on the movement of mice during testing, and this is a limitation of the study when linking the data to anxiety. It is therefore more appropriate to interpret the results as modulation of anxiety-like behavior (anxiety-related avoidance) rather than of anxiety itself. We have revised the manuscript to describe the findings more precisely and avoid overstatement.

      Our fiber photometry data (Fig. 5) show that vSub–SuM projection neurons increase activity specifically when mice enter open arms of the EZM—a behavioral transition associated with anxiety—whereas dSub–SuM projections do not. This activity correlates with anxiety-related behavior, not merely with movement or stress per se.

      We also agree that the term “engram” may be misleading in this context. In the manuscript, we refer to SANs as a “stress-activated neuronal ensemble” rather than an anxiety engram. Our data indicate that these neurons are recruited by stress and that their reactivation increases anxiety-related avoidance of the open arms. We have revised the text to avoid conceptual overreach and to clarify that SuM SANs likely contribute to a state of sustained anxiety/avoidance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from a note in the abstract that the subjects were male, and from a discussion of the limitations arising from the exclusion of females.

      Thank you for the suggestion. We have included the full statistical details in a separate sheet as Table 1. We have also modified the title of the manuscript to reflect the sex of the mice.

      Reviewer #1 (Recommendations for the authors):

      (1) In line 211, the authors state, "we recorded neuronal action potentials via multichannel extracellular recording while the mice were moving in the EPM, a traditional type of maze used to test anxiety in rodents,". However, it is unclear what data is presented in the paper, that is, extracellular recordings from SuM in mice on the elevated plus maze.

      We have deleted the description of the multichannel recording data in the EPM, as these data had already been removed in an earlier revision.

      Minor corrections to the text and figures.

      (2) For bar plots, perhaps clarify how the data is presented. For example, in Figure 4, "The data in B, D, E and I-L are presented as the means ± SEMs," but this does not appear to be plotted as a mean with SEM error bars because the error bars cover all the values.

      Corrected.

      (3) In Figure 5, the white text for EGFP in panel B is very difficult to see.

      Corrected.

      (4) For Figure 5D, it would be helpful to more clearly specify which neurons in SuM were recorded from. Was it SANs or all SuM neurons?

      We performed whole-cell recordings from SuM neurons in general, not only from SANs.

      (5) Fos2A-iCreERT2 is mislabeled as "Fos2A-iCreERT" in the methods.

      Corrected.

      (6) The sentence at line 139 "To make sure foot shock induced anxiety won't last until manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7," is unclear as written.

      Thank you for pointing this out. We have revised the sentence to make it clearer: “To make sure mice are in a similar basal condition while applying chemogenetic manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7 (Figure 4A). The mice that experienced foot shocks showed decreased exploration time in the open arms on day 2. However, acute stress-induced anxiety was not detected on day 7 (Figure 4B), which allowed us to compare the anxiety-like behavior produced by SAN reactivation between groups at the same baseline.”

      (7) The details of the viral injections used for ex vivo electrophysiology are not sufficient to understand the experiment and the implications of the data. Which neurons (SANs?) are recorded from, what percent of those had inputs, were the sub-neurons globally labeled or just SANs?

      We performed whole-cell recordings from SuM neurons globally to test whether the projection arises from glutamatergic neurons in the Sub; as shown in Figure 5B, the Sub neurons projecting to SuM exclusively express VGLUT1. Given this aim, we did not retain neurons that did not respond to light stimulation and therefore cannot calculate the percentage of neurons receiving input. We have clarified in the Methods that recordings were made from SuM neurons globally.

      (8) The scale used in Figure 6C renders that data unreadable. 120 to 40% changes in body weight are well beyond the variability in the data.

      We have changed the axis range to 90–110% to show the body weight changes more clearly.

      (9) The dose of CNO used, 5 mg/kg, is high, and using lower doses or other DREADD ligands is worth considering.

      Thank you for your valuable comment. We note that many groups now use lower doses of CNO or alternative DREADD ligands that are reported to have much higher affinity and fewer side effects. The 5 mg/kg dose was adopted from earlier DREADD studies [17], and in our experiments it produced no obvious side effects in mice, e.g., on locomotion (Supplementary Figure 2B); we therefore kept this dose throughout the project to keep conditions consistent across cohorts. In follow-up experiments building on this project, we are switching to DCZ to avoid potential side effects of CNO.

      Reviewer #2 (Recommendations for the authors):

      This is a strong manuscript that provides important insights into the role of the supramammillary nucleus (SuM) and its inputs from the ventral subiculum in regulating anxiety. The combination of behavioral, imaging, electrophysiological, and circuit manipulation approaches is impressive, and the distinction the authors propose between anxiety-related and fear-related circuits is conceptually important.

      There are, however, some points that I think need clarification. The authors emphasize that the hippocampus is essential for fear memory recall, yet they do not directly evaluate whether the SuM-hippocampal pathway might contribute differentially to anxiety versus fear memory. Addressing this would help to explain where the dissociation between the two processes arises.

      Thank you for the suggestion. We recognize that we did not collect enough data to exclude a role for these SANs in memory, especially fear memory, which, as noted above, is formed through strongly emotional training. The relevant data and discussion have been removed to avoid misunderstanding and overstatement.

      I am also not fully convinced about the definition of the "stress-activated neurons" (SANs). The overlap across repeated stress exposures is quite modest (around 20%), which suggests that this population may not be strictly stress-specific but rather a dynamic subset that is preferentially, though not exclusively, engaged by stress. Related to this, the use of the term "engram" raises conceptual questions. Since the classic engram refers to an ensemble encoding and recalling a specific memory, it is not obvious whether it is appropriate to apply the term to a neuronal population that appears to represent a persistent emotional state. The authors should consider justifying this choice of terminology more carefully or adopting a different term.

      Thank you for your important comments. We agree that the SANs in this manuscript are more likely a dynamic subset than an “engram” engaged exclusively by foot shock stress. This is why we use “stress-activated neurons” rather than “engram” to describe this neuronal ensemble. To avoid further confusion, we have reduced the use of “engram” throughout the manuscript.

      Some parts of the text also need more precision. For example, the statement in lines 63-65 that "few studies have explored emotion-related engram cells" is potentially misleading, as most engram studies focus on memories with a strong emotional component. The rationale for this claim should be clarified.

      This sentence has been deleted because it was misleading and not necessary for the flow of the text.

      In Figure 1, the choice of methods is also puzzling: cFos immunostaining is used after shock delivery, while electrophysiology is used for the CSDS paradigm. It would be helpful to explain why different readouts were chosen for different stress models, and whether this may affect the comparability of the results.

      Thank you for this important comment. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We chose different methods because acute stress produces transient effects, whereas chronic stress produces long-lasting effects. To our knowledge, cFos is a well-established marker of strong neuronal activation, but its expression is short-lived (~4–6 hours), which makes it better suited to acute paradigms. In vivo recording allows us to compare neuronal activity before and after the chronic paradigm within subjects and can reveal cumulative effects that cFos cannot. To address this, we have clarified the purpose of Figure 1 in the text (Lines 112–113): “To investigate if SuM would be responsive to diverse stressors, we next examined whether chronic stress, which different mechanism underlying…”

      Finally, some additional details would strengthen the presentation. The discussion of corticosterone and other physiological markers could be expanded to indicate whether these effects were robust across stress paradigms. Similarly, the relatively modest overlap between SANs activated by different stressors could be framed more explicitly as part of a broader principle of flexible ensemble recruitment in anxiety-related circuits.

      Thank you for your suggestion. We have expanded the discussion of corticosterone and of the flexibility of SANs in the manuscript. See Lines 267–270: “The serum corticosterone concentration can be used as a marker of stress-induced change in the peripheral blood. Previous studies showed serum corticosterone can be increased by various stress stimulation [39–42]; meanwhile, intentionally supplementing the diet with corticosterone can induce anxiety-like behaviors in rodents [43].” and Lines 275–281: “However, the reactivation rate of SANs caused by different stressors was relatively lower than the initial activation rate caused by foot shock (Figure 3). This suggests that stress-activated neuronal clusters may follow more flexible recruitment principles, with only a small number of neurons potentially encoding emotional information, while most other neurons remain involved in encoding other neural activities. Studies in other fields, particularly studies of memory engrams, have shown that the sets of neurons activated during learning are dynamic and highly flexible [44, 45].”

      Overall, the work is of high quality and provides a valuable contribution to the field, but addressing these points would help sharpen the mechanistic claims and ensure that the conceptual framework is as clear and precise as the experimental data.

      Reviewer #3 (Recommendations for the authors):

      (1) Since increased SuM activity is hypothesized to mediate the effects of stress on anxiety-like behavior, a logical step would be to test for necessity by silencing the stress-activated SuM cells.

      We agree this is a logical and valuable experiment. While our current study focused primarily on the sufficiency of SuM/SAN activation to induce anxiety-like behavior, we acknowledge that inhibition experiments would provide critical complementary evidence for necessity. We have added a statement in the Discussion noting that “future studies should examine whether silencing SuM SANs, either during stress exposure or during anxiety testing, can prevent or reduce stress-induced anxiety”. This will help establish a more complete causal role.

      (2) Discuss what is meant by "anxiety engram" and what features of anxiety the labeled cells might encode.

      We concur that “stress-activated neuron (SAN)” is a more precise descriptor than “engram” in this context. We have revised the text to avoid the potentially misleading term “engram” and instead refer to a “stress-activated neuron”. The labeled cells are preferentially reactivated by stress (not reward), and their activation promotes both behavioral avoidance and physiological stress markers (corticosterone). They likely contribute to the maintenance of an anxious state under perceived threat, rather than encoding discrete threat cues or memories.

      (3) A more nuanced analysis of behavioral correlates of SuM activity and/or the behavioral effects of SuM manipulations would strengthen this paper.

      To provide a more nuanced understanding of the behavioral correlates, we have performed additional analyses of our fiber photometry data (now presented in Supplemental Figure 6) and have planned additional experiments for future studies to deepen our understanding.

      References:

      (1) Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, et al. Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Sci Rep. 2019;9.

      (2) Koike H, Demars MP, Short JA, Nabel EM, Akbarian S, Baxter MG, et al. Chemogenetic Inactivation of Dorsal Anterior Cingulate Cortex Neurons Disrupts Attentional Behavior in Mouse. Neuropsychopharmacology. 2016;41:1014–1023.

      (3) Guettier J-M, Gautam D, Scarselli M, Ruiz De Azua I, Li JH, Rosemond E, et al. A chemical-genetic approach to study G protein regulation of cell function in vivo. Proceedings of the National Academy of Sciences. 2009;106:19197–19202.

      (4) Wess J, Nakajima K, Jain S. Novel designer receptors to probe GPCR signaling and physiology. Trends Pharmacol Sci. 2013;34:385–392.

      (5) Kraeuter AK, Guest PC, Sarnyai Z. The Elevated Plus Maze Test for Measuring Anxiety-Like Behavior in Rodents. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 69–74.

      (6) Kraeuter AK, Guest PC, Sarnyai Z. The Open Field Test for Measuring Locomotor Activity and Anxiety-Like Behavior. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 99–103.

      (7) Wall PM, Messier C. Methodological and conceptual issues in the use of the elevated plus-maze as a psychological measurement instrument of animal anxiety-like behavior. Neurosci Biobehav Rev. 2001;25:275–286.

      (8) Carobrez AP, Bertoglio LJ. Ethological and temporal analyses of anxiety-like behavior: The elevated plus-maze model 20 years on. Neurosci Biobehav Rev. 2005;29:1193–1205.

      (9) Seibenhener ML, Wooten MC. Use of the open field maze to measure locomotor and anxiety-like behavior in mice. Journal of Visualized Experiments. 2015. https://doi.org/10.3791/52434.

      (10) Prut L, Belzung C. The open field as a paradigm to measure the effects of drugs on anxiety-like behaviors: A review. Eur J Pharmacol. 2003;463:3–33.

      (11) Chen Y, Zhou X, Chu B, Xie Q, Liu Z, Luo D, et al. Restraint Stress, Foot Shock and Corticosterone Differentially Alter Autophagy in the Rat Hippocampus, Basolateral Amygdala and Prefrontal Cortex. Neurochem Res. 2024;49:492–506.

      (12) Hassell JE, Nguyen KT, Gates CA, Lowry CA. The Impact of Stressor Exposure and Glucocorticoids on Anxiety and Fear. Curr. Top. Behav. Neurosci., vol. 43, Springer; 2019. p. 271–321.

      (13) Peng B, Xu Q, Liu J, Guo S, Borgland SL, Liu S. Corticosterone attenuates reward-seeking behavior and increases anxiety via D2 receptor signaling in ventral tegmental area dopamine neurons. Journal of Neuroscience. 2021;41:1566–1581.

      (14) Myers B, Greenwood-Van Meerveld B. Elevated corticosterone in the amygdala leads to persistent increases in anxiety-like behavior and pain sensitivity. Behavioural Brain Research. 2010;214:465–469.

      (15) Demuyser T, Deneyer L, Bentea E, Albertini G, Van Liefferinge J, Merckx E, et al. In-depth behavioral characterization of the corticosterone mouse model and the critical involvement of housing conditions. Physiol Behav. 2016;156:199–207.

      (16) Shoji H, Maeda Y, Miyakawa T. Chronic corticosterone exposure causes anxiety- and depression-related behaviors with altered gut microbial and brain metabolomic profiles in adult male C57BL/6J mice. Molecular Brain. 2024;17.

      (17) Manvich DF, Webster KA, Foster SL, Farrell MS, Ritchie JC, Porter JH, et al. The DREADD agonist clozapine N-oxide (CNO) is reverse-metabolized to clozapine and produces clozapine-like interoceptive stimulus effects in rats and mice. Sci Rep. 2018;8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      GID/CTLH-type RING ligases are huge multi-protein complexes that play an important role in protein ubiquitylation. The subunits of its core complex are distinct and form a defined structural arrangement, but there can be variations in subunit composition, such as exchange of RanBP9 and RanBP10. In this study, van gen Hassend and Schindelin provide new crystal structures of (parts of) key subunits and use those structures to elucidate the molecular details of the pairwise binding between those subunits. They identify key residues that mediate binding partner specificity. Using in vitro binding assays with purified protein, they show that altering those residues can switch specificity to a different binding partner.

      Strengths:

      This is a technically demanding study that sheds light on an interesting structural biology problem in residue-level detail. The combination of crystallization, structural modeling, and binding assays with purified mutant proteins is elegant and, in my eyes, convincing.

      Weaknesses:

      I mainly have some suggestions for further clarification, especially for a broad audience beyond the structural biology community.

      We thank the reviewer for the careful evaluation of our manuscript and for the positive and encouraging assessment of our work. We also thank the reviewer for the constructive suggestions to improve clarity for a broader audience and have revised the manuscript accordingly.

      (1) The authors establish what they call an 'engineering toolkit' for the controlled assembly of alternative compositions of the GID complex. The mutagenesis results are great for the specific questions asked in this manuscript. It would be great if they could elaborate on the more general significance of this 'toolkit' - is there anything from a technical point of view that can be generalized? Is there a biological interest in altering the ring composition for functional studies?

      We thank the reviewer for raising this important point. Beyond addressing the specific pairwise assembly mechanisms analyzed in this study, we agree that the broader significance of this engineering toolkit warrants further discussion. The residue-level understanding of CTLH-CRA interfaces not only explains assembly specificity but also enables rational manipulation of ring composition in a controlled manner. We have therefore expanded the end of the discussion section to outline generalizable strategies for CRA-interface disruption and to highlight potential biological applications of altering ring composition for functional studies.

      (2) Along the same lines, the mutagenesis required to rewire Twa1 binding was very complex (8 mutations). While this is impressive work, the 'big picture conclusion' from this part is not as clear as for the simpler RanBP9/10. It would be great if the authors could provide more context as to what this is useful for (e.g., potential for in vivo or in vitro functional studies, maybe even with clinical significance?)

      We thank the reviewer for this important comment and agree that the broader implications of the more complex Twa1 rewiring were not sufficiently emphasized in the original manuscript. Through the competition ITC experiments (Fig. 5), we aimed to demonstrate a concrete application of the Twa1 rewiring. At the same time, we recognize that additional use cases are conceivable. To address this point, we have expanded the discussion section to clarify the conceptual significance of Twa1 rewiring and briefly outline further potential applications of controlled interface manipulation. These additions aim to better contextualize the broader relevance of this approach beyond the specific mechanistic questions addressed in this study.

      (3) For many new crystal structures, the authors used truncated, fused, or otherwise modified versions of the proteins for technical reasons. It would be helpful if the authors could provide reasoning why those modifications are unlikely to change the conclusions of those experiments compared to the full-length proteins (which are challenging to work with for technical reasons). For instance, could the authors use folding prediction (AlphaFold) that incorporates information of their resolved structures and predicts the impact of the omitted parts of the proteins? The authors used AlphaFold for some aspects of the study, which could be expanded.

      We agree with the reviewer that the transferability of the domain constructs to the corresponding full-length proteins is an important consideration. In the original version of the manuscript, we addressed this point by fitting the experimentally determined CTLH-CRA domain structures of muskelin and RanBP9 into the cryo-EM maps of the full-length complexes (Fig. 5d), demonstrating that the applied truncations and fusion strategies are compatible with the architecture observed in the intact assembly. Following the reviewer’s suggestion, we have further strengthened this analysis by adding a new Supplementary Figure 1. In this figure, the experimentally determined CTLH-CRA domain structures are superposed with full-length AlphaFold predictions. This comparison shows that removal of flexible linker regions, such as those between the CTLH and CRA motifs or at terminal segments, does not alter the overall fold or the binding interfaces of the domains. Together, these analyses support the conclusion that the domain constructs faithfully represent the structural and interaction properties of the full-length proteins.

      Reviewer #2 (Public review):

      Summary:

      This is a very interesting study focusing on a remarkable oligomerization domain, the LisH-CTLH-CRA module. The module is found in a diverse set of proteins across evolution. The present manuscript focuses on the extraordinary elaboration of this domain in GID/CTLH RING E3 ubiquitin ligases, which assemble into a gigantic, highly ordered, oval-shaped megadalton complex with strict subunit specificity. The arrangement of LisH-CTLH-CRA modules from several distinct subunits is required to form the oval on the outside of the assembly, allowing functional entities to recruit and modify substrates in the center. Although previous structural data had revealed that CTLH-CRA dimerization interfaces share a conserved helical architecture, the molecular rules that govern subunit pairing have not been explored. This was a daunting task in protein biochemistry that was achieved in the present study, which defines this "assembly specificity code" at the structural and residue-specific level.

      The authors used X-ray crystallography to solve high-resolution structures of mammalian CTLH-CRA domains, including RANBP9, RANBP10, TWA1, MAEA, and the heterodimeric complex between RANBP9 and MKLN. They further examined and characterized assemblies by quantitative methods (ITC and SEC-MALS) and qualitatively using nondenaturing gels. Some of their ITC measurements were particularly clever and involved competitive titrations and titrations of varying partners depending on protein behavior. The experiments allowed the authors to discover that affinities for interactions between partners are exceptionally tight, in the pM-nM range, and to distill the basis for specificity while also inferring that additional interactions beyond the LisH-CTLH-CRA modules likely also contribute to stability. Beyond discovering how the native pairings are achieved, the authors were able to use this new structural knowledge to reengineer interfaces to achieve different preferred partnerings.

      Strengths:

      Nearly everything about this work is exceptionally strong.

      (1) The question is interesting for the native complexes, and even beyond that, has potential implications for the design of novel molecular machines.

      (2) The experimental data and analyses are quantitative, rigorous, and thorough.

      (3) The paper is a great read - scholarly and really interesting.

      (4) The figures are exceptional in every possible way. They present very complex and intricate interactions with exquisite clarity. The authors are to be commended for outstanding use of color and color-coding throughout the study, including in cartoons to help track what was studied in what experiments. And the figures are also outstanding aesthetically.

      Weaknesses:

      There are no major weaknesses of note, but I can make a few recommendations for editing the text.

      We are very grateful to the reviewer for this exceptionally positive and thoughtful assessment of our work. We sincerely appreciate the recognition of both the conceptual scope and the technical depth of the study. We are particularly encouraged by the reviewer’s comments regarding the clarity and presentation of the figures. Considerable effort went into ensuring that the structural and biochemical complexity of the CTLH assemblies could be conveyed in a clear and accessible manner, and we are grateful that this was appreciated. We thank the reviewer for the constructive recommendations for textual improvements.

      Reviewer #3 (Public review):

      Summary:

      Protein complexes, like the GID/CTLH-type E3 ligase, adopt a complex three-dimensional structure, which is of functional importance. Several domains are known to be involved in shaping the complexes. Structural information based on cryo-EM is available, but its resolution does not always provide detailed information on protein-protein interactions. The work by van gen Hassend and Schindelin provides additional structural data based on crystal structures.

      Strengths:

      The work is solid and very carefully performed. It provides high-resolution insights into the domain architecture, which helps to understand the protein-protein interactions on a detailed molecular level. They also include mutant data and can thereby draw conclusions on the specificity of the domain interactions. These data are probably very helpful for others who work on a functional level with protein complexes containing these domains.

      Weaknesses:

      The manuscript contains a lot of useful, very detailed information. This information is likely very helpful to investigate functional and regulatory aspects of the protein complexes, whose assembly relies on the LisH-CTLH-CRA modules. However, this goes beyond the scope of this manuscript.

      We thank the reviewer for the detailed review of our manuscript and for the constructive and positive remarks. We greatly appreciate the recognition of the high-resolution structural insights and the value of combining crystallographic data with mutational analyses to elucidate domain-specific interactions. We are also grateful for the acknowledgment that these findings may serve as a useful resource for future functional and regulatory studies of LisH-CTLH-CRA-containing complexes. While such aspects extend beyond the immediate scope of the present study, we hope that the structural framework provided here will facilitate and inspire future investigations addressing these questions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) For the ITC measurements that are less accurate, the authors may want to represent that in the figures with an approximate sign.

      We thank the reviewer for this helpful suggestion. After consideration, we decided not to introduce an approximate sign in the main figures, as this would be inconsistent with the graphical conventions used throughout the manuscript (there is also no equal sign). Since the associated errors are reported directly alongside each KD value, we believe that the precision of the measurements is sufficiently conveyed. However, we agree that explicitly marking estimated values can be appropriate in specific cases. We have therefore added approximate signs in Supplementary Fig. 5 for the KD estimation of self-association.

      (2) The names of the proteins are from mammals and should probably be capitalized.

      We agree that capitalization is generally appropriate for mammalian protein names. In particular, for proteins such as Rmnd5a, which is identical in sequence between mouse and human, the use of human-style nomenclature would indeed be fully justified. Originally, we chose the current nomenclature to distinguish the proteins studied here from strictly human versions, as most constructs are derived from mouse and one (muskelin) from rat. This approach also avoids inconsistencies between the mouse and rat proteins within the manuscript and maintains alignment with the nomenclature used in our previous publications. For the sake of consistency and continuity, we have therefore retained the original formatting throughout the manuscript.

      (3) For the sequence alignments, it would be good to specify in the legend which organisms these are from, and where the differences are in mouse and rat proteins used in the study, and the human proteins.

      We appreciate this constructive suggestion. We have revised the sequence alignment legends to clearly specify the organism of origin for each sequence included in the analysis. In addition, we have added a new Supplementary Figure 1 presenting the AlphaFold predictions of the mouse proteins and rat muskelin used in this study. Within these models, sequence differences relative to the human proteins are indicated, and variations within the CTLH-CRA domains are explicitly annotated. These additions clarify how the constructs analyzed here relate to their human counterparts.

      (4) A few points about the referencing:

      (a) It was reference 27 that first described the dual-sided interactions where the CRA domain weaves back and forth such that CTLH-CRAN and LisH-CRAC mediate the contacts on the two sides. This should be cited.

      We fully agree and added the reference accordingly.

      (b) To this reviewer's knowledge, it was references 13 and 9 that resolved the daisy-chain of helical LisH-CTLH-CRA interactions around the oval helical structures.

      We agree with the reviewer that references 13 and 9 resolved the helical LisH-CTLH-CRA daisy-chain arrangement around the oval structure. Reference 13 was already cited in the original manuscript, and we have now added reference 9 to appropriately acknowledge this contribution. We have retained reference 14, although it did not resolve the helical daisy-chain architecture, as it described a related oval assembly of CTLH complex components that remains relevant in the structural context discussed.

      (c) A cryo-EM map with RANBP10 was shown at low resolution in reference 8.

      We agree with the reviewer that a low-resolution cryo-EM map including RANBP10 was reported in reference 8. Our original wording was not sufficiently precise and may have given the impression that RANBP10 had not been characterized. Our intention was to convey that, although cryo-EM maps exist, detailed atomic-level information on subunit interfaces was lacking. We have revised the paragraph accordingly to clarify this point and now cite reference 8 explicitly in this context.

      (d) The Discussion requires referencing.

      We agree with the reviewer that additional referencing improves the clarity and contextualization of the Discussion. We have revised the Discussion section accordingly and added appropriate references to support the statements made.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that their method can achieve when estimating spike timing.

      Strengths:

      - The authors provide a solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models that infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm the robustness and usability of this algorithm.

      The metrics used for comparison follow the field's benchmarking conventions (see the CASCADE paper, Rupprecht et al. 2021). Indeed, improved standardized methods would be ideal to develop, which is beyond the scope of this manuscript.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who want to use it for their analyses. While this is not a weakness of the approach itself, it highlights the need for further validation and benchmarking in future work to build user confidence.

      We acknowledge the challenges of understanding the mathematics underlying our method, but such a study is necessary to ensure its accuracy and reliability. Indeed, we will strive to improve the technique's user-friendliness in future instantiations.

      Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the github repository is well-organized.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      In the revision, Figure 9 shows that temporal accuracy is very similar between PGBAR and the supervised method, CASCADE, and that PGBAR has a lower false positive rate. These results support the effectiveness of unsupervised Monte Carlo sampling, even with a simple autoregressive model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I'd like to thank the authors for their revisions. Their comments have addressed all my concerns, and I thank them for the clarifications. I have no further comments, except a few minor notes that the authors may consider or not:

      - The paragraph starting in line 367 is newly written and not yet as clear and mature as other parts of the manuscript. In several sentences it is roughly clear what is meant, but the wording lacks precision. For example, "distributions of the average time from ground-truth" seems a bit unclear; perhaps "distributions of the average time of estimated spikes from ground-truth spikes" instead. Similarly, "the false detection rate, defined as the difference between detected and ground-truth spikes ..." could be rephrased using the difference between "numbers of spikes" instead of the difference between "spikes". But all of this is minor.

      - In the new Figure 9A, the error bars for the MLSpike method seem to be absent. In the same figure legend, it should be "excess" instead of "excess".

      We thank the reviewer for the feedback. We revised the wording of the new paragraph in response to the reviewer’s suggestions, restored the missing error bar in Figure 9, and corrected the figure legend.

      Reviewer #2 (Recommendations for the authors):

      Comparison to CASCADE: as far as I know there are no CASCADE models that have been trained on ground truth data in the regime of very fast (line scan) sampling, which is rarely used. A fair comparison of spike time estimates between PGBAR and CASCADE should take this into account. This can be done by training a new CASCADE model using the dataset of this paper. Given that performance of PGBAR and CASCADE is very similar already now (except for the false positive rate), a CASCADE model optimized for high sampling rate may be expected to catch up with (or even exceed) the performance of PGBAR. At a minimum, this possibility should be discussed.

      While this may be true, retraining a CASCADE model on high-frequency ground-truth data is beyond the scope of this manuscript. Indeed, a retrained CASCADE model optimized for line-scan or GCaMP8f data could improve performance and potentially match or exceed PGBAR, particularly in reducing false positives.

      Our aim, however, is not to benchmark supervised methods under their optimal retraining conditions, but to provide an unsupervised alternative that does not rely on labeled training data. In practice, retraining supervised models is constrained by the availability of suitable ground-truth datasets and by the uncertainty in how the method generalizes to acquisition regimes that differ substantially from the training set.

      We have therefore added a sentence in the Discussion (at the end of the subsection Comparison with benchmark datasets):

      [...] “While retraining supervised methods such as CASCADE on high-frequency or GCaMP8f ground-truth datasets could further improve its performance, limitations in dataset availability and generalization across acquisition regimes motivate complementary, training-free approaches such as PGBAR.”

      As stated in the manuscript, future extensions, such as using nonlinear biophysical models as the generative model for Monte Carlo–based inference, may further improve spike estimation accuracy.

    1. Author response:

      We thank the reviewing editor and the reviewers for their careful evaluation of our manuscript “Early sleep dependent sensory gating in the olfactory system”, and for their constructive feedback. We are encouraged by the overall positive assessment of the work.

      In the revised version, we will address all the points raised by the reviewers. Below, we outline the main aspects of the revision.

      (1) Contextualization within prior literature.

      We will expand the text to better situate our findings within the existing literature and clarify the specific contribution of our work, particularly with respect to state-dependent changes in olfactory bulb activity.

      (2) Distinction between sleep and urethane anaesthesia.

      We will revise the text to more clearly distinguish findings obtained during natural sleep from those obtained under urethane anaesthesia. While avoiding direct equivalence between states, we will clarify that the comparison is intended to highlight shared features of slow-wave brain dynamics associated with sensory gating.

      (3) Clarification of analytical methods and statistical criteria.

      We will provide additional details regarding normalisation procedures, surrogate-based analysis, and the statistical criteria used to assess the presence or absence of coherence and phase-amplitude coupling, ensuring consistency across figures.

      (4) Improvements in figures and terminology.

      We will revise figure annotations to improve clarity (axes, colour scales, units and labelling) and ensure consistent terminology throughout the manuscript.

      We believe these revisions will further strengthen the manuscript while preserving its central conclusions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Strengths

      (1) The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      We agree with Reviewer 2 that there is merit to including the probability maps as a main text Figure rather than Supplementary Figure. We have now added it to the main text.

      Weaknesses

      (1) While the identification of the sulci has been done thoroughly with expert validation, the sulci have not been labeled in a way that enables the demonstration of the reproducibility of the labeling.

      Because this was our first study comprehensively documenting the sulcal organization of this region, and because we needed to define thousands of sulci at the individual level across multiple regions, we were unable to use an approach amenable to calculating inter-rater agreement. Nevertheless, our method followed a rigorous, three-tiered procedure to ensure that accurate sulcal definitions were identified in all participants. In this study, authors YT and TG first defined sulci. These sulci were then checked by a trained expert (EHW). Finally, sulcal definitions were finalized by the senior author, an expert neuroanatomist (KSW). We emphasize that this process has produced reproducible anatomical results when charting other regions such as posteromedial cortex (Willbrand et al., 2023 Science Advances; Willbrand et al., 2023 Communications Biology; Maboudian et al., 2024 The Journal of Neuroscience; Ramos Benitez et al., 2024 Neuropsychologia), ventral temporal cortex (Miller et al., 2020 Scientific Reports; Parker et al., 2023 Brain Structure and Function), and lateral prefrontal cortex (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Brain Structure and Function; Willbrand et al., 2023 The Journal of Neuroscience; Willbrand et al., 2024 Brain Structure and Function) across age groups, species, and clinical populations. For the present study, by the time the final tier of our method was reached, only a very small percentage (~2%) of sulcal definitions were actually modified. We will include an exact percentage in future publications on LPC/LPOJ.

      Our Methods have been edited to describe these features (Pages 21-22):

      “As this is the first time the sulcal expanse of LPC/LOPJ was comprehensively charted with a focus on pTS, the location of each sulcus was confirmed through a three-tiered procedure for each participant in each hemisphere. First, trained independent raters (Y.T. and T.G.) identified sulci. Second, these definitions were checked by a trained expert (E.H.W.). Third, these labels were finalized by a neuroanatomist (K.S.W.). We emphasize that this procedure has produced reproducible results in our prior work across the cortex (Miller et al. 2021; Voorhies et al. 2021; Yao et al. 2022; Willbrand et al. 2023; Willbrand et al. 2022; Willbrand et al. 2024; Parker et al. 2023; Miller et al. 2020; Willbrand et al. 2022; Willbrand et al. 2023; Maboudian et al. 2024; Ramos Benitez et al. 2024). All LPC sulci were then manually defined and saved as .label files in FreeSurfer using tksurfer tools, from which morphological and anatomical features were extracted. We defined LPC/LPOJ sulci for each participant based on the most recent schematics of sulcal patterning by Petrides (2019) as well as pial, inflated, and smoothed white matter (smoothwm) FreeSurfer cortical surface reconstructions of each individual. In some cases, the precise start or end point of a sulcus can be difficult to determine on a surface (Borne et al., 2020); however, examining consensus across multiple surfaces allowed us to clearly determine each sulcal boundary in each individual. For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres). The specific criteria to identify the slocs and pAngs are outlined in Fig. 1b.”

      Reviewer #3 (Public Review):

      Weaknesses

      (1) The sample is inherently limited, both in the number of subjects and in consisting only of typically developing young adults.

      First, although the sample size of the present study is small in number in comparison to large N, group-level neuroimaging analyses, it is comparable to precision neuroimaging studies examining sulcal features in individual participants (for example, Cachia et al., 2021 Frontiers in Neuroanatomy; Garrison et al., 2015 Nature Communications; Lopez-Persem et al., 2019 The Journal of Neuroscience; Miller et al., 2021 The Journal of Neuroscience; Roell et al., 2021 Developmental Cognitive Neuroscience; Voorhies et al., 2021 Nature Communications; Weiner, 2019 The Anatomical Record; Willbrand, et al., 2022 Science Advances; Willbrand, et al., 2022 Brain Structure & Function; Yao et al., 2022 Cerebral Cortex). We discuss this point in detail in the Limitations subsection of the Discussion (Page 17):

      “This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Gratton et al., 2022; Naselaris et al., 2021; Rosenberg and Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (for example, Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand et al., 2022a, 2022b; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lee et al., 2024, 2025; Lyu et al., 2021). The time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      Second, we utilized a young adult sample as this is the standard in the field when charting features of sulci for the first time (for example, Paus et al., 1996 Cerebral Cortex; Chiavaras & Petrides, 2000 Journal of Comparative Neurology; Segal & Petrides, 2012 European Journal of Neuroscience; Zlatkina & Petrides, 2014 Proceedings of the Royal Society B Biological Science; Sprung-Much & Petrides, 2018 Brain Structure & Function; Miller et al., 2021 The Journal of Neuroscience; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 Communications Biology; Drudik et al., 2023 Cerebral Cortex). Nevertheless, it is indeed crucial to confirm that this schematic translates to other age groups; however, this exploration is beyond the scope of the present project and is left for future investigation. We have added text to the Limitations subsection of the Discussion to emphasize these points (Pages 17-18):

      “Additionally, the scope of the present study is limited in that the sample was only in young adults. This sample was selected as it is the standard of the field when charting features of sulci for the first time (for example, Paus et al. 1996; Chiavaras and Petrides 2000; Segal and Petrides 2012; Zlatkina and Petrides 2014; Sprung-Much and Petrides 2018; Miller et al. 2021; Willbrand et al. 2022; Willbrand et al. 2023; Drudik et al. 2023). Nevertheless, it is necessary to explore how well this updated schematic translates to different age groups, species, and clinical populations.”

      Finally, it is worth mentioning that we have begun preliminary analyses on the translatability of this schematic, and have shown that it does hold in a pediatric sample (ages 6-18 years old; Author response image 1).

      Author response image 1.

      Example pediatric participant with all LPC/LPOJ sulci identified in both hemispheres. Incidence rates for the variable pTS identified in the present work in a pediatric sample are included below (N = 79 participants).

      (2) While the paper begins by describing four new sulci, only one is explored further in greater detail.

      We focused on the slocs-v as it has a high incidence rate, making it amenable to our analytic pipelines relating sulci to cortical morphology, architecture, and function, as well as cognition (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 The Journal of Neuroscience; Maboudian et al., 2024 The Journal of Neuroscience). However, we want to emphasize that throughout the paper there are multiple analyses that further describe the three more variable sulci: 1) detailing their sulcal patterning (Supplementary Tables 1-4) and 2) detailing their morphology and architecture (Supplementary Fig. 6). We do agree though that it is a worthwhile endeavor to further describe these sulci—especially if the data is readily available. As such, to complement our behavioral analysis identifying a relationship between the morphology of the consistent sulci and spatial orientation and considering the well-documented relationship between sulcal incidence and cognition (for review see Cachia et al., 2021 Frontiers in Neuroanatomy), we tested whether the number of variable sulci and the incidence of each variable sulcus specifically were related to spatial orientation. This procedure produced null results on all neuroanatomical variables, which we now mention in the Results (Page 11):

      “Finally, as in prior work examining variably-present PTS in other cortical expanses (for example, Amiez et al., 2018; Cachia et al., 2014; Fornito et al., 2004; Willbrand et al., 2024b), we assessed whether the presence/absence of the more variable PTS identified in the present work (slocs-d, pAngs-v, and pAngs-d) was related to spatial orientation, reasoning, and processing speed task performance. We identified no significant associations between the presence/absence of these sulci in either hemisphere and performance on these tests (ps > .05).”

      (3) There is some tension between calling the discovered sulci new vs acknowledging they have already been reported, but not named.

      To resolve this tension, we have revised the text to 1) ensure proper acknowledgment that sulci have been noticed in this region, 2) point out that these sulci were left unnamed and undescribed, and 3) emphasize that one of the primary goals of this project was to comprehensively detail the sulcal organization of this region at a precise, individual-level considering these often-overlooked sulci.

      This is primarily done at the beginning of the Results (Pages 4-5), where we now write:

      “Four previously undescribed small and shallow sulci in the lateral parieto-occipital junction (LPOJ)

      In previous research in small sample sizes, neuroanatomists noticed shallow sulci in this cortical expanse, but did not describe them beyond including an unlabeled sulcus in their schematic at best (Supplementary Methods and Supplementary Figs. 1-4 for historical details). In the present study, we fully update this sulcal landscape considering these overlooked indentations. In addition to defining the 13 sulci previously described within the LPC/LPOJ, as well as the posterior superior temporal cortex in individual participants (Methods) (Petrides, 2019), we could also identify as many as four small and shallow PTS situated within the LPC/LPOJ that were highly variable across individuals and left undescribed until now (Supplementary Methods and Supplementary Figs. 1-4). Though we officially name and characterize features of these sulci in this paper for the first time, it is necessary to note that the location of these four sulci is consistent with the presence of variable “accessory sulci” in this cortical expanse mentioned in prior modern and classic studies (Supplementary Methods). For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres).”

      (4) The anatomy of the sulci, as opposed to their relation to other sulci, could be described in greater detail.

      To detail these sulci above and beyond their relation to other sulci, we document the anatomical metrics of all sulci in Supplementary Fig. 6:

      Results (Page 8):

      The morphological and architectural features of all LPC/LPOJ sulci are described in Supplementary Fig. 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs), to show increased total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons cause reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding of how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult & larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1, however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.

      We thank Reviewer #1 for the constructive comments. Following these suggestions, we have compared the relative sleep amounts of wild-type controls and Hugin/PK2-R1/IPC mutants and manipulations between 6-hour and 18-hour periods in the 2nd instar larval stage and found consistent sleep phenotypes. We have also shown sleep metrics data for larvae and adults. We have included additional data on locomotion and feeding behavior in wild-type controls and Hugin/PK2-R1/IPC mutants and manipulations, which suggest that the sleep phenotypes of these mutants and manipulations are not substantially affected by changes in locomotion or feeding behavior. As pointed out, our study could not exclude the possibility that, in addition to the Hugin/PK2-R1/IPC axis, other pathways including DH44 could act in larval sleep control. We have included these points in the Discussion. Please see point-by-point responses for details.

      Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

      We thank Reviewer #2 for the positive comments. As suggested, our study could not exclude the possibility that, in addition to the Hugin/PK2-R1/IPC axis, alternative pathways including the Hugin/DH44 axis could contribute to sleep control in larvae. We have added these points to the Discussion. We have also added data showing mechanistic differences between larval and adult sleep control. Please see point-by-point responses for details.

      Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and that silencing IPCs resulted in a significant increase in sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1-dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, establishing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons modulate sleep: activation of IPCs via PK2-R1, accompanied by Ca²⁺ responses.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the authors' conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear whether the authors backcrossed their lines to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects; e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 in adult sleep.

      We thank Reviewer #3 for the valuable comments. As suggested, we have included additional data on adult sleep phenotypes with IPC/Dilp and HugPC/PK-2 manipulations. We believe that these data further support the idea that the Hugin/PK2-R1/IPCs axis acts differently in larval and adult sleep control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Show all data as individual data points in the graphs. The use of box-and-whisker plots makes it difficult to determine how much variation there is in each experiment.

      As suggested, we have changed all graphs to dot-and-whisker plots (Figures 1–6; Figure 1—figure supplements 2–4; Figure 2—figure supplement 1; Figure 3—figure supplements 1 and 3; Figure 5—figure supplement 1; and Figure 6—figure supplements 1 and 3).

      (2) Show all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6-hour period of 2nd instar development. Larval sleep changes over the course of 2nd instar development so showing an 18-hour period is not as informative for the different manipulations in the study. This also allows for a more thorough comparison to Szuperak et al 2018.

      As suggested, we now show all larval sleep metrics (total sleep duration, bout number, bout length, and activity) over the first 6 hours for PK2-R1 KO mutants (Figure 1—figure supplement 5). These PK2-R1 mutant phenotypes are consistent with those obtained from our sleep amount data over the 18 hr period (Figure 1—figure supplement 5). For consistency, we therefore present all sleep phenotype data for 2nd instar larvae over the 18 hr window throughout this paper.

      (3) Show activity values for every experiment. Behavior is based on locomotion, so there is a need to show that larvae in each manipulation do not have locomotive defects.

      As suggested, we now show the activity values for each experiment (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). These data indicate that the changes in sleep amounts in each manipulation are not solely due to altered locomotion. We have thus added the sentence below at lines 151–156 of the manuscript.

      Locomotion changes were not consistently observed upon either activation or suppression of Hug neurons (Figure 3—figure supplement 1), suggesting that the changes in sleep amounts are unrelated to locomotor alterations.

      (4) Provide additional explanation as to why PK2-R1 was pursued in the study. There are several candidates in Figure 1 - Figure Supplement 4 (like sNPF-Gal4, Dh31-Gal4, and Dsk-Gal4) that have effects on sleep. These have also not been studied in the context of larval sleep regulation.

      As suggested, we have added the following sentences at lines 108–114 of the manuscript.

      The role of PK2-R1 in larval sleep, on the other hand, has been unknown to date. Given its strong expression in insulin-producing cells (Schlegel et al., 2016) and its function as a receptor for the neuropeptide Hugin, which modulates feeding (Schoofs et al., 2014), we hypothesized that PK2-R1 might mediate neuropeptidergic signaling that links metabolic and sleep regulation during development. We thus focused on this gene as a candidate connecting behavioral and endocrine sleep control.

      (5) Insulin manipulations are known to disrupt Drosophila development (Rulifson et al, 2002). Therefore, it would be beneficial to show that larvae develop normally in dilp3 and dilp5 mutants by examining the time to pupal formation in these mutants compared to controls. If the mutant larvae take longer to reach the pupal stage, how do the authors know that the 2nd instar control and mutant larvae are the same developmental age? As indicated above, the developmental age of larvae does affect the total amount of sleep, so this could affect the authors' conclusions.

      We agree that this is an important point in this study. In each experiment, we carefully checked the developmental stage of larvae progeny by mouth hook analysis and measuring larval size and used only larvae with characteristics comparable to wildtype 2nd instar larvae. We have added these descriptions in Methods (line 411–416).

      (6) Figure 1 data is only supported by homozygous mutants & 1 fairly-broadly expressed Gal4 driver. The authors need to show that inactivation of PK2-R1 neurons with more tissue-restrictive Gal4 driver lines has the same effect as the other manipulations to further support the conclusions. Examining sleep in activation of PK2-R1 neurons with the broadly expressed Gal4 driver & UAS-TrpA1 would also provide better support for the conclusions.

      We agree. We tried to narrow down the relevant neurons to small subsets using multiple Gal4 drivers, but unfortunately did not identify any candidates.

      Therefore, although our data show that the Hugin/PK2-R1 axis contributes to sleep control in larvae, we cannot rule out the possibility that other axes also function in larval sleep control. We mentioned this point in the originally submitted manuscript (lines 134–137).

      (7) Provide more explanation as to how your methods of defining sleep compare/contrast to published papers. It is not clear how many frames = 1 sec in your recordings. The definition of sleep as 12 frames needs to include a time component as well. This allows for easier comparison to other published papers examining Drosophila larval sleep (Szuperak et al 2018; Churgin et al 2019; Poe et al 2023; Poe et al 2024).

      Our recordings were acquired at 0.87 frames per second. We have added this information to the Methods (line 431).
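      As a rough worked conversion, and assuming (as the reviewer's comment describes) that the sleep criterion is 12 consecutive frames of inactivity, the stated acquisition rate gives the minimum duration of an inactivity bout scored as sleep. This is an illustrative calculation from the two numbers above, not a restatement of the Methods.

```latex
t_{\min} = \frac{12\ \text{frames}}{0.87\ \text{frames/s}} \approx 13.8\ \text{s}
```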

      (8) Figure 2 data is only supported by mutants & inactivation with 1 Gal4 driver per cell population. Showing activation of Gal4-expressing cells with UAS-TrpA1 would add more support to the conclusions.

      We had already shown reduced sleep amounts in both HuginGAL4>ReaChR and HuginGAL4>TrpA1 larvae (Figure 3C and 3D) in the original version.

      (9) Need to clarify in the methods how the authors calculated travel distances as a measure of locomotive activity. It's not clear if this is done during larval sleep experiments or in independent experiments. It is also not clear why the y-axes of Figure 2-Figure Supplement 1 are not consistent across the panels. Finally, the authors do see decreases in locomotive activity in PK2-R1>Kir2.1 and in dilp3 mutants, so the conclusions presented in the results section of the paper need to be modified to reflect those results.

      We calculated travel distances from the same video recordings used for sleep quantification and have added this information to the Methods (lines 431–435). As the reviewer indicated, locomotor activity was reduced in some conditions/mutants, including PK2-R1>Kir2.1 and dilp3 mutants, so we cannot exclude the possibility that locomotion changes contribute to these sleep phenotypes. On the other hand, most manipulations of Hugin neurons and IPCs caused a sleep increase without significant changes in locomotor activity (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). It is thus likely that Hugin neurons and IPCs contribute to sleep control independently of locomotion, whereas other neurons labeled by PK2-R1-GAL4 might contribute to locomotion control.

      (10) Given the role that hugin neurons play in Drosophila feeding (Schlegel et al, 2016), the authors should include feeding data for the hugin/PK2-R1 manipulations. It is also unclear from the methods if their thresholding for defining sleep can detect feeding behaviors. Changes in feeding behavior could explain some of the reported increases in sleep if feeding is not classified as a waking but is instead picked up as inactivity.

      We agree that this is an important point. As suggested, we have added the feeding amounts of wild-type control and HuginPC>Kir2.1 larvae (Figure 3—figure supplement 3). These data show that feeding amounts of HuginPC>Kir2.1 larvae are significantly reduced compared with those of the control. Because our analysis categorizes feeding behavior as “moving (not sleep)” (see Materials and Methods), and HuginPC>Kir2.1 larvae nonetheless showed increased sleep amounts compared with the wild-type control, the increased sleep in HuginPC>Kir2.1 larvae is unlikely to be explained by changes in feeding behavior.

      (11) The Hugin-IPC localization data (Figure 3E) would be better supported by the use of more specific synaptic and dendritic markers. Specifically, expressing Syt-eGFP (axon marker) in hugin neurons & DenMark (dendritic marker) in IPCs. Using GRASP or P2X2 to demonstrate the anatomical/functional connections between hugin & IPC neurons would also provide better support for this conclusion.

      As suggested, we have added Syt-eGFP signals in HuginPC neurons (Figure 4—figure supplement 1). We also tried DenMark expression in IPCs, but could not obtain dilp3>DenMark F1 progeny for unknown reasons. We also applied GRASP to the HuginPC-IPCs interaction, but could not detect obvious GRASP signals. Peptidergic transmission is well known to often be independent of conventional synaptic structures; in this mode, known as volume transmission, peptidergic signals can travel long distances to reach their target neurons. It is thus possible that IPCs receive Hugin signals from HuginPC neurons through volume transmission.

      (12) Figure 4 is missing temperature controls for thermal activation experiments. Also missing genetic control for UAS/+. It would be more convincing to see experiments in Figure 4 with the more specific hug-PC-Gal4 line instead of the broadly expressed hugin-Gal4 line.

      As suggested, we have added the control data to Figure 4.

      (13) Representative images for Figure 4B & 4C would provide better support for the quantifications & conclusions presented.

      As suggested, we show representative images for Figures 4B and 4C (please see Author response image 1). However, because we feel these images add little to readers' understanding beyond the quantitative data, we have decided not to include them in the manuscript.

      Author response image 1.

      mCD8::mCherry (top) and CRTC::GFP (bottom) are shown under high-temperature conditions without ("−") or with ("+") hugin neuron activation. "-" denotes a high-temperature genetic control lacking LexAop-TrpA1, thus no thermogenetic activation occurs. CRTC::GFP is shown in pseudocolor.

      (14) A more zoomed-out image of all the IPC neurons in the bath application of hugin peptides (Figure 5D) would help with the interpretation of the results. It's not clear if the authors only measured the same exact neuron in each IPC cluster or if they examined all of the IPC neurons. If they measured all of the IPC neurons, did they observe similar results across the different neurons? How much variability is there in the response of IPC neurons to hugin peptide application?

      For Figure 5, we obtained images of multiple brains for each genotype and quantified NLI values from all IPC neurons. For reference, we show the CRTC signals from Figure 5C plotted brain by brain (Author response image 2). We have added detailed information on the CRTC analysis to the Methods (lines 552–554).

      Author response image 2.

      Distribution of CRTC signals across individual brains. Plots of nuclear localization index (NLI) for individual brains, corresponding to the conditions shown in Figure 5C. The x-axis represents each larval brain preparation, and each dot indicates the NLI value of a single IPC neuron. Horizontal bars represent the median within each brain. These plots illustrate variability both within and across individual brains.

      (15) The conclusion that application of Hug peptides results in dilp3 release is not well supported (Figure 5E). There is a large amount of variation in anti-dilp3 signal. Representative images for these quantifications would be beneficial. The authors also don't directly show that dilp3 vesicles are released. They only see a reduction in antibody accumulation in IPCs. Could there be other reasons for the reduction in accumulation in the IPCs? Would changes in dilp3 gene expression or membrane localization cause a reduction in signal? Showing that actual release of dilp3 is affected by Hug peptides using a reporter like ANF-GFP would be more convincing.

      As suggested, we have added representative images (Figure 5—figure supplement 2). In the ex vivo experiments in Figure 5, we treated the dissected brain tissues with Hugin/NMU peptides for only 5 minutes. The reduction of Dilps in IPCs is therefore most likely mediated by Hugin/PK2-R1 signal-dependent secretion rather than by transcriptional control and/or degradation of Dilps.

      (16) Show all sleep metrics (total sleep duration, bout #, bout length, and activity) for adult sleep experiments. Showing relative total sleep for the adult experiments is confusing & would benefit from plots of total average sleep in minutes for each genotype.

      According to the reviewer’s comments, we have added the sleep metrics in adults (Figure 6; Figure 6-figure supplement 3).

      (17) The authors can't conclude that expression patterns of PK2-R1 & hug between larvae & adults are "almost comparable." They don't track neurons over development or immortalize neurons in larvae & check expression patterns in adults. They need to show some type of quantification to support these claims. Or revise the text to remove this conclusion.

      We agree. We have revised our argument as follows (lines 211–214).

      Interestingly, the expression patterns of PK2-R1 and Hug as well as the morphology of HugPC neurons in adults appeared to be similar to those in larvae (Figure 6—figure supplement 2), implying that the differential roles of Hug in larvae vs adults are likely due to physiological differences in HugPC neurons and/or IPCs.

      (18) For Figure 6, what effect does genetic inactivation of IPCs have on adult sleep? A more specific manipulation of these cells would provide better support for the conclusion that IPC manipulations have distinct effects on larval & adult sleep. The sleep traces for the hugin manipulation & dilp mutants (Figure 6-Figure Supplement 1) also look inconsistent when comparing genetic controls in (Figure 6-Figure Supplement 1D) or when comparing the dilp mutants. Plotting this data as total sleep amount in the day & night (2 separate graphs) would be beneficial. It would also be helpful to see additional sleep traces for these experiments.

      As suggested, we have added daytime and nighttime sleep amounts, plotted separately, for dilp3 and dilp5 mutant adults (Figure 6—figure supplement 1C–D) and for adults with IPC silencing (Figure 6—figure supplement 3D).

      (19) For Figure 6, what effect does thermogenetic activation of hugin neurons have on IPC activity? The authors demonstrate in Figure 5 that thermal activation results in an increase in larval IPC activity, but they do not show these experiments in the adult brain. These experiments would provide more support for their conclusion that hugin has differential effects on IPC activity depending on the developmental age (larvae vs adults).

      As suggested, we performed thermogenetic activation of hugin neurons and found no significant effect on adult IPCs (see Author response image 3), consistent with the ex vivo data in Figure 6.

      Author response image 3.

      (20) A figure legend is needed for Figure 7. The model is not self-explanatory, nor is there an adequate explanation in the discussion section.

      We have added a legend for Figure 7 (lines 781–785).

      (21) Since hugin is known to be downstream of Dh44 in larvae, the discussion needs to include comparison to published work on Dh44 in larvae (Poe et al, 2023). The hugin receptor, PK2R1, is also expressed in Dh44 & DMS neurons (Schlegel et al, 2016), so a discussion of what role Dh44/DMS neurons may play in their model is necessary.

      We agree. We have added the following to the Discussion (lines 313–320).

      We cannot rule out the possibility that other neurons could function downstream of HuginPC neurons in sleep regulation. For instance, given that Dh44 neurons in the brain promote arousal (Poe et al. 2023) and are PK2-R1-positive (Schlegel et al. 2016), Hugin might control sleep in part through Dh44 neurons.

      (22) Minor point: Line 97 should say "resulted in a significant sleep increase." Currently, it says "decrease" which is not what is depicted in the figure.

      We appreciate the reviewer’s point. We have corrected this.

      (23) Minor point: Figure 5 should be renamed as Figure 4 since the text describing the results in Figure 5A & 5B occurs before the text describing the results in Figure 4.

      We understand the point the reviewer raised. However, because Figure 5A explains the experimental setup for all of Figure 5, we would like to keep Figure 5A in its original position.

      Reviewer #2 (Recommendations for the authors):

      First, the study would benefit from a more comprehensive discussion of previous research, particularly the work by Schlegel et al. (2016) and Melcher and Pankratz (2006). A key inconsistency that should be addressed is the observation that hugin mutant larvae exhibit reduced body size and feeding behavior, which may influence Dilp2 secretion. The selective effect on Dilp3 and Dilp5 without affecting Dilp2 warrants further clarification. Conducting conditional gene expression experiments to control hugin, dilp3, and dilp5 expression, along with neuronal activity modulation, would help determine whether the observed effects are direct or secondary consequences.

      As suggested, we tried to manipulate neuronal activity in IPCs, but unfortunately, expression of Kir2.1 in IPCs caused lethality or yielded very weak animals. Instead, we cited a recent paper showing differential secretion of Dilp2 and Dilp6 in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added further discussion of selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275–297).

      Second, the specificity of IPC secretion mechanisms should be clarified. Given that IPCs coexpress Dilp2, Dilp3, and Dilp5, it remains unclear how the pathway selectively modulates Dilp3 and Dilp5 while leaving Dilp2 unaffected. Additional experiments, such as electron microscopy, could provide insights into whether anatomical differences in vesicular pools influence peptide secretion. Since hugin mutants are reported to have reduced body size, confirming that Dilp2 secretion remains truly unchanged is crucial for eliminating potential indirect effects.

      We thank the reviewer for the valuable suggestions. Because the mechanisms of selective Dilp secretion in IPCs are outside the main scope of this paper, we would like to pursue detailed EM analysis in future studies. We cited a recent paper showing differential secretion of Dilp2 and Dilp6 from IPCs in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added further discussion of selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275–297).

      Third, the study should explore the potential role of alternative circuits, such as the HuginPC-DH44 pathway, in sleep regulation. The observation that DH44 mutants exhibit even greater sleep amounts than PK2-R1 mutants suggests the involvement of additional regulatory mechanisms. Prior studies indicate that HuginPC neurons may influence DH44 neuron activity, which could impact sleep. Furthermore, recent findings link DH44 with starvation-induced sleep loss in adult flies. Discussing and experimentally investigating the HuginPC-DH44 axis in larval sleep regulation would provide additional depth to the study.

      As far as we know, no direct evidence for a HuginPC→DH44 pathway has been reported in either larvae or adults. Instead, DH44 influences Hugin neuron activity in adults (King et al. 2017). We therefore examined whether optogenetic DH44 activation could influence HuginPC activity using CRTC analysis, but unfortunately, we could not detect significant changes in HuginPC activity.

      Given that PK2-R1 is expressed in DH44-positive neurons (Schlegel et al. 2016) and that DH44-positive neurons are located in regions innervated by HuginPC neurons, it remains possible that a HuginPC→DH44 pathway functions in parallel to the HuginPC→IPCs pathway. We find this an interesting possibility and a good subject for a future study.

      Fourth, validating the functional connectivity between HuginPC neurons and IPCs using calcium imaging would significantly enhance the study. Employing real-time calcium imaging with GCaMPs would provide direct evidence of synaptic activity between these neuronal populations. Such data would strengthen the claim that the observed sleep regulatory effects result from direct neural communication rather than secondary systemic influences.

      We agree. We attempted Ca²⁺ imaging of HuginPC neurons and IPCs both in living larvae and in ex vivo preparations, but found it technically difficult to obtain reliable Ca²⁺ dynamics data from the brains of living larvae or from ex vivo brain tissue. We therefore performed CRTC analysis on fixed brain preparations instead of live Ca²⁺ imaging. We have added a note stating that we attempted Ca²⁺ imaging in the larval brain but that it did not work well (lines 555–558).

      Finally, a more detailed discussion of developmental differences in sleep regulatory mechanisms would be beneficial. The manuscript should address why genes involved in sleep modulation during development may function differently from their roles in adult sleep regulation. Providing a conceptual framework or experimental evidence to explain these developmental differences would enhance the study's contribution to understanding the evolution of sleep circuits. Clarifying how these findings fit into broader sleep regulation models would increase the impact of the research.

      We agree. Because it is difficult to address why these differences arise, we have instead added the following discussion of how factors and circuits involved in sleep modulation during development may function differently from their roles in adult sleep regulation (lines 349–371).

      It is thus possible that Hugin/PK2-R1 signaling along the HugPC-IPCs circuitry is suppressed in adults. IPCs in adults receive multiple positive and negative modulatory inputs through GPCRs, including metabotropic GABA-B receptors, which suppress IPC activity and Dilp release in adult IPCs (Enell et al., 2010). Such negative modulatory inputs to adult IPCs might counteract the Hugin/PK2-R1 axis to suppress Dilp release. In addition, our data suggest that Dilps modulate sleep amount in opposite directions in larvae and adults (Figure 7). Comparing the expression levels and activities of GPCRs in larval and adult IPCs would be essential to better understand how the same modulatory signals come to exert different effects on sleep over the course of development. Interestingly, Hugin in adults appears dispensable for baseline sleep amount but is required for homeostatic regulation of sleep (Schwarz et al., 2021). Thus, testing whether the Hugin/PK2-R1 axis is involved in homeostatic regulation of larval sleep, and how such a system compares to its adult counterpart, may provide further mechanistic insight into how homeostatic sleep regulation matures over development.

      By addressing these aspects, the manuscript will provide a clearer, more robust, and well-supported analysis of larval sleep regulation. These refinements will help improve the study's clarity and impact, ensuring that its findings are effectively communicated to the research community.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 97: "Silencing neurons expressing Oamb and PK2-R1 resulted in a significant sleep decrease?" But there is an increase in sleep amounts from Figure 1A. (Typo error).

      We thank the reviewer for pointing out this typo. We have corrected this typo in the revised version.

      (2) Line139: "HugPC and IPCs labeled by Dilp3-GAL4 are located in close proximity to each other." While proximity does not equal synaptic connections, direct connectivity of HugPC and IPCs was already shown in larval connectome analyses with HugPC providing the strongest input of larval IPCs (Hückesfeld et al. eLife 2021). This could be cited in this context instead.

      We agree. We have cited this paper in References (line 163).

      (3) Figure 2 Supplement 1: Locomotion speed is affected in PK2-R1 knockouts; what is the significance regarding the observed sleep increase?

      We agree that this is a very important point. As the reviewer pointed out, because locomotion speed was reduced in PK2-R1 KO larvae, the sleep increase in these larvae might be partly due to reduced locomotion. On the other hand, silencing IPCs with Kir2.1 caused a sleep increase without significant changes in locomotion (Figure 2; Figure 2—figure supplement 1). Since PK2-R1 is broadly expressed in the nervous system in addition to IPCs (Figure 2), PK2-R1 neurons other than IPCs might contribute to locomotion control.

      (4) Why are Dilp3 levels changing (increasing) in adult IPCs after PK-2 treatment? This is not mentioned in the text and is not discussed at all.

      As the reviewer indicated, these data were unexpected to us. At present, we can only assume that PK-2 acts differently in larval and adult IPCs. We have added this point to the Results (lines 211–214).

      (5) It has been shown in other publications that Dilps play a role in sleep regulation (Cong et al., Sleep 2015), this study should be cited.

      We have cited this paper in References (line 224).

      (6) The order of discussing figure panels is sometimes confusing, e.g. Figure 6C is discussed at the very end after 6D-F.

      We agree. We considered this order carefully while preparing the first draft, but ultimately chose the current order because grouping the sleep phenotype data and the ex vivo data should be easier for readers to follow. We have therefore kept the current order in the revised submission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected that lamina processing is also affected. We did the experiment of silencing C2 and recording in the lamina neuron L2 and found no significant difference in their response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full-field ON and PFF flashes for controls (grey, N = 8 flies, 59 cells) or while genetically silencing C2 using shibire^ts (magenta, N = 4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not feel comfortable claiming that C2 and C3 have a role specific to motion processing, even if their effects were predominantly on medulla neurons. To our knowledge, how peripheral visual circuitry contributes to other visual behaviors, such as object detection (including the pursuit of mating partners) or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss of inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), in the PD for the T4 C2 and C3 blocks. I would also predict that the C2 & C3 block is statistically different from the C3-only block; why? In any case, it would be good to discuss the clear trend in the PD by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces to see the differences more clearly, not only polar plots and averages.

      We apologize: the plots in the manuscript showed the mean across all cells, but the statistics were done more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies, after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n=6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
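      To illustrate why the two averaging schemes can differ, here is a minimal sketch of the two-level averaging described above (cells averaged within each fly, then mean ± SEM across flies), using only the Python standard library. The function name and the example values are hypothetical and do not represent the authors' analysis code.

```python
import math
from statistics import mean, stdev


def fly_level_mean_sem(cells_by_fly):
    """Average cells within each fly, then return (mean, SEM, N) across flies.

    cells_by_fly maps a (hypothetical) fly ID to a list of per-cell values.
    Each fly counts once, regardless of how many cells were recorded in it.
    """
    fly_means = [mean(cells) for cells in cells_by_fly.values()]
    n_flies = len(fly_means)
    sem = stdev(fly_means) / math.sqrt(n_flies) if n_flies > 1 else float("nan")
    return mean(fly_means), sem, n_flies


# Hypothetical data: fly_1 contributes four cells, fly_2 only one.
data = {"fly_1": [1.0, 1.0, 1.0, 1.0], "fly_2": [3.0]}
fly_mean, fly_sem, n = fly_level_mean_sem(data)
pooled_mean = mean([v for cells in data.values() for v in cells])
print(fly_mean, fly_sem, n, pooled_mean)
```

      In this toy example, pooling all cells gives 1.4 because fly_1 is over-represented, whereas the fly-level mean is 2.0; the fly-level SEM also reflects the number of flies rather than the number of cells.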

      (3) The behavioral experiments are done with a different disruptor than the physiological ones: one blocks chemical synapses, the other shunts the cells. While one would expect similar results from both, this is not a given. It would be great if the authors could repeat the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge of Drosophila motion vision and paves the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, this inherited feedback contributes to direction-selective computations but is not a prerequisite for their emergence, and the title could be re-adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not assume large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we have now plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system dataset (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and postsynaptic counts for C3 from three different datasets. Shown are up to ten post- or presynaptic partner neurons.

      Author response image 3.

      Relative pre- and postsynaptic counts for C2 from three different datasets. Shown are up to ten post- or presynaptic partner neurons.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; only Mi1 is identified as one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators more clearly from the connectomic analysis would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions derive logically from the set of experiments they performed. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors, why the T4 & T5 responses after C2 & C3 silencing differ between stationary and moving stimuli, and finally why C2 and not C3 had an effect on T5 DS responses, given that the connectivity shows C3 providing output to two of the four major cholinergic T5 inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      Regarding the differential involvement of C2/C3 in T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): we mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion computation. We have now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even in a scenario where one would measure the temporal and spatial properties of “all” neurons that are connected to C2 and C3, this would likely not reveal the full mechanisms linking the single neurons to DS computation, but would require silencing specific connections, or specific molecular components of the connection, or could be complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is for example not understood why the temporal frequency tuning of T4/T5 substantially differs from the temporal frequency tuning of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect, not only because it is phrased as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While some discussion later addresses feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons (Schnaitmann et al., 2018; Heath et al., 2020; Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies": one or many? More importantly, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors: why did you use two sets of independent lines rather than checkerboards for the white-noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond to them with a better signal-to-noise ratio than to checkerboards. Flickering bars also appear to be better suited to isolating the spatial surround of neurons. This type of stimulus has been used successfully in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al. 2017; Leong et al. 2016; Salazar-Gatzimas et al. 2016; Fisher et al. 2015, …).
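      The reverse-correlation idea behind extracting receptive fields from flickering bars can be sketched as follows. This is a generic, minimal Python illustration on simulated data, not the pipeline used in the study: the binary ±1 bar contrasts, the number of bars and time lags, and the simulated cell (driven by a single bar with a fixed two-frame delay) are all hypothetical assumptions. Each entry of the estimated spatiotemporal receptive field (STRF) is the response-weighted average of the stimulus at one bar position and one time lag.

```python
import random


def reverse_correlate(stimulus, response, n_lags):
    """Estimate an STRF by reverse correlation (response-weighted stimulus average).

    stimulus: list of frames; each frame lists the contrast (+1 or -1) of every bar.
    response: one response value per frame.
    n_lags:   number of past frames included in the filter.
    Returns strf[lag][bar] = mean over t of response[t] * stimulus[t - lag][bar].
    """
    n_bars = len(stimulus[0])
    strf = [[0.0] * n_bars for _ in range(n_lags)]
    n_samples = 0
    for t in range(n_lags - 1, len(response)):
        for lag in range(n_lags):
            frame = stimulus[t - lag]
            for bar in range(n_bars):
                strf[lag][bar] += response[t] * frame[bar]
        n_samples += 1
    return [[value / n_samples for value in row] for row in strf]


# Hypothetical simulation: white-noise bars and a cell that follows bar 5
# with a two-frame delay, plus Gaussian noise.
rng = random.Random(0)
n_frames, n_bars, n_lags = 5000, 10, 5
stimulus = [[rng.choice((-1, 1)) for _ in range(n_bars)] for _ in range(n_frames)]
response = [
    (stimulus[t - 2][5] if t >= 2 else 0.0) + rng.gauss(0.0, 0.1)
    for t in range(n_frames)
]

strf = reverse_correlate(stimulus, response, n_lags)
peak_lag, peak_bar = max(
    ((lag, bar) for lag in range(n_lags) for bar in range(n_bars)),
    key=lambda lb: abs(strf[lb[0]][lb[1]]),
)
print(peak_lag, peak_bar)
```

      Because the bars flicker independently, entries unrelated to the simulated cell average towards zero, and the STRF peak recovers the bar position and delay that drive the response.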

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state here how C2 was silenced; otherwise, the reader has to consult the Methods.

      We added the sentence “C2 was silenced by expression of UAS-shibire[ts] (UAS-shi[ts]) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: “Work in blowflies, using application of picrotoxin to the whole brain, has found a severe impact of GABAergic signaling on DS in LPTCs downstream of T4 and T5 cells (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.”

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to better the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows that they participate in motion computations, which persist after their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications and cell bodies in the medulla and axonal targeting to the lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in, and to discuss the role of Mi1 more extensively, in lines 36 and 120.

      We find it hard to infer the specific feedback loops that C2 and C3 are involved in from their widespread input and output connectivity. Had we done so, we would have wanted to support this with functional measurements of the specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z.

      Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study has not performed Rdl knockdown experiments in T4 cells; hence, it needs to be better clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing.”

      We added a summary plot of the behavioral screen data to Supplementary figure 1, and rephrased previous line 131.

      (7) The first result section title excludes C3 neurons, even though they are addressed in lines 172-179; therefore, including C3 is suggested, as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title, as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone, we cannot make conclusive claims about C3, as it was also weakly visible in one of the genetic controls of the intersectional strategy that we took (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1, which were, however, also weakly present in the gad1-p65AD control.”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones but simple Flp-out clones, and the genotype and reference for this were given in the Methods: UAS-FRT-CD2y+-FRT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite Wong et al. 2002 in the Results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, C3 should be written instead of C2.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split-C3 line, as in Supplementary Figure 2). It would enhance the credibility of the work, and would even make it appropriate to use the word "requirement" afterwards, if this split-C2 line were used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for the behavioral experiments in Figure 7. We see Figure 1 (and, with it, Silies et al. 2013) as a first-pass screen from which we obtained candidates, which we then tested more thoroughly throughout the remaining manuscript with more specific lines. We no longer use the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b it is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2&C3 silencing. Rephrasing to "C2 and C3 affect direction-selective.." would be better suited.

      We replaced “required” with “contribute to”

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. This gap widens when the authors go on to examine the layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is insinuated in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be made clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 verifying the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, the Mi1-C2 connections should be clarified in terms of the medulla layer in which they form. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3 should be explained, since Supplementary Figure 2b implies equal responses in M9/10, where Tm1 and Tm4 also receive output from C3.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance, or lack thereof, is not visible in the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the Methods, now saying: “We show significant differences in figures using asterisks (p < 0.05 *, p < 0.01 **, p < 0.001 ***). Non-significant differences are not further indicated.”

      Please note that, based on another reviewer comment, we also adapted the analysis of the kernels. With this change, the difference in the timing of the ON peak response became significant (Figure 3e).

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work out the specific contributions of C2 (and C3) to all neurons they connect to, but rather to focus on their compound effect on motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibirets was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire[ts] because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and the silencing effect of shi[ts] induced by pre-heating the flies (as established by Joesch et al. 2010) did not last long enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced excessive brain movements, and the T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion; we referred to the wrong figure in that sentence. Here, we wanted to point to the increased response of T4 to the ON step upon C2 silencing, which is quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to discuss why C2, and not C3, had an effect on T5 DS responses. Based on Figure 2b, C2 outputs to Tm1, whereas C3 outputs to Tm1 and Tm4, both of which are among the four major cholinergic T5 inputs. Hence, it would be natural to think that C3, rather than C2, would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences conclude a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (except for Mi1), so this would not be accurate.

      We already mention the C3-to-T4 connection in both the Results and the Discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both the Results and the Discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.
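      As an illustration of this distinction, the vector shown in a compass plot can be obtained as a vector sum of a neuron's responses to equally spaced motion directions, whereas a polar plot shows the responses themselves. The sketch below is a minimal Python example under stated assumptions: the function name is hypothetical, responses are assumed to be non-negative, and normalizing the vector length by the summed response is one common convention, not necessarily the one used for Figure 6.

```python
import math


def tuning_vector(responses):
    """Return (direction_deg, strength) for one neuron's directional tuning.

    responses[k] is the (non-negative) response to motion in direction
    360 * k / N degrees. Responses are summed as vectors; the angle of the
    resulting vector is the tuning direction, and its length divided by the
    summed response is the tuning strength (0 = untuned, 1 = one direction only).
    """
    n = len(responses)
    x = sum(r * math.cos(2 * math.pi * k / n) for k, r in enumerate(responses))
    y = sum(r * math.sin(2 * math.pi * k / n) for k, r in enumerate(responses))
    total = sum(responses)
    direction = math.degrees(math.atan2(y, x)) % 360
    strength = math.hypot(x, y) / total if total > 0 else 0.0
    return direction, strength


# Hypothetical neuron tested with 8 directions (0, 45, ..., 315 degrees).
print(tuning_vector([0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0, 0.0]))
```

      One vector per neuron, drawn from the origin, gives the compass plot; a neuron responding equally to all directions yields a vector of length close to zero.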

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.'
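      The averaging step described in this sentence can be sketched as follows: each neuron's responses to the 8 directions are rotated so that its maximal response (its PD) comes first, and the aligned tuning curves are then averaged. This minimal Python sketch uses hypothetical function names and example data; it is not the authors' analysis code and ignores details such as ties between directions.

```python
def align_to_preferred(responses):
    """Rotate responses to N directions so the preferred direction (maximum) comes first."""
    pd = max(range(len(responses)), key=lambda k: responses[k])
    return responses[pd:] + responses[:pd]


def mean_aligned_tuning(cells):
    """Average tuning curves after aligning every cell to its own preferred direction."""
    aligned = [align_to_preferred(r) for r in cells]
    return [sum(values) / len(aligned) for values in zip(*aligned)]


# Two hypothetical cells with opposite preferred directions.
cell_a = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
cell_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0]
print(mean_aligned_tuning([cell_a, cell_b]))
```

      Without this alignment, the two cells' opposite tunings would be averaged into a broad, bimodal curve that understates how sharply each individual cell is tuned.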

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, downstream, not upstream, connectivity of C2 and C3 will matter, which is not identical. C2 for example connects to Mi1, L1, and L4, whereas C3 does not. On the other hand, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” for any cell type that is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar” and changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (42) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea, and one we tried. However, we did not see a difference between control and C2 silencing in the temporal frequency tuning of T4/T5. As velocity tuning is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to draw such a link, we note that our behavioral data are somewhat different: we did not look at temporal frequency tuning per se, and, more generally, it is not well understood how responses in T4/T5 relate to behavior; for example, the two have different frequency tunings (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      (43) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question, and we can only speculate. Mi1 responses are enhanced upon C2 silencing, and T4 responses to full-field flashes are also enhanced. Likely, these motion-independent responses are also seen when the edge travels in the non-preferred direction, whereas in the preferred direction this non-motion response would likely be masked by the motion response. The phenotype seen in T5 is likely inherited from medulla neurons, e.g., Tm1, to which C2 connects. How the delay of the Mi1 response upon C2 silencing may specifically affect ND responses, we do not know.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately cannot answer. Please note that our visual stimulus is a single edge sweeping across the eye, which might not elicit equally strong responses at each position of the eye or at each time during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as the two have different frequency tunings, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition.

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment we reanalysed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3. 

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figures 3d and 4b to include positive times. The kernels of C2 and C3 remained at a positive level. Going back through the data, we found a steep decrease in GCaMP signal in the first 2 seconds of the recordings. We therefore reanalyzed the kernels, excluding these initial seconds of each recording. All kernels now return to zero. The shape of the kernels did not change, but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.
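      The reviewer's suggestion of plotting kernels over both negative and positive times can be illustrated with a reverse-correlation sketch. It assumes a zero-mean white-noise stimulus and a response trace sampled on the same time base. The function name and the `n_skip` parameter (number of initial samples discarded, e.g. to exclude an early GCaMP decay) are hypothetical and not taken from the authors' code. Negative lags correspond to stimuli preceding the response (the causal part of the kernel); positive lags correspond to "future" stimuli.

```python
import numpy as np

def reverse_correlation_kernel(stimulus, response, max_lag, n_skip=0):
    """Hypothetical sketch: stimulus-response cross-correlation over lags.

    kernel[k] = mean over t of s(t + lags[k]) * r(t), with s and r
    mean-subtracted. Negative lags: stimulus before response (causal part).
    Positive lags: stimulus after response ("future" stimuli).
    The first n_skip response samples are excluded from the average.
    Assumes stimulus and response have the same length.
    """
    s = np.asarray(stimulus, dtype=float)
    s = s - s.mean()
    r = np.asarray(response, dtype=float)
    r = r - r[n_skip:].mean()  # baseline from the retained samples only
    n = len(r)
    lags = np.arange(-max_lag, max_lag + 1)
    kernel = np.empty(len(lags))
    for k, lag in enumerate(lags):
        # response times t >= n_skip for which s[t + lag] exists
        t = np.arange(max(n_skip, -lag), min(n, n - lag))
        kernel[k] = np.mean(s[t + lag] * r[t])
    return lags, kernel
```

      Plotting `kernel` against `lags` across both signs shows directly whether the estimate returns to zero at positive lags.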

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analyses to investigate the systemic and organ-specific adaptations to IF in males. They identified shared biological signaling processes across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, and revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter have been shown to be the largest consumers of ketones, which are also altered by the IF treatment in this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, the relationship between IF and the molecular adaptations is likely to be time-point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The manuscript lacks a Discussion section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no validation of the proteomic and transcriptomic profiling. For example, the important changes in the proteomics data could be further confirmed by Western blot.

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs, Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses suggesting that IF might affect energy utilization and metabolic flexibility while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses:

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we have restructured the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism.

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The editor suggested addressing points regarding the young age at diet onset, use of males only, and justification for the choice of tissues analyzed without requiring new data generation.

      We agree that these are important points for context. We have now added a dedicated paragraph to the Discussion section (page 22) to explicitly acknowledge and discuss these as limitations of our study. We justify our initial experimental design choices in the context of the existing literature while acknowledging the valuable insights that studies in females and with different diet onset timings would provide.

      The editor and reviewers recommended a more integrative analysis, suggesting the use of freely available tools, and a deeper discussion to frame the work against the existing literature.

      We thank the editor for this excellent suggestion. In response to this and the detailed points from Reviewer #2, we have performed a new, integrated multi-omics analysis using a latent-variable approach (DIABLO) implemented in the mixOmics R package (version 6.28.0), a state-of-the-art, freely available package for integrative multi-omics analysis. This new analysis, presented in a new Figure 4 and described in the Results section (pages 20-23), identifies the key sources of variation across tissues and omics layers, directly addressing the request for a true integrative approach. Furthermore, we have thoroughly revised the Results and Discussion to more sharply frame our findings and highlight the new insights gleaned from our study.

      The editor requested clarification on whether mice were fasted at euthanasia and to rephrase the statement on page 12 regarding mitochondrial pathways.

      - We have clarified in the Methods section (page 4) that mice were euthanized at the end of their fasting period, precisely detailing the stage of the IF cycle.

      - We thank the editor for this critical correction. We have rephrased the statement on page 12 to more accurately reflect that we observed a lower abundance of proteins involved in mitochondrial oxidative pathways, and we now carefully discuss the important distinction between protein abundance and functional activity in this context.

      The editor noted that the introduction is missing key citations and should acknowledge foundational work.

      We apologize for this oversight. We have now revised the Introduction to include several key foundational citations that were previously missing, ensuring proper credit to the important work of our colleagues.

      Reviewer #2 (Recommendations for the authors):

      We thank the reviewer for their exceptionally detailed and helpful technical suggestions, which have greatly improved the analytical rigor of our manuscript.

      (1) & (4) 3D PCA and Integrated Multi-Omics Analysis:

      We agree with the reviewer that a more sophisticated integrative analysis was needed. As detailed in our response to the editor, we have replaced the original side-by-side analysis with a proper integrated multi-omics analysis using a latent-variable approach (DIABLO) implemented in the mixOmics R package (version 6.28.0). This new analysis simultaneously models the proteomic and transcriptomic data from all three organs, identifying shared and tissue-specific sources of variation. This directly and more powerfully validates our claim of "conserved and tissue-specific responses." The results of this analysis are now central to our revised Results section, Figure 4, and the supplementary figures (PCA).

      (2) Concordance/Discordance Analysis:

      This is an excellent point. We have now performed a comprehensive analysis of transcript-protein concordance for the differentially expressed molecules in each tissue. A new Figure 4 summarizes these findings, and we discuss the biological implications of both concordant and discordant pairs in the Results section.
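      One simple way to label transcript-protein pairs is sketched below. The response does not state the criterion that was used, so the sign-agreement rule, the `min_abs_lfc` threshold, and the function name are all hypothetical. The inputs are assumed to be log2 fold changes (IF vs. control) for molecules measured in both omics layers; the gene keys are placeholders.

```python
from typing import Dict, Tuple

def classify_concordance(
    pairs: Dict[str, Tuple[float, float]], min_abs_lfc: float = 0.0
) -> Dict[str, str]:
    """Hypothetical rule for transcript-protein concordance.

    pairs maps a molecule ID to (transcript log2FC, protein log2FC).
    - "single-layer": either change is at or below min_abs_lfc in magnitude
    - "concordant":   both changes share a sign
    - "discordant":   the changes have opposite signs
    """
    labels: Dict[str, str] = {}
    for gene, (rna_lfc, prot_lfc) in pairs.items():
        if abs(rna_lfc) <= min_abs_lfc or abs(prot_lfc) <= min_abs_lfc:
            labels[gene] = "single-layer"
        elif (rna_lfc > 0) == (prot_lfc > 0):
            labels[gene] = "concordant"
        else:
            labels[gene] = "discordant"
    return labels
```

      A sign-only rule is deliberately coarse; a real analysis would typically also require statistical significance in each layer before calling a pair concordant or discordant.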

      (3) Organ-Specific Functional Remodeling:

      We have taken this advice to heart. The new analysis inherently addresses whether the functional remodeling is shared or tissue-specific. 

      (5) Missing Citations:

      We have thoroughly reviewed the literature and added key citations throughout the manuscript, particularly in the Introduction and Discussion, to properly situate our work within the field.

      (6) Starting Results with Supplementary Data:

      As the study design, including the timing of experimental interventions and blood and tissue collections, is summarized in the supplementary figures, the Results and Discussion section begins with those figures. However, we have now renamed the figures according to the eLife style, in which supplementary figures are linked to the main figures. This ensures a more logical and coherent flow.

      (7) Figure Presentation and Explanation:

      We have completely revised all figures to improve their clarity, consistency, and professional appearance. We have also carefully gone through the manuscript to ensure that every panel in every figure is explicitly mentioned and explained in the main text.

      Reviewer #3 (Recommendations for the authors):

      We thank the reviewer for their important comments regarding the model system.

      (1) Sex Differences and Limitations:

      We fully agree that studying sex differences is a critical and profound aspect of dietary interventions. As noted in our response to the editor, we have added a paragraph to the Discussion to explicitly acknowledge this as a key limitation of our current study. We discuss the existing evidence for sex-specific responses to IF and state that this is an essential direction for future research.

      (2) Early Diet Onset and Developmental Programs:

      This is a valuable point. We have added text to the Discussion acknowledging that starting IF at 6 weeks of age could potentially interact with developmental programs. We discuss this as a consideration for interpreting our data and for the design of future studies.

      We believe that our revised manuscript is substantially stronger as a result of addressing these comments. We are grateful for the opportunity to improve our work and hope that you and the reviewers find these responses and revisions satisfactory.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in pre-immune mice makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. In addition, we performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating, in pre-immune mice, a novel SARS-CoV-2 vaccine that was substantially superior in naïve mice, as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) and bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens elicited higher antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not nonstressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDs to bidirectionally manipulate global pPVT activity, the authors show that inhibition of the PVT reduced tone freezing in stressed mice, while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small but apparently statistically significant increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This observation was also noticed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus?

      We thank the reviewer for this insightful comment. In our view, the freezing behavior elicited by the tone reflects an unconditioned response; accordingly, the tone functions as an unconditioned stimulus. Indeed, in our data we found a modest increase in freezing in the no-stress group during the tone presentation relative to baseline (Figures 1, 2, and 7). This effect, however, was considerably smaller in magnitude than the robust freezing observed in stressed mice. We conclude that prior footshock stress enhances the unconditioned tone response.

      In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020).

      We appreciate the opportunity to clarify this aspect of the model. In Figure 7, the rationale for selecting a tone amplitude of 115 dB was not to conduct a startle assay. Instead, we sought to determine whether chemogenetic inhibition of the pPVT influenced tone-elicited unconditioned fear in stress-naïve mice. Given our prior experiments demonstrating that a 90 dB tone elicits relatively low levels of freezing in non-stressed groups, we increased the tone amplitude to 115 dB in an attempt to elicit a more robust freezing response that would be sufficient to detect meaningful group differences (i.e., prevent a floor effect). As noted by the reviewer, the 115 dB tone yielded moderate levels of freezing behavior. Although freezing levels were not very high, we believe they were sufficient to avoid a floor effect. There was no effect of pPVT inhibition in this version of the task, which suggests that pPVT is preferentially engaged after stress. Future studies that identify tone parameters capable of eliciting high levels of freezing will be necessary to further strengthen this finding.

      Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as “neutral,” there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response?

      Within our framework, it is important to emphasize that tone intensity (amplitude and frequency), rather than the perceived novelty of the stimulus, is the primary determinant of unconditioned freezing behavior. Moreover, numerous studies have demonstrated that auditory stimuli have the capacity to elicit unconditioned fear responses, as in the case of pseudoconditioning. Accordingly, we agree with the reviewer that decreasing the tone amplitude from 90 dB to 50 dB would diminish the unconditioned freezing response. For example, Kamprath and Wotjak (2004) demonstrated that stress-naïve mice exposed to a 95 dB tone exhibited significantly greater levels of freezing compared to those exposed to an 80 dB tone. This graded effect of tone amplitude on unconditioned freezing was also observed in mice previously exposed to footshock stress. Notably, the authors also reported a plateau effect, such that increases in tone amplitude beyond 95 dB did not further elevate freezing levels. As it relates to our findings, this plateau effect may explain the rather modest changes in freezing behavior that we observed between the 90 dB and 115 dB tone.

      Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus?

      Indeed, as the reviewer noted, we observed an increase in PVT c-Fos expression in non-stressed animals exposed to the SEFR tone test relative to homecage controls. This finding is consistent with previous reports demonstrating that PVT neurons are robustly activated by salient stimuli and regulate properties of arousal (Penzo and Gau, 2022). Moreover, the PVT has been shown to exhibit neuronal activity responses that are scaled to stimulus intensity. For example, PVT neurons display increased firing rates in response to a tail shock compared to an air puff (Zhu, 2018). Thus, it is conceivable that a less intense stimulus would evoke a diminished level of c-Fos expression.

      I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      We thank the reviewer for raising this important question. Regarding startle responses, we have found that our standard 90 dB, 9 kHz tone parameter elicits similar degrees of startle between stressed and non-stressed mice (data unpublished). However, Golub et al. (2009) observed effects of prior footshock stress on acoustic startle. Further investigation of behavioral responses expressed during the tone is certainly warranted.

      Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      The reviewer points out a curious dissociation. Fiber photometry showed no effect of prior stress on the PVT response during single-shock contextual fear conditioning; however, Gq DREADD stimulation of PVT led to increased postshock freezing during this session. We don’t have a definitive explanation for this dissociation, but we wish to emphasize two relevant points. The first is that in our experience, post-shock freezing during the one-shock contextual fear conditioning session is modest, variable, and an unreliable predictor of long-term contextual fear. Thus, we are hesitant to draw firm conclusions from these data. Second, we did not observe differences in freezing during the SEFL context test, indicating that stimulation of pPVT during conditioning is not sufficient to elicit long-term enhancement of conditioned fear (i.e., SEFL). This suggests that the acute freezing response following shock exposure is mechanistically distinct from expression of conditioned contextual fear. Clearly, further research will be needed to clarify the conditions under which PVT activity regulates / does not regulate freezing.

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts rather than exclusively during tone and shock cue presentations.

      We appreciate the reviewer's suggestion. Unfortunately, freezing data are not available for the fiber photometry experiment because the fiber optic patch cable interfered with mouse activity. We now acknowledge this as a limitation in the paper (line #202).

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      In addition to freezing behavior and locomotor activity in the open field, we examined the time spent in, and distance traveled within, the center of the open field arena. Consistent with our previous report (Hassien, 2020), we did not observe significant group differences between stress conditions, nor did we detect differences across the various experiential manipulations. We did not examine other defensive behaviors in this study. Ongoing research in the lab is examining a broader range of defensive behaviors in this paradigm.

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      Although our data indicate that DREADD-mediated inhibition of the pPVT did not attenuate freezing in non-stressed mice, we agree with the reviewer’s assessment that the 115 dB tone elicited only minimal freezing. Therefore, we remain open to the possibility that higher baseline levels of freezing might reveal a significant behavioral effect. We found it challenging to identify a decibel range that reliably evokes robust freezing in non-stressed mice. Future studies could explore varying tone frequencies to achieve a stronger freezing response.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      We agree this would be valuable information, and we have noted it as a future direction in the discussion.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      We appreciate the reviewer for recognizing this important point regarding the mechanistic relationship between nonassociative fear sensitization and associative fear learning that occurs following footshock stress. At present, the majority of research on this topic has been conducted using the SEFL paradigm.

      At the behavioral level, previous studies indicate that manipulations that interfere with or attenuate associative fear memory of the footshock stress event fail to block nonassociative fear sensitization. For example, both SEFL and SEFR persist in animals that have successfully undergone fear extinction training in the footshock stress context (Rau et al., 2005; Hassien et al., 2020). Furthermore, reports also find that infantile or pharmacological amnesia of the footshock stress memory does not occlude the emergence of SEFL (Rau et al., 2005; Poulos et al., 2014). Taken together, these findings indicate that associative fear memory of the footshock stress event does not appear to be necessary for fear sensitization.

      If and how the associative and nonassociative mechanisms interact is an interesting question that we are currently investigating. PVT has direct projections to the central and basolateral amygdala, regions well known to mediate conditioned fear acquisition and expression (Penzo et al., 2015). Why PVT activity does not modulate conditioned fear in our hands is intriguing. PVT is a heterogeneous structure with a variety of projections (e.g., Shima et al., 2023), and it is possible that the PVT-Amygdala projections are not hyperactive in our paradigm. As we alluded above, further research will be needed to understand why stress-induced PVT hyperactivity affects some forms of fear and not others.

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      The reviewer is correct that additional explanation of PVT cellular heterogeneity is warranted. We now provide clarity on this point in the discussion.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25 mA, 0.5 mA, 1 mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      We appreciate the reviewer’s thoughtful feedback and have addressed these points as follows: In the methods section, we clarify that pre-tone and post-tone freezing behavior was averaged because we did not detect a significant effect of time across all experiments (line #474). With regards to sex differences, we clarify in the methods section that we did not detect sex as a statistically significant variable across tests (line #443). In addition, we have revised the figures to denote male and female subjects separately.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Following discussion, the reviewers and editors agreed that the strength of the evidence could be updated to compelling, provided the comments were adequately addressed.

      Reviewer #1 (Recommendations for the authors):

      (1) In the discussion around line 333, there is also data indicating a time-dependent role for PVT in conditioned fear (Quinones-Laracuente 2021; Do-Monte 2015).

      We agree with the reviewer’s assessment and have revised the discussion accordingly (line #364).

      (2) The 129S6/SvEvTac mouse exhibits impaired fear extinction but intact discrimination (Temme, 2014). Was there any rationale for using this line of mice?

      The reviewer is correct that additional explanation is warranted. We have amended the manuscript to include additional rationale for using the 129S6/SvEvTac mouse strain as well as address the findings of Temme, 2014 as they relate to our study (line #94).

      (3) Was there any reason why there were no c-fos results in the PAG and IPBM? You discuss those brain regions and their importance in the circuit in the discussion.

      In the current manuscript, we do show c-fos results for the lPAG, dlPAG, and lPBN (Figure 3). We highlight in the discussion the relevance of these regions in the fear circuit.

      (4) Take a look at Sillivan et al., 2018 for an additional reference in the introduction (around lines 61).

      We thank the reviewer for their suggestion and have included the reference in the introduction (line #63).

      (5) Can the authors show the c-fos data for aPVT and pPVT separately? The authors focus on pPVT for later manipulations, but the c-fos data is collapsed. Along these same lines, were there any corrections for multiple comparisons across the brain regions? While the subsequent experiments firmly support a role for pPVT in unlearned stress-induced fear response, a proper correction for multiple comparisons is warranted.

      We have revised Figure 3 to include c-fos expression for both the anterior and posterior PVT separately. To correct for multiple comparisons, we conducted two-way ANOVA (Brain Region X Group) with Tukey's-corrected post hoc tests, detailed in the methods section (line #577).

      (6) Do the authors provide rationale for why they began to focus specifically on pPVT versus aPVT?

      We agree that additional clarity is warranted. We have provided additional rationale for selecting pPVT as our primary focus in the results section (line #197).

      (7) Lines 298-337 of the discussion could be shortened. This long preamble is a summary of the results.

      We agree with the reviewer’s assessment and have revised the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional analyses for fiber photometry and open field data to probe for PVT-related changes in defensive behaviors beyond freezing.

      As stated above, we agree with the reviewer that additional behavioral analyses would be valuable. Unfortunately, such measures are not available for the current experiment.

      Reviewer #3 (Recommendations for the authors):

      As mentioned in the weaknesses, just checking for differences across time on the Tests, highlighting the M vs. F datapoints in the figures, and reporting if there are sex differences in any of the analyses.

      In the revised manuscript, we have included separate male and female data points for each figure. In addition, we provided clarity in the methods section reporting a lack of statistically significant sex differences across each experiment (line #443).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), recruit (selectively) either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD caused by two potyvirids, i.e., ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins, including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al. further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity, and the latter perform better. Analysis of mutant sequences revealed one important point mutation, L51F of nCBP-2, that may be essential for the interaction with VPg. The authors suggest that the introduction of the L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall, I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for the authors to consider for improvement:

      (1) As eIF(iso)4E-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      We thank the reviewer for their overall assessment. Regarding investigating a quadruple mutant, we agree that this is a logical next step to investigate. A conventional breeding approach with existing mutant lines, however, is problematic for several reasons: 1) cassava does not flower where this work was conducted, and 2) cassava is subject to inbreeding depression, resulting in both low seed set and considerable heterogeneity among progeny that do arise. Editing existing double mutants is possible, but would require a significant, multi-year investment to produce embryogenic tissue from existing lines and generate the new lines. Cassava has practical limits as a non-model plant. Given these constraints, we conclude that investigating a quadruple mutant is beyond the scope of the current work.

      Regarding the HPL-to-HPF mutation in other cassava eIF4E-family proteins and their interaction with VPg in yeast, we have now completed this experiment and included the data in the paper. Notably, we find that introducing this mutation into eIF(iso)4E-2 attenuates VPg interaction without impairing eIF(iso)4E-2 accumulation, whereas the equivalent mutation abolishes nCBP-1 accumulation and reduces eIF(iso)4E-1 accumulation.

      Reviewer #2 (Public review):

      Summary:

      The authors generated single and double knockout mutants for the eIF4E family members eIF4E, iso4E1, iso4E2, nCBP1, and nCBP2 in cassava. While single knockouts of these eIF4E genes did not abolish viral infection, the nCBP1/nCBP2 double knockout mutant displayed the weakest symptoms and viral infection. Yeast two-hybrid screening identified the nCBP-2 L51F mutant, which was unable to interact with VPg yet could still complement the yeast eIF4E mutant. L51F is therefore a potentially important editing site in eIF4E.

      Strengths:

      This study systematically generated single and double knockout mutants of the eIF4E family members and investigated their antiviral activity. It also identified L51F as a potentially important antiviral editing site in eIF4E; however, genetic evidence for its antiviral role remains to be obtained.

      Weaknesses:

      (1) The symptoms of the iso4E1 & iso4E2 double-knockout mutant are slightly alleviated, and those of the nCBP1 & nCBP2 double-knockout mutant are alleviated the most. Whether a quadruple-knockout mutant, obtained by crossing the iso4E1 & iso4E2 and nCBP1 & nCBP2 mutants, shows even stronger resistance should be investigated.

      (2) Although the yeast two-hybrid identified the nCBP-2 L51F mutant, there is no direct biological evidence demonstrating its antiviral function. While the 6-amino acid deletion mutant (including L51F) showed attenuated symptoms, this deletion might be sufficient to cause loss-of-function of nCBP-2. These indirect observations cannot definitively establish that the L51F mutation specifically confers antiviral activity.

      (3) Given that nCBP-2 can rescue yeast eIF4E mutants, introducing wild-type and L51F nCBP2 into the Arabidopsis iso4e mutant, or introducing viral infectious clones into yeast systems, could clarify whether the L51F mutation (and the same mutation in eIF4E, iso4E1, and iso4E2) abrogates their roles as viral susceptibility factors; this critical genetic evidence is currently missing.

      We sincerely thank the reviewer for their constructive feedback.

      With regards to investigating a quadruple eIF4E mutant, please see our response to reviewer 1.

      The reviewer makes a salient point regarding the nCBP-2 L51F and K45_L51del mutations. Ideally, complementation of the ncbp double mutant with nCBP-2 L51F, followed by viral challenge, would address this question. However, the practical limitations, as noted in our response to reviewer 1, make this difficult within the context of this manuscript. We acknowledge that this is a limitation of our study and have been cautious in not overstating our conclusions.

      Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in eIF4E family proteins and assessed cassava brown streak virus (CBSV) infection in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E proteins and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported in their previous study that ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection, and that all the eIF4E family proteins interact with VPg. To identify redundant functions among eIF4E family proteins, they generated mutants for all eIF4E family genes. However, apart from several double mutants, each of these mutants is defective in a single eIF4E gene; without higher-order mutants (e.g., triple or quadruple mutants), it is hard to establish redundant functions of the eIF4E family genes.

      (2) The authors identified key amino acids for the interaction between eIF4E and VPg, such as L51. It would be interesting to complement ncbp1 ncbp2 double mutant plants with the L51F form of eIF4E and re-examine infection by CBSVs.

      We thank the reviewer for their assessment and feedback.

      Regarding analysis of higher-order mutants, please see our response to Reviewer #1’s public review.

      For investigation of nCBP-2 L51F in planta, please see our response to Reviewer #2’s public review.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since nCBP2 can complement a yeast mutant, this suggests that nCBP2 may also complement Arabidopsis. Wild-type nCBP2 should be introduced into the Arabidopsis iso4e mutant to determine whether it complements the mutant and whether the virus can re-establish infection. The nCBP2 L51F mutant should also be introduced into the Arabidopsis iso4e mutant to see whether it fails to re-establish virus infection. Similarly, eIF4E, iso4E1, iso4E2, nCBP1, etc., should be introduced into the Arabidopsis iso4e mutant to determine whether they truly restore virus infection in the mutant, whereas their L51F mutants do not.

      Arabidopsis encodes multiple eIF4E proteins, an nCBP protein, and an eIF(iso)4E protein, and knocking out the eIF(iso)4e gene specifically confers resistance to TuMV. Introducing cassava nCBP-2 into Arabidopsis eif(iso)4e mutants is therefore unlikely to restore TuMV susceptibility. Because TuMV belongs to a different genus than CBSV, we used the interaction between TuMV VPg and Arabidopsis eIF(iso)4E to test whether mutating the eIF4E HPL motif to HPF generally disrupts potyvirid VPg-eIF4E interactions. However, since this mutation disrupts the endogenous translation initiation activity of Arabidopsis eIF(iso)4E in yeast, this mutant protein is not worth pursuing further. In contrast, cassava eIF(iso)4E-2 L27F retains translation initiation activity and shows reduced interaction with CBSV VPg in quantitative yeast two-hybrid assays. It would be interesting to see whether this particular mutant protein can interact with TuMV VPg and, if not, to test its ability to restore TuMV susceptibility in Arabidopsis eif(iso)4e. Unfortunately, we are unable to pursue these experiments at this time.

      (2) Given that nCBP-2 can complement yeast eIF4E mutants, the authors may introduce viral infectious clones into yeast systems expressing nCBP-2 variants to determine whether nCBP-2 supports viral translation. This approach could further clarify whether the L51F mutation (and the same mutation in eIF4E, iso4E1, and iso4E2) abolishes their roles as viral susceptibility factors.

      This is an intriguing suggestion, but challenging for a few reasons. First, an infectious clone of the CBSV Naliendele isolate does not exist; we have tried to construct one, without success. There is also no guarantee that such a clone could replicate in yeast. We are aware of yeast being used as a surrogate host for a few plant viruses, such as Tomato bushy stunt virus and Brome mosaic virus, but are unaware of a similar system for any potyvirid. Developing such a system would undoubtedly require a significant investment beyond the scope of this manuscript.

      (3) Phenotypes of all mutant lines with and without virus inoculation in Table 1 should be presented.

      Photos of un-challenged mutants are included in supplemental figures. Representative storage root symptoms for all lines have now been included in the supplemental figures as well.

      (4) In Figure 1c, the results of viral accumulation assays should be presented for additional mutant lines beyond ncbp-1, ncbp-2, ncbp-1 nCBP-2 K45_L51del, and ncbp-1 ncbp-2, particularly eif(iso)4e-1 & eif(iso)4e-2#172 and eif(iso)4e-1 & eif(iso)4e-2#92.

      We have previously found that subtle reductions in visible disease do not always translate to clear differences in viral titer when analyzed by qPCR (Gomez et al., 2018). As such, we focused on lines with the strongest phenotypes in viral titer experiments.

      (5) Inconsistently, the ncbp-1 nCBP-2 K45_L51del line showed reduced symptoms compared to wild-type in Figures 1a and 1b, yet viral accumulation levels were comparable to wild-type in Figure 1c. An explanation for this discrepancy is required.

      Please see our response to (4).

      (6) Root phenotypic data for all mutant lines shown in Figure 1d should be presented.

      Please see our response to (3).

      (7) In Figure 2b, GST control pulldowns showed detectable proteins. This background signal requires explanation.

      It is not uncommon to see weak signal in bead or tag-only negative control pulldown and IP reactions. Importantly, we see strong enrichment of VPg relative to these controls in our experimental samples.

      (8) Contrary to the abstract's implication, Figure 5c indicates that the L51F mutation impacts yeast growth, suggesting potential pleiotropic effects of this mutant.

      We interpret the results to be that nCBP2 L51F does not fully complement the yeast eif4e mutation, rather than nCBP2 L51F impacts yeast growth.

      (9) In vivo protein-protein interaction assays (e.g., co-immunoprecipitation) should be performed to complement the in vitro GST pull-down data in Figure 6.

      We appreciate the desire for these experiments and agree that they would bolster our Y2H and pulldown data. Unfortunately, we are not able to complete these experiments at this time, so we have been careful not to overinterpret the data.

      (10) Since the AteIF(iso)4E L28F mutant fails to complement yeast, the authors should test whether introducing the L51F mutation into other family members (eIF4E, iso4E1, iso4E2, nCBP1) preserves their yeast complementation capacity.

      This has now been done for additional cassava eIF4E-family proteins.

      (11) Indicate molecular weight sizes in all Western blots.

      This was done. As differences in buffer formulations between gel types can affect the mobility, and thus the apparent molecular weight, of markers, we have provided in the Methods section the SDS-PAGE gel chemistries and specific protein ladders used in this study. Importantly, in our experience, the apparent size of certain markers relative to the proteins of interest can vary by up to 15 kDa between gel chemistries.

      (12) Figures 4d,e are not provided in the paper. Based on the content of the paper, the description in the paper likely corresponds to Figures 5c, d.

      Thank you for catching this error; it has now been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

      Thank you very much for your comments. We are very encouraged by your positive feedback. We have revised our manuscript to clarify our model, strengthen its biological justification, and make it more accessible to a broader audience.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second, hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions of expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.
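      As a point of reference for the Rescorla-Wagner-type rule mentioned above, the standard delta-rule form can be sketched as below. This is a generic illustration under stated assumptions, not the authors' implementation: it assumes the reward prediction is a weighted sum over a binary hippocampal activity vector, and the function name `rw_update` and the learning rate `lr` are hypothetical.

```python
def rw_update(weights, activity, reward, lr=0.1):
    """One Rescorla-Wagner (delta-rule) update.

    Illustrative assumptions: `weights` holds one reward-prediction weight
    per hippocampal unit, and `activity` is that population's binary
    activity vector for the current state.
    Returns the updated weights and the reward prediction error (RPE).
    """
    # Predicted reward: weighted sum over active units.
    prediction = sum(w * a for w, a in zip(weights, activity))
    # RPE: received reward minus predicted reward.
    rpe = reward - prediction
    # Only active units change their weights, in proportion to the RPE.
    new_weights = [w + lr * rpe * a for w, a in zip(weights, activity)]
    return new_weights, rpe


if __name__ == "__main__":
    weights = [0.0, 0.0, 0.0, 0.0]
    activity = [1, 0, 1, 0]
    for _ in range(50):
        weights, rpe = rw_update(weights, activity, reward=1.0)
    # With a constant reward, the RPE shrinks toward zero over repeated trials.
    print(weights, rpe)
```

      Because the prediction is linear in the activity, each update closes a fixed fraction of the remaining gap to the reward (here 0.2 per step, since two units are active with `lr=0.1`), which is the defining property of the delta rule.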

      We thank the reviewer for carefully reading the manuscript and for recognizing the novelty and significance of our work.

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

      We thank the reviewer for the insightful comments.

      To better characterize our model, we added formal descriptions of each task setting and explicitly specified the sources of uncertainty. We revised the schematic figures in Figure 1 to illustrate our model more clearly. An important revision is that we now distinguish between sensory prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping. SPE-driven remapping is triggered by mismatches between actual sensory stimuli and those predicted from past history, and serves to update the current contextual state or to create a new one. In contrast, RPE-facilitated remapping is more likely to occur when executing an action-planning sequence associated with recent negative reward prediction errors, possibly due to environmental changes, and promotes exploration of alternative planning sequences.

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      In addition, we added Figure 2C-E to clarify the neural representations of external stimuli and contextual states in the X module, as well as the neural representations within the H module. We also clarified the purpose of each model component and discussed plausible biological implementations to justify our modeling choices. Furthermore, we added a schematic illustration of our results related to psychiatric disorders in Figure 5B and revised the corresponding section of the manuscript to explicitly frame these results as a computational hypothesis. We also expanded the discussion to relate our findings to existing computational psychiatry models (see point-by-point responses below).

      We believe that these revisions have improved the clarity of our model and broadened its accessibility to a wider audience.

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      We thank the reviewer for carefully reading the manuscript and for recognizing the novelty and significance of our work.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      We appreciate the reviewer’s valuable feedback. In the revised manuscript, we have improved the presentation of the methodological aspects by providing a more intuitive and general explanation of the model framework and training procedure. We also rewrote the section on psychiatric implications to more clearly explain how dysfunction in contextual inference occurs in our model. These revisions enhance both the clarity and plausibility of our conclusions.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      Thank you for raising the important point.

      To improve readability, we have updated Figure 1 to more clearly illustrate the main model structure and its adaptation to individual use cases. Additionally, we have moved the previous Figure 6 (now Figure S1) to an earlier point in the Results to facilitate understanding of the methodological flow. The Methods section has also been revised to explain the algorithmic structure shown in Figure S1. These revisions make the methods more self-contained and easier to follow.

      In the revised manuscript, we have clarified that our model is qualitatively related to the Bayes-adaptive reinforcement learning framework (Guez et al., 2013) as follows.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or from abrupt environmental changes. Once the context selector X infers the hidden state, the sequence composer H generates episodic sequences that correspond to trajectories in a search tree, each branch representing possible action–outcome sequences. Just as Monte Carlo tree search explores potential future paths to evaluate expected rewards, H produces hippocampal sequences that simulate future states and rewards based on its learned connectivity. In this way, X defines the context that anchors the root of the tree, while H expands the tree through replay or planning; thereby, our model provides a simplified algorithmic implementation of model-based reinforcement learning via tree search planning.”

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      Thank you for pointing this out.

      In the revised manuscript, we have added explicit examples of simulated neural activity. Specifically, we added new panels in Figure 2C–E showing representative activity patterns from both the Context selector (X) and the Sequence composer (H). We also clarified the distinction between activity in the stimulus domain (externally driven) and the context domain (internally inferred states).

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent. The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states … are represented in the stimulus domain and the contextual states … are represented in the context domain. … In the example transition shown in Figure 2C, the agent selected an environmental state transition from S2 to S4 in the 2nd, 5th, and 8th trials, which corresponds to a contextual state transition from X2β to X4β in the X module. However, because this transition was not rewarded, no synaptic potentiation occurred among hippocampal neurons. Subsequently, in the 11th trial, the agent attempted an environmental state transition from S2 to S5, corresponding to the transition from X2β to X5β in the contextual states.

      The agent received a reward at S5, and the corresponding hippocampal sequence was strengthened, enabling the agent to acquire the alternation task in the following trials (Figure 2E).”

      (see point-by-point responses below).

      We also added a detailed explanation of our results in Figure 4 as follows.

      “We consider a simplified environment of a probabilistic cueing paradigm (Ekman et al., 2022). In this study, two auditory contextual cues probabilistically predicted distinct visual motion sequences, and fMRI decoding was used to examine the frequency of hippocampal replay. We simplified this task as shown in Figure 4A. ”

      “... This result replicates Ekman et al. (2022), who showed that the probability of the contextual cues is reflected in the statistically significant differences in hippocampal replay probability in humans (Figure 4F).”

      “F, Our model behavior is similar to the human fMRI result of the cue-probability-dependent hippocampal replay (Ekman et al., 2022). Paired sample t-test. **P<0.01.”

      We believe that these revisions make the model description and simulation results more concrete and easier to interpret.

      (3) The literature review can be improved (laid out in the specific recommendations).

      Thank you for pointing this out. We revised the literature review to the best of our ability.

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      Thank you for your suggestion.

      In the revised manuscript, we added a new paragraph in the Discussion explicitly addressing how results from mice and humans can be integrated.

      “Our model is a functionally modular account of the cortical regions and hippocampus, enabling it to capture experimental findings across species. While hippocampal activity in rodents has been extensively characterized in terms of spatial coding, human hippocampal representations are more often non-spatial and episodic-like (Bellmund et al., 2018; Eichenbaum, 2017). For episodic memory to support flexible behavior, it would be beneficial to retrieve each episode in a context-dependent manner. The episodic contents may vary across species and individuals, yet the fundamental computations, namely estimating the current context from external stimuli and their history and flexibly updating this estimate via prediction errors, are likely conserved. Holding context information until a contextual prediction error is detected is analogous to the belief state in model-based reinforcement learning, which is known to improve performance under partially observable conditions (POMDPs) (Kaelbling et al., 1998). Our model provides a simple algorithmic implementation of this principle.”

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.

      Thank you for pointing this out.

      We define action as a transition from one environmental state to another, and transition-coding hippocampal neurons are used for action-planning. Because our model does not incorporate errors in transitions (actions), the generated hippocampal sequences are perfectly correlated with the executed transitions (actions). However, we acknowledge that computations in the brain are more complex, with contributions from other regions such as the premotor network and the basal ganglia. To clarify this, we added formal representations of state transitions (action) in each task and the following sentences to the manuscript.

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons (Materials and Methods). Note that in the real brain, not only hippocampus but also the premotor cortex and the basal ganglia contribute to action planning and execution (Hikosaka et al., 2002). Here, however, we focus on how simplified planning sequences are learned and composed in a context-dependent manner.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state without errors in action.”

      Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single unit H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      We thank the reviewer for this summary of our model.

      We would like to clarify that the hippocampal Sequence composer (H) is a recurrent network that iteratively composes the next state and the associated sensory stimuli in the sequence based on the current contextual state.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

      We thank the reviewer for suggesting an important direction for future work. The goal of this research is to develop a minimal, functionally modular neural circuit model that provides general insights into how context-dependent behavior can be realized across species, including humans. To simplify our model, we only considered discrete-time environmental states, where the exact length of the time step depends on each environment. Extending the model to a more biologically plausible, continuous-time framework is a promising direction for future work, such as using continuous-time modern Hopfield networks and synfire chains. We modified the Discussion section to clearly point out this direction.

      “... the resolution at which our model should distinguish different contextual states, including the stimulus resolution and time resolution, is hand-tuned in this work. While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, … In realistic, continuously changing environments, such resolutions should be adjusted autonomously. Introducing continuous and hierarchical representations with multiple levels of spatial and temporal resolution would facilitate such adjustments, potentially through mechanisms such as modern Hopfield networks (Krotov and Hopfield, 2020) or synfire-chain–based hippocampal sequence generation (Abeles, 1982; Diesmann et al., 1999; Shimizu and Toyoizumi, 2025; Toyoizumi, 2012), but this is beyond the focus of the current study.”

      Also, we would like to emphasize that our model is not treated as a black box. To improve understandability, we have substantially revised Figures 1 and 2 to include additional details illustrating the neural activity and the internal computational mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments and suggestions for improvement:

      (1) Formal link to model based RL is unclear: a core feature of inference is the role of uncertainty in modulating computation and corresponding circuit dynamics, in particular defining expected and unexpected degree of errors; as far as I understand the degree of tolerable errors within a context is defined by the size of the basin of attraction of the context module (which is dependent on number of items and the structure of correlations across patterns) and in no obvious way affected by sensory uncertainty (unless the inputs from H serve that purpose in a more indirect way). Similarly, most experiments are deemed to have deterministic (unambiguous) maps between sensory inputs and world state (although how the agent's state relates to environmental state is more complex and not completely clear based on the existing text).

      Thank you for raising this important point. Our model bears conceptual similarities to model-based RL frameworks, for example, the optimal-inference formulation that underlies Monte Carlo Tree Search (Guez et al., 2013), as we now clarify in the revised manuscript. These similarities, however, are qualitative rather than quantitative. In particular, the error thresholds that separate expected from unexpected outcomes are manually specified in our model, but their exact values do not appreciably influence the simulation results.

      Concretely, the heuristic threshold for SPE-driven remapping (𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub>) is set to 5 bits, which tolerates small mis-convergence errors during recall in the Amari–Hopfield model. For RPE-facilitated remapping, the threshold is set to 𝜃<sub>𝑁𝐺</sub> = 0.7, which makes the agent sufficiently sensitive to abrupt environmental changes and enables it to explore candidate contexts after RPE-facilitated remapping. This simple thresholding scheme is adequate for our largely deterministic simulation setting, where contextual switches are rare and occur abruptly in an otherwise stable and unambiguous environment.

      Importantly, our goal in this work was not to achieve Bayesian optimality; mice, and likely humans in certain settings, often deviate from optimal inference. Instead, we focus on the qualitative remapping-related processes that support goal-directed planning following epistemic errors. We have clarified this scope in the revised manuscript.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. … However, these conceptual similarities are qualitative rather than quantitative. The goal of this work is not to achieve Bayesian optimality, but rather to show qualitative remapping-related processes that support goal-directed planning following epistemic errors.”

      “Note that we set the remapping threshold 𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub> = 5 bits to allow for small mis-convergence during recall in the Amari–Hopfield model.”

      “Note that we set 𝜃<sub>𝑁𝐺</sub> to 0.7 to make the agents sufficiently sensitive to abrupt environmental changes and to enable them to explore candidate contexts after RPE-facilitated remapping.”
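      To make the SPE-driven threshold concrete, the sketch below assumes that the mismatch between the predictive input from H and the sensory input is counted as the Hamming distance between binary (0/1) patterns, and that remapping fires only when this count strictly exceeds 5 bits. The names `THETA_REMAP`, `spe_mismatch`, and `should_remap` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

# Hypothetical constant: the 5-bit remapping threshold quoted in the text.
THETA_REMAP = 5


def spe_mismatch(predicted, observed):
    """Number of differing bits between a predicted and an observed binary pattern."""
    predicted = np.asarray(predicted, dtype=int)
    observed = np.asarray(observed, dtype=int)
    return int(np.count_nonzero(predicted != observed))


def should_remap(predicted, observed, theta=THETA_REMAP):
    """Trigger SPE-driven remapping only when the mismatch exceeds the threshold,
    so small recall errors (up to theta bits) are tolerated."""
    return spe_mismatch(predicted, observed) > theta


# Usage: a 20-bit stimulus pattern with 3 corrupted bits versus 6 corrupted bits.
pred = np.zeros(20, dtype=int)
small_error = pred.copy()
small_error[:3] = 1
large_error = pred.copy()
large_error[:6] = 1
print(should_remap(pred, small_error), should_remap(pred, large_error))  # False True
```

      Tolerating a few bits of error in this way is what allows an attractor network that occasionally settles slightly off a stored pattern to keep its current context rather than remapping on every small recall error.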

      (2) Improvement: start describing each task specification in explicit model-based RL terms, then explain how the environmental specification translates into agent operations. Be explicit about what about the process is inferential, in particular, sources of uncertainty.

      Thank you for this important suggestion. Following your recommendation, we revised the manuscript to describe each task explicitly in model-based RL terms. For each task, we now identify the relevant sources of uncertainty, which arise either from imperfections in the agent’s internal model of the environment or from occasional abrupt switches in task rules. We also explain how the agent infers the hidden state from experience to construct an appropriate context representation, enabling the model to perform the task successfully.

      (3) A lot of seemingly arbitrary model choices need additional computational and biological justification; the description of the process is fundamentally an algorithmic one, which includes a lot of if-then type of operations: the dynamics of different elements of the circuit switch between "initialization to landmark/other", "error detected/not", different forms of plasticity on/off etc and it is not discussed in way how this kind of global coordination of different processes is supposed to be orchestrated biologically; e.g. as far as I understand the sequential structure in H activity is largely hardcoded rather than an emergent property of the learning+neural dynamics.

      Thank you for this important suggestion. We have made a concerted effort to describe clearly the biological context and the relevant literature motivating each of our algorithmic assumptions. Notably, as highlighted in Fig. 1F, the sequential structure in H activity is not hardcoded; it emerges as a consequence of the agent’s exploration and learning. We also explain how the two remapping mechanisms concatenate sequence segments to support long-term planning and to predict both stimuli and rewards.

      About Fig. 1F

      “At the beginning of learning, hippocampal segments are not connected, and H yields only short sequences that generate immediate actions and short-term predictions. As learning continues, the three-factor Hebbian plasticity rule concatenates these segments, thereby creating longer sequences that reflect the task structure (Figure 1F).”

      About “initialization to landmark/other,”

      “While the history-based initialization was introduced to select the contextual state based on the (episodic) history input from H, the landmark-based initialization was introduced to terminate episodes that would otherwise continue indefinitely. Biologically, landmark-based initialization corresponds to anchoring a contextual state to salient environmental landmarks, such as an animal’s nest, that serve as clear reference points.”

      About “error detected/not,”

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)-driven remapping and reward prediction error (RPE)-facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and the externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X either to transition to a different contextual state or to form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator signals that the action plan, i.e., the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      About “different forms of plasticity on/off”

      “We used different learning rules for the intra-hippocampal synaptic weights depending on whether the connections lie within or between episodic segments.”

      “Within-episodic connections, i.e., state-coding to transition-coding synapses, are constantly updated in a reward-independent manner … This modeling is inspired by behavioral time scale plasticity in the hippocampus (Bittner et al., 2017), in which synaptic potentiation occurs for events that are close in time regardless of reward; such plasticity is believed to support the formation of place cells, etc.”

      “Between-episodic connections, i.e., transition-coding to state-coding synapses, are constantly updated in a reward-dependent manner … This is supported by the finding that dopaminergic neuromodulation gates LTP, enabling preferential consolidation of reward-associated experiences (Lisman and Grace, 2005; Takeuchi et al., 2016).”
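      The two learning rules quoted above can be sketched as follows, assuming activity vectors for state-coding and transition-coding neurons and an outer-product Hebbian update. The learning rate and the scalar reward used as the third factor are illustrative assumptions, and the function names are hypothetical; this is not the manuscript's exact update equation.

```python
import numpy as np


def update_within(W_st, state_act, trans_act, lr=0.1):
    """Within-episodic rule: state-coding -> transition-coding synapses.
    Reward-independent Hebbian potentiation of co-active pre/post pairs.
    W_st[i, j] is the weight from state-coding neuron j to transition-coding neuron i."""
    return W_st + lr * np.outer(trans_act, state_act)


def update_between(W_ts, trans_act, state_act, reward, lr=0.1):
    """Between-episodic rule: transition-coding -> state-coding synapses.
    Three-factor update: the Hebbian term is gated by a reward signal
    (standing in for dopaminergic modulation), so unrewarded links do not change.
    W_ts[i, j] is the weight from transition-coding neuron j to state-coding neuron i."""
    return W_ts + lr * reward * np.outer(state_act, trans_act)
```

      With `reward=0`, `update_between` leaves the weights untouched while `update_within` still potentiates, which mirrors the distinction between reward-independent segment formation and reward-dependent segment concatenation described in the quoted text.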

      (4) Improvement: Justify individual design choices by biology whenever possible; in the absence of such justification, provide at least a computational rationale for each such model choice. Additional justification for the neural substrate of different prediction errors.

      Thank you for pointing this out. Following this advice, we have added the computational objectives behind each algorithmic component, in addition to the biological motivations described above. In particular, we have completely updated Fig. 1 to help readers better understand the key remapping mechanisms in our algorithm: SPE-driven and RPE-facilitated remapping.

      About the Amari-Hopfield model

      “We employ the Amari–Hopfield model because it allows multiple contexts to be stably maintained and selected in response to stimuli and can be trained via Hebbian plasticity. We assume that similar computations are carried out in prefrontal and entorhinal cortical circuits in the brain.” “As one possible biological implementation, we consider context selection in X to correspond to the brain-wide evoked potential, during which bottom-up information may be integrated with top-down signals to select the current context (Mohanty et al., 2025). In this case, it takes several hundred milliseconds for the contextual states in X to settle (Massimini et al., 2005).”
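      For readers less familiar with the Amari–Hopfield model, the sketch below shows a standard textbook version: ±1 patterns stored with an outer-product Hebbian rule and recalled by synchronous sign updates until a fixed point is reached. The pattern size, update scheme, and step limit are assumptions for illustration; the manuscript's network (for example, its stimulus and context domains and 0/1 coding) may differ.

```python
import numpy as np


def store(patterns):
    """Hebbian (outer-product) weights for +/-1 patterns, without self-connections."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W


def recall(W, cue, max_steps=20):
    """Synchronous sign updates until the state stops changing (or max_steps is hit)."""
    x = np.asarray(cue, dtype=float)
    for _ in range(max_steps):
        x_new = np.where(W @ x >= 0, 1.0, -1.0)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x


# Usage: two orthogonal 16-unit patterns; a cue with 2 flipped units is restored.
a = np.ones(16)
b = np.array([1.0, -1.0] * 8)
W = store([a, b])
cue = a.copy()
cue[[0, 2]] = -1.0
print(np.array_equal(recall(W, cue), a))  # True
```

      Synchronous updates are used here only for brevity; they can fall into two-state oscillations in general, which is why the loop is capped by `max_steps`. Asynchronous (one unit at a time) updates avoid this at the cost of a slightly longer loop.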

      About the default matrix

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      About state-coding neurons and transition-coding neurons

      “The state-coding neurons receive input from X and represent the current contextual state, while the transition-coding neurons send output to X and predict the next contextual state after an action ... One possible biological grounding for this functional separation is that the entorhinal cortex provides contextual inputs to CA3, and CA3 and CA1 generate predictions of the next state through their recurrent architecture (Chen et al., 2024).”
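      The division of labor between state-coding and transition-coding neurons can be illustrated with a toy rollout. This sketch assumes one neuron per contextual state and per transition and a winner-take-all readout at each stage; `rollout`, `W_st`, and `W_ts` are hypothetical names, and the stopping rule is an illustrative assumption rather than the model's implementation. In line with the quoted description, transition-to-state links are the reward-dependent, between-episodic ones, so a missing link marks the end of a segment that has not yet been concatenated.

```python
import numpy as np


def rollout(W_st, W_ts, start, max_len):
    """Chain contextual states: state-coding -> transition-coding -> next state-coding.

    W_st[t, s]: weight from state-coding neuron s to transition-coding neuron t.
    W_ts[s, t]: weight from transition-coding neuron t to state-coding neuron s.
    The rollout stops where no positive weight leaves the current neuron."""
    seq = [start]
    s = start
    for _ in range(max_len):
        if W_st[:, s].max() <= 0:
            break
        t = int(np.argmax(W_st[:, s]))  # predicted transition after an action
        if W_ts[:, t].max() <= 0:
            break
        s = int(np.argmax(W_ts[:, t]))  # predicted next contextual state
        seq.append(s)
    return seq


# Usage: contexts 0 -> 1 -> 2 via transition neurons 0 and 1.
W_st = np.zeros((2, 3))
W_st[0, 0] = W_st[1, 1] = 1.0
W_ts = np.zeros((3, 2))
W_ts[1, 0] = 1.0
print(rollout(W_st, W_ts, start=0, max_len=5))  # [0, 1]: link 1 -> 2 not yet formed
W_ts[2, 1] = 1.0
print(rollout(W_st, W_ts, start=0, max_len=5))  # [0, 1, 2]
```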

      About the no-good indicator

      “The no-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This indicator promotes RPE-facilitated remapping (see RPE-facilitated remapping section), which leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”
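      One way to picture a suppression signal that is transient and leaves learned values intact is a per-sequence trace, sketched below under loudly stated assumptions: the trace rises with negative RPEs, decays multiplicatively on each update, and gates a sequence once it exceeds 0.7 (the 𝜃<sub>𝑁𝐺</sub> value quoted earlier). The decay and gain values, the class name `NoGoodIndicator`, and the update rule itself are hypothetical and are not the manuscript's equations.

```python
# Hypothetical constant: the no-good threshold quoted in the text.
THETA_NG = 0.7


class NoGoodIndicator:
    """Per-sequence trace that rises with negative RPEs and decays otherwise.

    It only gates whether a sequence is used; the sequence's learned weights
    are stored elsewhere and are not modified (no devaluation)."""

    def __init__(self, decay=0.9, gain=0.5):
        self.decay = decay  # assumed forgetting factor applied at each update
        self.gain = gain    # assumed sensitivity to negative RPEs
        self.trace = {}

    def update(self, seq_id, rpe):
        value = self.trace.get(seq_id, 0.0) * self.decay
        if rpe < 0:
            value += self.gain * (-rpe)
        self.trace[seq_id] = min(value, 1.0)

    def suppressed(self, seq_id, theta=THETA_NG):
        """True if the sequence should be bypassed in favor of exploring alternatives."""
        return self.trace.get(seq_id, 0.0) > theta


# Usage: two unexpected reward omissions suppress sequence "A"; it recovers later.
ng = NoGoodIndicator()
ng.update("A", -1.0)
ng.update("A", -1.0)
print(ng.suppressed("A"))  # True (trace 0.95)
for _ in range(3):
    ng.update("A", 0.0)
print(ng.suppressed("A"))  # False (trace about 0.69)
```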

      (5) In particular, the temporal scale at which processes unfold with reference to behavioral time scale actions is fundamentally unclear: what determines the time scale of a sequential element? What stitches them together? What is the temporal relationship between H and X operations? At what time scale do actions happen in terms of those operating scales? How does this align with what is known about hippocampal dynamics during behavior?

      (6) Improvement: make the time scales of different aspects of the process explicit in the text, potentially with additional graphic support.

      Thank you for the questions and suggestions. In this work, we model the agent’s behavior in an abstract grid-world environment with discrete time steps, as is common in classical RL. At each time step, the agent observes a sensory stimulus, makes a plan, and executes an action based on it. The action induces a state transition in the environment. Accordingly, the model includes a single fundamental timescale: the environmental (behavioral) time step.

      The modeled brain dynamics in both X and H are similarly locked to this environmental clock. As clarified in Fig. 1F, each sequence segment corresponds to one behavioral time step. These segments are then chunked based on reward events, enabling longer-horizon planning and prediction.

      The agent’s cognitive operations at each behavioral time step are summarized in Fig. S1. Briefly, the agent infers the contextual state X from the current stimulus and its stimulus history, generates a sequential action plan H with predictions using chunked sequence segments, and then follows the plan when it is sufficiently promising. In addition, when sensory or reward prediction errors occur, the agent reorganizes the synaptic-weight parameters of the context selector and sequence composer. Once the agent becomes familiar with the environment, H typically generates an extended action sequence along with predictions of future stimuli and the resulting reward. The agent then executes this sequential plan, bypassing step-by-step context estimation by X, until a prediction error triggers remapping.

      The revised manuscript includes the following additions.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli. The model operates on the environmental (behavioral) time step. At each time step, the agents estimate the contextual state with Context selector and activate the corresponding hippocampal neuron. This hippocampal neuron then initiates sequential activity based on hippocampal synaptic connectivity. Each hippocampal sequence represents a planned course of action and is used to predict a series of external stimuli. … The hippocampal sequence from which actions are generated is updated upon a reward. After executing the actions, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, the hippocampal sequences used for future prediction become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is shown in Figure S1.”

      (7) As far as I understand it, the existence of splitter cells is directly inherited from the task specification, and to some extent the same can be said about the lap cells; please explain what can be understood from the model simulations that goes beyond what was put into the inputs/reward function for each experiment. Emphasize numerical results that are counterintuitive or where additional predictions about the dynamics come directly from simulating the model but would have been less obvious beforehand.

      The existence of splitter cells in our model is not inherited from the task specification. Instead, it emerges directly from the hippocampal module retaining sensory history (namely, whether the agent approached from the left or right arm), independent of reward structure or other task details. When sensory history is removed from the sequence composer (and, consequently, from the context selector), splitter-cell representations disappear.

      To develop lap-cell representations, immediate sensory history alone is not sufficient. The sequence composer must chunk episodic segments based on rewards to support sufficiently long action plans (i.e., history dependence) that span the multiple laps required by the task. The planning horizon (the length of action sequences) typically increases as animals learn a task. This progressive development of hippocampal sequences and their dependence on reward yields experimentally testable predictions. Notably, as we clarified in Fig. S2, the required sensory history length must also be learned adaptively: if it is too short, the agent cannot solve the task, whereas if it is too long, learning becomes unnecessarily slow.

      In the revised manuscript, we explicitly described the emergent process of splitter cells and lap cells as follows.

      About splitter cells

      “A second contextual state at S2, X2β, was generated through SPE-driven remapping at the second visit of S2 (second trial) due to history mismatch… In our model, the transition-coding neurons exhibit right/left turn-specific firing at S2 after learning is complete (Figure 2E, I), replicating the emergence of splitter cells.”

      About lap cells

      “the task environment changes again and the agents are rewarded for two laps, …. Either the shortest transition, ..., or the one-lap transition, …, is no longer rewarded, which triggers another RPE-facilitated remapping and exploration. During exploration, a history mismatch occurs …, and the contextual states for the second lap … are generated. Finally, the rewarded transition of contextual states and corresponding sequence… is reinforced (Figure 3B).”

      “This task can also be solved by simply preparing temporal contexts with three steps of sensory history (n=3), the minimal number needed to solve this task (see Materials and Methods for Model-free learning). However, this approach takes much longer than our model to find the correct transition for solving the 1-lap task because it involves an excessive number of states (Figure S2).”

      “As the agents become familiar with the environment, the hippocampal sequences used for future prediction become longer, and contextual state estimation by Context selector becomes less frequent.”

      (8) The partitioning of H subpopulation into current input vs predictive subpopulations seems to fundamentally deviate from known CA1 properties like theta phase processing, where the same neurons encode information about recent past, present, and future at different moments in time within a theta cycle. The existence of such populations (especially since they come with distinct plasticity mechanisms and projection patterns) seems like a strong avenue for validating the model experimentally.

      (9) Improvement: biologically justify the two subpopulations, discuss neural signatures of this distinction that could be used to identify such neurons in experiments

      We thank the reviewer for helping us relate our model to biological circuits.

      First, we would like to clarify that we do not claim that our H module corresponds to CA1 specifically.

      Rather, we assume that within the broader hippocampal loop (EC–DG–CA3–CA1–EC), subpopulations emerge that preferentially encode the current contextual states and the transitions to the next contextual states. This assumption reflects our hypothesis that the hippocampus implements a mechanism for predicting the next context given the current one. Importantly, this functional separation does not contradict known theta-phase coding in which the same neurons can represent past, present, and future information at different phases of the theta cycle.

      As a possible biological grounding, we particularly emphasize the CA3–CA1 projection. Recent studies have shown that CA1 representations exhibit a temporal delay relative to CA3 activity (Chen et al., 2024), suggesting a circuit-level mechanism by which predictions of upcoming contextual states may be computed based on the current context. In this framework, state-coding and transition-coding functions could be assigned to CA3 and CA1, or dynamically expressed through their interactions. Based on our model, we make testable experimental predictions. Specifically, we predict that neural representations in CA3 and CA1 should precede contextual switching in tasks such as alternation or multiple-lap tasks, and that perturbing CA3–CA1 computations would impair task performance.

      Please note, however, that our model does not characterize the sequence composer’s activity at such fine-grained neuronal timescales. Instead, we model the computation it performs in abstract time steps corresponding to the grid states (e.g., while the animal is at a corner of the maze).

      We have added these points to the Discussion to clarify the biological interpretation and to suggest potential experimental validations of the proposed subpopulation distinction as follows.

      “Our model posits that the Sequence composer corresponds to computations within the hippocampus. As a biologically plausible substrate, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state. Consistent with this idea, the temporal lag in CA3→CA1 transmission suggests a functional gradient in which CA3 represents present-oriented information while CA1 carries more future-oriented predictions (Chen et al., 2024), and neurons in both CA3 and CA1 exhibit action-driven remapping and encode action-planning signals (Green et al., 2022). Our framework therefore predicts that changes in CA3→CA1 population activity precede behavioral switching in the context-dependent alternation task in Figure 2 or the multi-lap tasks in Figure 3, and that perturbing this input will degrade behavioral performance.”

      “While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, such as theta cycles (Foster and Wilson, 2007; Wikenheiser and Redish, 2015).”

      (10) The flexibility of the new solution in terms of learning contexts with variable temporal horizons seems an important feature of the model, but one poorly demonstrated in the existing numerical experiments. Could more concrete model predictions be generated by designing an experiment targeted specifically for such scenarios?

      Thank you for raising this point.

      As we showed in Figure S2, in environments with variable temporal horizons, our model performs better than model-free learning (Q-learning) that incorporates temporal context.

      To further demonstrate this point, we added a new task in Figures 3G and H, in which the 1-lap task and the 2+ lap task are alternated. Our model exhibits rapid switching between these tasks, regardless of differences in sequence length or temporal horizon. We added the following text.

      “To demonstrate the advantage of our model in a rapidly switching task that requires different history lengths, we show that an agent trained on both the 1-lap and 2-lap tasks can flexibly alternate between them in a reward-dependent manner (Figure 3G), selectively engaging hippocampal sequences of different lengths according to the current task context (Figure 3H). Together, these results illustrate how hippocampal lap-like representations emerge through learning and enable flexible context switching across tasks with distinct temporal demands.”

      In such a scenario, a subjective representation of laps in the hippocampus is key to solving the task. As discussed in our response to points (8) and (9), neural representations, especially in CA1, are expected to bifurcate between the 1-lap and 2-lap conditions, and this bifurcation would precede and critically govern the animal’s behavior.

      (11) I found figures confusing/uninformative, specifically in making it explicit what is external task structure and what is the agent's internal representation of it; as a result it is not clear what of the results is trivially inherited from the task specification and what is an emergent property of the model; e.g. Figure 2A described external transition specification according to world model but it is unclear to me if Figure 2B shows the ideal agent state representation across context or a graphical summary of what the agent actually learned from the sensory experience described in A; from the text. Figure 2F is supposed to describe a property of the emergent representation, but what is shown is another cartoon... etc.

      We appreciate the reviewer’s insightful comments regarding the clarity of our figures.

      To clarify the neural representation of the agent and how it links to the action, we have revised Figure 2 and the descriptions in the main text.

      First, Figure 2A schematically depicts the external stimulus as being determined solely by the task. In this task, animals must keep track of the immediately preceding state (S1 or S3) to correctly choose between S4 and S5 upon reaching S2. Without such a memory of prior states, an agent would have no basis for determining which action is appropriate and therefore could not selectively move to S4 or S5. Consequently, any reinforcement learning model that does not incorporate at least a one-step state history cannot solve the task.

      To solve the task, S2 must be represented as two distinct contextual states depending on the previous state. Figure 2B therefore illustrates an example of internal representation that separates S2 into X2α and X2β: transitions from S1 to S2 are internally represented as X1 → X2α, whereas transitions from S3 to S2 are represented as X3 → X2β. Although the sensory inputs provided to the model correspond only to the task-defined states in Figure 2A, the combination of the sensory input with contextual states in Context selector successfully achieves this contextual representation of X2α and X2β (see Figure 2C, D). Also, the hippocampal neurons in Sequence composer indicate the next contextual states given the current contextual states, i.e., X2α→X4 and X2β→X5 (see Figure 2E). Thus, combining Context selector and Sequence composer successfully achieves the task requirement indicated in Figure 2B.

      Regarding the reviewer’s concern that Figure 2F (now Figure 2I) appeared to be another cartoon, we have revised the panel to clearly display our result. These results demonstrate that some hippocampal neurons in our model encode the transitions X2α→X4 and X2β→X5. The updated figure clarifies that our hippocampal neurons function similarly to the splitter cells reported by Wood et al. (2000).

      (12) Improvement: use visuals and captions. Make it clear what is a cartoon, what is a model specification, and what is an actual result. Replace/complement algorithmic cartoons in Figure 1 with a description of the actual result.

      Thank you for raising this point.

      As explained in our response to point (11), we added Figures 2D and 2E to display the actual neural activity, together with the corresponding annotations in the manuscript (e.g., X2α). We also revised the schematics in Figure 1 to better describe our model structure.

      (13) Map between model and experimental results is poorly justified: in particular the nature of sensory inputs is not clearly specified, and how the experimental manipulations (e.g. MEC input disruption) map into model manipulations is not intuitive and no justification is provided for the choices beyond that the model ends up matching the experiment by some metric. Also, not clear why a tradeoff of neural resources as implemented in the model makes sense for the clinical case and how this hypothesis deviates from alternative Bayesian accounts invoking imperfections in inference (e.g. relative strength of priors vs likelihood as reported by e.g. P.Series's group, or issues with hierarchical inference more generally along R.Jardri's work).

      Thank you for raising this important point. We have revised the manuscript to clarify the mapping between model components, sensory inputs, and the experimental manipulations, and to further justify the clinical interpretation.

      About sensory inputs

      First, each environmental state in our model is represented as a binary (0/1) pattern. We have added Figure 2D to explicitly illustrate these sensory stimuli and how they are provided to the context-selection module.

      About mapping between model components and brain circuits

      Functionally, we speculate that Context selector (X) corresponds to computations carried out in the prefrontal cortex (PFC) and entorhinal cortex (EC), and Sequence composer (H) corresponds to the hippocampus. Inputs from the PFC are thought to reach the hippocampus via the EC. Therefore, suppression of MEC→hippocampus inputs in Sun et al. (2020) naturally maps onto blocking a subset of the inputs from X to H in our model.

      We clarified this correspondence in the revised manuscript and now explicitly justify why this manipulation matches the biological experiment.

      Relation to Bayesian theories

      We agree that Bayesian accounts have provided influential explanations of psychiatric symptoms by invoking imperfections in inference, such as imbalances between priors and likelihoods (e.g., work by P. Series and colleagues) or disruptions in hierarchical inference (e.g., work by Jardri and others). Our model complements these frameworks by explicitly incorporating sequential structure and context remapping. Rather than treating priors as static or fixed-weight quantities, our model allows contextual representations to be dynamically reorganized based on prediction errors over time. In the SZ-like condition, we assume that an excessively expanded context domain increases the influence of internally generated contextual predictions, causing them to override sensory inputs and resulting in maladaptive behavior with hallucination-like percepts. Importantly, this effect reflects not only stronger priors but also excessive generation and competition of contextual states, leading to unstable and non-reproducible remapping. In contrast, in the ASD-like condition, sensory-weighted context representations limit the ability to flexibly incorporate newly introduced contexts, causing the model to perseverate on an initially learned context and thereby reproduce inflexible behavior. We added a schematic illustration in Figure 5B and expanded the Discussion to clarify this point.

      “When the stimulus domain is relatively underrepresented, the reconstruction of contextual state in the Amari-Hopfield network tends to infer contextual states based on the context domain rather than the stimulus domain. Consequently, it converges to an incorrect attractor that is not assigned to the current environmental state, thereby increasing perceptual error for external stimuli (hallucination-like effects). Moreover, SPE-driven remapping and the corresponding synaptic plasticity occur more frequently. In contrast, when the stimulus domain is overrepresented, the Amari-Hopfield network rarely assigns multiple contextual states to a given environmental state, leading to an overuse of default contextual states (see Figure 5B and Materials and Methods). ”
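      The effect of the relative sizes of the stimulus and context domains can be reproduced qualitatively in a toy Amari–Hopfield network. In this sketch, all sizes and patterns are assumptions chosen so that the stored patterns are exactly orthogonal (both domain sizes must be even). Two contextual states A and B are stored as [stimulus | context] patterns, and the network is cued with B's stimulus paired with A's context, as if the environment had changed while the previous context persisted. The helper names are hypothetical, and this is an illustration of the described tendency, not the manuscript's simulation.

```python
import numpy as np


def alternating(n):
    """+1, -1, +1, ... pattern; orthogonal to an all-ones pattern when n is even."""
    return np.array([1.0 if i % 2 == 0 else -1.0 for i in range(n)])


def store(patterns):
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W


def recall(W, cue, max_steps=20):
    x = np.asarray(cue, dtype=float)
    for _ in range(max_steps):
        x_new = np.where(W @ x >= 0, 1.0, -1.0)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x


def domain_competition(n_stim, n_ctx):
    """Return 'A' if recall follows the (stale) context domain,
    'B' if it follows the current stimulus, and 'other' otherwise."""
    A = np.concatenate([np.ones(n_stim), np.ones(n_ctx)])
    B = np.concatenate([alternating(n_stim), alternating(n_ctx)])
    cue = np.concatenate([alternating(n_stim), np.ones(n_ctx)])  # B stimulus, A context
    x = recall(store([A, B]), cue)
    if np.array_equal(x, A):
        return "A"
    if np.array_equal(x, B):
        return "B"
    return "other"


print(domain_competition(n_stim=4, n_ctx=12))  # A: context overrides the stimulus
print(domain_competition(n_stim=12, n_ctx=4))  # B: stimulus determines the state
```

      With an underrepresented stimulus domain, the network settles into the attractor that does not match the current stimulus, which is the hallucination-like misassignment described in the quoted passage. With an overrepresented stimulus domain, the current stimulus dominates.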

      “Our model also provides an algorithmic-level account of psychiatric symptoms by changing the relative weighting of sensory-encoding versus context-coding neurons. This implementation is analogous to Bayesian theories linking priors to psychiatric symptoms. In SZ, hallucinations and delusions have been modeled as arising from overly strong top-down priors (Powers et al., 2016) or circular inference, which leads to erroneous belief formation (Jardri et al., 2017; Jardri and Denève, 2013). In our model, we used an underrepresented stimulus domain to increase the relative influence of internally generated context representation in context selection. Crucially, this implementation does not simply strengthen priors but induces excessive generation and competition of contextual states, leading to frequent yet non-reproducible remapping of hippocampal contextual activity and a failure of learning to converge despite repeated experience. In ASD, it has been argued that abnormally high sensory precision reduces the updating of expectations (Karvelis et al., 2018) or leads to sensory-dominant perception, which has been interpreted as weak priors (Angeletos Chrysaitis and Seriès, 2023; Lawson et al., 2014; Pellicano and Burr, 2012). In our framework, we used an overrepresented stimulus domain to increase the relative influence of external stimulus representations in context selection. Importantly, our model captures not only sensory-dominant processing emphasized in previous studies, but also a distinctive impairment in flexibly utilizing newly introduced contexts, reflecting a failure of context reconstruction and resulting in persistent inflexible behavior. Thus, our conjunctive modeling of sensory and context processing complements Bayesian accounts of psychiatric symptoms and provides a mechanistic explanation for the role of sensory processing in maladaptive, inflexible behavior.”

      (14) Improvement: justify choices, explain in more detail relationships with computational psychiatry literature.

      Thank you for pointing this out. As explained in our response to point (13), we have justified our model choice in the revised version.

      Minor comments:

      (1) Typos: "algorism" (pg2), duplicate Sun reference.

      Thank you for finding the typo and the duplicate reference. We have revised the manuscript accordingly.

      (2) Unclear statements from Methods:

      • "preparing temporal context with three histories" not sure what is meant by this.

      • "... state estimation by the context-selection module becomes less frequent." (Methods/Overview): what is the mechanism?

      • "default pattern" and failure to converge: What is the biological basis for them?

      • Why is the converter function used on some occasions but not others?

      • "new contextual state is prepared": What does that mean?

      We thank the reviewer for pointing out several unclear statements in the Methods section.

      • “preparing temporal context with three histories”

      We now state the formal definition of the three histories explicitly in the Methods as follows.

      “the state is defined by the recent n-step transition history of task states (i.e., $s_k^{(n)} = (S_k, S_{k-1}, \dots, S_{k-n})^T$, where $s_k^{(n)}$ is the temporal context state and $S_k$ is the environmental state at time $k$). We varied $n$ from 0 to 3.”

      • “state estimation by the context-selection module becomes less frequent”

      In our model, context selection is performed every time the agents execute an action sequence generated by Sequence composer. As learning progresses, the Sequence composer comes to predict distant future states and executes coherent action sequences based on these predictions. When no unexpected errors are encountered during execution, context estimation is suppressed, resulting in less frequent context selection. We modified the manuscript as follows.

      “After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, the hippocampal sequences that enable future prediction become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”
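      The following Python sketch illustrates why context selection becomes less frequent as sequences lengthen. It rests on a simplifying assumption of ours, not on the model's actual code: context selection runs once before each planned sequence and again whenever execution reaches an unexpected state. The function and parameter names (`count_context_selections`, `path_length`, `sequence_length`, `error_steps`) are hypothetical.

```python
def count_context_selections(path_length, sequence_length, error_steps=()):
    """Count context selections for a hypothetical agent walking a path.

    Assumption (illustrative only): context selection runs once before each
    planned sequence of `sequence_length` actions, and again after any step
    listed in `error_steps` (an unexpected state), where the current
    sequence is abandoned and a new one is planned.
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")
    errors = set(error_steps)
    selections = 0
    step = 0
    while step < path_length:
        selections += 1                      # select context, then plan a sequence
        for _ in range(sequence_length):
            step += 1                        # execute one action
            if step >= path_length or step in errors:
                break                        # path finished or prediction error
    return selections


print(count_context_selections(12, 1), count_context_selections(12, 4))
```

      On a 12-step path, one-step sequences require 12 selections, four-step sequences 3, and a single 12-step sequence only 1. An unexpected state at step 5 raises the last count to 2.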

      • “default pattern”

      In biological systems, it has been reported that the frontal cortex shows sensory modality-specific representations without prior learning (Manita et al., 2015). We refer to these innate modality-specific sensory representations as the default pattern. In the early stages of learning, we assume that no stable contextual representations have yet been formed in the brain; therefore, a default pattern uniquely driven by external stimuli is used as the context representation. Even during intermediate stages of learning, the context selector may fail to converge to a specific state. In such context-uncertain environments, it has been reported that agents often rely on previously learned or habitual action choices (psychological inertia), a tendency that is more pronounced in ASD patients.

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      “This default implementation is analogous to psychological inertia, particularly under uncertainty (Ip and Nei, 2025; Sautua, 2017), which has been reported to be more pronounced in ASD patients (Joyce et al., 2017).”

      • Why is the converter function used only in some cases?

      The converter function A(stim → context) was introduced to compose the default pattern (one-to-one mappings between stimuli and contexts) described above. In all other cases, the Hopfield dynamics were used to select contextual states, so the converter function was not needed.
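      Below is a minimal sketch of the fallback logic described above. It assumes the Hopfield dynamics report whether they converged to a learned context, and it models the converter as a fixed lookup from stimulus index to a balanced random default pattern (±1 units). The names `make_converter` and `choose_context`, as well as the context-domain size of 600, are hypothetical.

```python
import numpy as np

N_CTX = 600  # hypothetical size of the context domain of X


def make_converter(n_stimuli, n_ctx=N_CTX, seed=0):
    """Hypothetical converter A(stim -> context): a fixed one-to-one map from
    each stimulus index to its own balanced random default context pattern."""
    rng = np.random.default_rng(seed)
    defaults = []
    for _ in range(n_stimuli):
        pattern = -np.ones(n_ctx)
        pattern[rng.choice(n_ctx, n_ctx // 2, replace=False)] = 1.0
        defaults.append(pattern)
    return lambda stim_id: defaults[stim_id]


def choose_context(stim_id, hopfield_result, converter):
    """Use the attractor reached by the Hopfield dynamics if it converged to a
    learned context; otherwise fall back on the default context."""
    context, converged = hopfield_result
    if converged:
        return context, "learned"
    return converter(stim_id), "default"


A = make_converter(n_stimuli=3)
ctx, source = choose_context(1, (None, False), A)  # early learning: no attractor yet
```

      In early learning no learned attractor exists, so every stimulus receives its own default context. This mirrors the one-to-one mapping the converter is meant to provide.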

      • “new contextual state is prepared”

      Thank you for pointing this out.

      The term “prepared” was inaccurate. We revised it to “generated”.

      In the case of remapping, we assumed that X generates a new random neural activity pattern in its contextual domain and stores it as a new contextual state. We described this process as “a new contextual state is generated”.
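      The sketch below shows this remapping step. It assumes ±1 units (the 0/1 patterns described elsewhere in this response map onto these via 2x − 1) and the standard outer-product storage rule of the Amari–Hopfield model; the model's exact update may differ. `remap` and its arguments are hypothetical names.

```python
import numpy as np


def remap(W, stim, n_ctx, rng):
    """Hypothetical remapping step: generate a new balanced random pattern in
    the context domain, pair it with the current stimulus pattern, and store
    the joint pattern in X with a single outer-product (Hebbian) update."""
    ctx = -np.ones(n_ctx)
    ctx[rng.choice(n_ctx, n_ctx // 2, replace=False)] = 1.0
    joint = np.concatenate([stim, ctx])
    W = W + np.outer(joint, joint) / joint.size
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W, ctx


rng = np.random.default_rng(0)
stim = np.where(np.arange(600) % 2 == 0, 1.0, -1.0)  # one stimulus pattern
W = np.zeros((1200, 1200))
W, ctx_a = remap(W, stim, 600, rng)  # first context for this stimulus
W, ctx_b = remap(W, stim, 600, rng)  # remapping: a second, new context
```

      Because each new context pattern is drawn at random, two contexts attached to the same stimulus are nearly uncorrelated. This is what lets the network hold them as distinct attractors.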

      (3) Please explain the mapping between hippocampal sequences to actions in more detail for each task.

      • Why 9 attempts before rejection?

      • Why all the variations on Hebb?

      We appreciate the reviewer’s request for clarification. Below, we provide additional explanations point by point.

      Mapping between hippocampal sequences and actions

      In this research, we defined action as the transition from one environmental state to another environmental state. The hippocampal sequences predict the transition of environmental states; therefore, they correspond to a set of action plans from the current environmental state. In the revised manuscript, we added the formal definition of environmental states and actions in each task.

      • Why 9 attempts before rejection?

      These repetitions ensure adequate exploration of the contextual states in X and the episodic sequences in H before committing to an action. Increasing the number of attempts excessively causes the reward value function to be dominated by a single highest-scoring sequence, leading to excessive exploitation and narrowing behavioral variability. While the exact number 9 is not critical (the qualitative results are robust to moderate changes), we selected this value because it provides a good balance between exploration and exploitation and produces the clearest visualizations in our figures. We have clarified this in the Methods as follows.

      “We set the number of attempts before rejection to nine, providing a balance between exploration and exploitation and serving as a good compromise for visualization.”

      • Why all the variations on Hebbian learning?

      We consider three loci of plasticity in our model: the X module, the H module, and their reciprocal connections. Within the H module, synaptic connections that link episodic segments—specifically from transition-coding neurons to state-coding neurons—are assumed to follow a reward prediction error–dependent, supervised form of Hebbian learning. This choice reflects the need to selectively reinforce transitions that lead to successful outcomes. In contrast, all other synaptic updates in the model are assumed to follow reward-independent, activity-based Hebbian learning. These learning rules support the unsupervised formation and stabilization of contextual representations and action execution.

      In addition to the basic Hebbian rule, we introduced biologically motivated constraints, such as upper and lower bounds on synaptic weights and heterosynaptic depression, which weakens nonpotentiated synapses. Importantly, these mechanisms do not alter the fundamental nature of Hebbian learning but increase the stability of our model.
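      Both kinds of plasticity can be sketched as a single update function. This is an illustrative assumption, not the model's equations. Potentiation is proportional to pre/post coincidence, and for the RPE-dependent synapses it is scaled by the reward prediction error. Heterosynaptic depression weakens synapses onto active postsynaptic cells whose presynaptic partner was silent. Weights are clipped to hypothetical bounds [0, 1], and the learning rates are placeholders.

```python
import numpy as np

W_MIN, W_MAX = 0.0, 1.0  # hypothetical weight bounds


def hebbian_update(W, pre, post, lr=0.1, hetero=0.05, rpe=None):
    """One plasticity step for weights W[post, pre] (a sketch, not the model).

    rpe=None  -> reward-independent, activity-based Hebbian potentiation.
    rpe=float -> potentiation scaled by the reward prediction error.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    gain = 1.0 if rpe is None else rpe
    dW = lr * gain * np.outer(post, pre)      # coincident pre/post activity
    dW -= hetero * np.outer(post, 1.0 - pre)  # heterosynaptic depression
    return np.clip(W + dW, W_MIN, W_MAX)      # keep weights within bounds


W = np.full((2, 3), 0.5)
W_hebb = hebbian_update(W, pre=[1, 0, 1], post=[1, 0])
W_punished = hebbian_update(W, pre=[1, 0, 1], post=[1, 0], rpe=-1.0)
```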

      (4) For Q learning: please clarify "the state is defined by the recent transition history of task state."

      As you suggested, we clarified the statement by adding the following sentences to the Methods. “To highlight the advantage of our model, we compared it with Q-learning with temporal contexts; namely, the state is defined by the recent n-step transition history of task states (i.e., $s_k^{(n)} = (S_k, S_{k-1}, \dots, S_{k-n})^T$, where $s_k^{(n)}$ is the temporal context state and $S_k$ is the environmental state at time $k$).”
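      For concreteness, the control model can be sketched as tabular Q-learning over history states. The learning rate of 0.4 follows our response to comment (7). The discount factor, the padding of incomplete early histories with `None`, and all names are assumptions made for illustration.

```python
from collections import defaultdict


def history_state(env_states, n):
    """Return s_k^(n) = (S_k, S_{k-1}, ..., S_{k-n}) from a visited-state list,
    padding missing early history with None (an assumption)."""
    recent = list(env_states[-(n + 1):])[::-1]
    return tuple(recent + [None] * (n + 1 - len(recent)))


class HistoryQLearner:
    """Tabular Q-learning whose states are n-step history tuples."""

    def __init__(self, actions, alpha=0.4, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate of the control model
        self.gamma = gamma  # discount factor (assumed)
        self.Q = defaultdict(float)

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.Q[(state, action)]
        self.Q[(state, action)] += self.alpha * td_error
        return td_error


q = HistoryQLearner(actions=["L", "R"])
s = history_state(["A", "B"], n=1)          # ("B", "A")
s_next = history_state(["A", "B", "C"], n=1)  # ("C", "B")
td = q.update(s, "R", 1.0, s_next)
```

      With $n = 0$ this reduces to ordinary Q-learning over environmental states; larger $n$ lets the same environmental state map to different table entries depending on how it was reached.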

      (5) What is the purpose and biological justification for the NG addition to RW?

      Thank you for raising this point. The prediction-error–based update of each sequence’s value function 𝑅 alone cannot distinguish between two fundamentally different cases:

      (a) the value of a sequence has genuinely decreased, or

      (b) the sequence remains useful but is not appropriate in the current context.

      This distinction is essential for modeling context-dependent switching of behavioral strategies. To address this, we introduced the No-good (NG) indicator. NG allows the agent to temporarily mark certain sequences as unsuitable without altering their long-term value, thereby facilitating short-term exploration of alternative sequences. In other words, NG provides a mechanism for transiently suppressing a previously valid sequence when the context changes, while preserving the underlying value learned from past experience.

      This mechanism is consistent with several lines of biological evidence. First, extinction learning after fear conditioning does not erase the original fear memory but instead forms a new memory trace, known to be stored in the medial PFC (Milad & Quirk, 2002). This suggests that animals may switch to a different contextual representation rather than simply downgrading the value of the conditioned stimulus, supporting the idea of temporarily suppressing a sequence without modifying its intrinsic value.

      Second, recent studies in the ventral hippocampus show that dopamine D2–expressing neurons in the ventral subiculum promote exploration specifically under anxiogenic contexts (Godino et al., 2025). This finding is consistent with the short-term exploratory behavior enabled by our NG mechanism. Thus, we added the following statement to the manuscript:

      “The no-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping … that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      Together, these biological findings provide a conceptual basis for modeling NG as a context-sensitive, transient modulation that encourages exploration without overwriting previously learned sequence values.
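      Below is one possible reading of the NG mechanism, sketched under explicit assumptions. A negative prediction error flags the sequence as no-good, excluding it from selection, without changing its value R. A non-negative prediction error triggers a Rescorla–Wagner update. All flags are cleared when the context changes. The class name, the method names, and the learning rate of 0.15 are hypothetical.

```python
class SequenceValues:
    """Hypothetical sketch: Rescorla-Wagner values R plus a transient
    no-good (NG) set that suppresses sequences without devaluing them."""

    def __init__(self, names, lr=0.15):
        self.R = {name: 0.0 for name in names}
        self.no_good = set()
        self.lr = lr

    def select(self):
        """Pick the highest-valued sequence not currently flagged as NG."""
        candidates = [s for s in self.R if s not in self.no_good]
        if not candidates:  # everything flagged: consider all sequences
            candidates = list(self.R)
        return max(candidates, key=lambda s: self.R[s])

    def update(self, seq, reward):
        rpe = reward - self.R[seq]
        if rpe < 0:
            self.no_good.add(seq)      # suppress now, keep R intact
        else:
            self.R[seq] += self.lr * rpe  # Rescorla-Wagner update
        return rpe

    def reset_no_good(self):
        """Called when the context changes (assumption)."""
        self.no_good.clear()


v = SequenceValues(["A", "B"])
v.R.update({"A": 1.0, "B": 0.5})
first = v.select()          # "A": highest value
v.update("A", 0.0)          # unrewarded in the current context
after_failure = v.select()  # "B": A is suppressed, not devalued
```

      The design point is the separation of concerns: R keeps the long-term value of "A", so once the flags are reset the agent returns to "A" immediately instead of relearning it.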

      (6) Missing details about H network size

      Thank you for pointing it out.

      We used 300 neurons for H and have stated this in the Methods as follows.

      “We model the hippocampus with an N = 300 binary recurrent neural network.”

      (7) S1 figure: learning is slower even in the early, easy phases of learning when the temporal dependence should not matter; how are learning rates calibrated across models?

      Thank you for raising this point. In our model, the learning rate was fixed at 0.15, whereas the control model (now shown in Figure S2) uses a higher learning rate of 0.4, independent of temporal context.

      Regarding why learning appears slower even in the early, easy phases: when the number of temporal contexts increases, the state space expands. This broadening of the state space makes it more time-consuming to identify and reinforce the appropriate state transitions. The effect is especially evident in easy phases because the number of temporal contexts prepared in the control model exceeds the number that the task requires.

      Importantly, unlike the control model, which assumes a fixed number of temporal contexts, our model gradually increases the number of temporal contexts depending on prediction error. This adaptive mechanism allows the model to learn quickly during early, easy phases while still enabling more complex learning in later phases.
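      The growth of the state space can be made concrete with a hypothetical task that has five environmental states. The number of possible history states is then $5^{n+1}$. Even on a sparse transition graph where each state has only two successors, the number of realizable histories ($5 \cdot 2^n$) still grows geometrically with $n$. Both functions and the example graph are illustrative.

```python
def history_space_size(n_env_states, n):
    """Upper bound on distinct history states (S_k, ..., S_{k-n})."""
    return n_env_states ** (n + 1)


def reachable_histories(successors, n):
    """Number of (n+1)-state histories that can actually occur, given a map
    from each environmental state to the states reachable in one action."""
    paths = [(s,) for s in successors]
    for _ in range(n):
        paths = [p + (nxt,) for p in paths for nxt in successors[p[-1]]]
    return len(paths)


# Hypothetical ring-shaped task: from state i the agent can reach i+1 or i+2.
ring = {i: [(i + 1) % 5, (i + 2) % 5] for i in range(5)}
bounds = [history_space_size(5, n) for n in range(4)]
reachable = [reachable_histories(ring, n) for n in range(4)]
```

      For $n = 3$ the bound is 625 states, and the ring still yields 40 reachable histories, compared with 5 states for $n = 0$. A learner with a fixed $n = 3$ must fill in all of these entries even when the task could be solved with $n = 0$.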

      Reviewer #2 (Recommendations for the authors):

      (1) "Hippocampal neurons show sequential activity...." The authors should include more classical references for hippocampal sequential activity at this point, too.

      Thank you for your suggestion. We have added the following citations:

      Skaggs and McNaughton, 1996; Wilson and McNaughton, 1993

      (2) "...called remapping" also here, please reference classic work (Bostock, Muller, ...)

      As suggested, we have added the following citations:

      Bostock et al., 1991; Muller and Kubie, 1987

      (3) "Several theoretical models..." What I miss here are models that explain remapping by inputs from the grid cell population, and/or the LEC (see Latuske 2017 for review), still widely considered the standard mechanism. Also, the models by Stachenfeld et al. 2017, Mattar and Daw 2019, and Leibold 2020 specifically address context dependence. Accordingly, "A comprehensive model that can explain the formation of context-dependent hippocampal sequences of various lengths through remapping, while relying on a biologically plausible learning process,..." somewhat overstates the novelty of the current paper.

      Thank you for pointing this out and for suggesting relevant citations. We agree with the reviewer that inputs from MEC and LEC to the hippocampus constitute a fundamental mechanism underlying remapping. However, in our view, a key open question in the remapping field is how MEC and LEC estimate the current context and convey this information to the hippocampus in a manner that supports goal-directed behavior. While previous studies have addressed remapping at the representational level and hippocampal sequences in planning, the relationship among remapping, reinforcement learning, and planning has not yet been explained within a single unified model. In this work, we propose a simple and biologically plausible model that integrates an Amari–Hopfield network for context selection with hippocampal sequences, providing an account of how the two are coordinated during goal-directed behavior. To position the novelty of our contribution more accurately, we have revised the manuscript as follows.

      “While previous works have explored hippocampal sequential activity for planning (Jensen et al., 2024; Mattar and Daw, 2018; Pettersen et al., 2024; Stachenfeld et al., 2017) and hippocampal remapping for contextual inference (Low et al., 2023) separately, they have yet to elucidate how these two aspects jointly enable flexible behavior. A simple biologically plausible model-based reinforcement learning model that uses the Amari-Hopfield model for context selection and hippocampal sequences of various lengths as a state-transition model for long-horizon planning, relying on remapping driven by prediction errors to form state representation, would thus provide valuable insights into the neural mechanisms underpinning context-dependent flexible behavior.”

      (4) Please properly introduce nomenclature "C2α, C2β, S2,...." S is sometimes used for stimulus, sometimes for location (state?), or even action?

      Thank you for pointing this out. We acknowledge that the Cn notation (e.g., C1, C2, …) was confusing. Therefore, we changed it to Xn (e.g., X1, X2, …) to indicate the contextual states of X.

      We define Sn (e.g., S1, S2…) as the external input given by the environment and represented in stim. domain of X, while Xn (e.g., X1, X2…) is the subjective contextual state generated by the agent and represented in the context domain of X. As a reference, we added the neural representation of X in Figure 2D and added the following text below.

      “The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states (e.g., S1, S2…) are represented in the stimulus domain, and the contextual states (e.g., X1, X2α…) are represented in the context domain.”

      (5) "Our model replicates this result by blocking the synaptic transmission from most of the neurons in the context domain of X to H (Figure 3F).". Does this mean the X module is hypothesized to be in the EC?

      Thank you for the thoughtful question. In our model, the X module is intended as a functional abstraction that combines the roles of several brain regions known to contribute to contextual representation, including the prefrontal cortex (PFC) and the entorhinal cortex (EC). Although X is not necessarily meant to correspond to a single anatomical region, we consider it likely that the contextual information represented in X reaches the hippocampus (H) (CA3 and CA1) primarily through the EC. Thus, the experimental manipulation shown in Figure 3F (suppression of medial EC axons in the hippocampus) is interpreted in our framework as weakening the input from X to H.

      We added the following texts in the Discussion section.

      “We speculate that Context selector is implemented across multiple brain regions with varying degrees of resolution, including a part of the entorhinal cortex and prefrontal cortex.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state.”

      (6) Discussion "model-based reinforcement learning": Please detail where the model is here. In my understanding, the naive agent does not have a model (this would be model-free then?).

      Thank you for asking.

      Unlike model-free reinforcement learning, where each action is evaluated step by step, we use hippocampal sequences for multiple-step prediction and action planning. This is the “model” in our research. As you mentioned, initially, animals do not have a “model”, but Sequence composer gradually chunks the episodic segments to compose a longer sequence.
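      To make the notion of a “model” concrete: once one-step episodic segments have been stored, they can be chained into a multi-step plan. The sketch below uses breadth-first search over stored segments purely as a stand-in for the hippocampal chaining process; it is not the Sequence composer's mechanism. The function name and state labels are hypothetical.

```python
from collections import deque


def compose_sequence(segments, start, goal):
    """Chain learned one-step segments (state -> next state) into a
    multi-step plan from start to goal; return None if no plan exists."""
    graph = {}
    for state, next_state in segments:
        graph.setdefault(state, []).append(next_state)
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


experienced = [("S1", "S2"), ("S2", "S3"), ("S3", "S4")]
plan = compose_sequence(experienced, "S1", "S4")
naive_plan = compose_sequence([], "S1", "S4")  # naive agent: no segments yet
```

      A naive agent has no stored segments and therefore no plan, which matches the point above that the “model” is acquired gradually.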

      (7) "...can change the attractor dynamics in the hippocampus (34)": What is (34)? I also would doubt that one can make such absolute statements about the human hippocampus.

      Thank you for pointing out the missing citation. We corrected it accordingly.

      Rolls E. 2021. Attractor cortical neurodynamics, schizophrenia, and depression. Transl Psychiatry 11. doi:10.1038/s41398-021-01333-7

      (8) "To the best of our knowledge, this is the first model that describes the formation of context-dependent hippocampal activity through remapping and its contribution to flexible behavior." See "Several theoretical models...".

      Thank you for pointing this out. We admit that it was an overstatement. We corrected it accordingly.

      “To the best of our knowledge, this is the first model that uses associative memory for describing the formation and switching of context-dependent hippocampal activity through remapping and its contribution to flexible behavior.”

      (9) "We speculate that the context-selection module is implemented across multiple brain regions..." How would an attractor network be implemented over "multiple brain regions"?

      We thank the reviewer for raising this important conceptual question. Context information in realistic environments is likely to have a hierarchical structure. We therefore speculate that multiple brain regions may jointly support context selection by maintaining different levels or components of this hierarchy. In particular, the prefrontal cortex (PFC), medial entorhinal cortex (MEC), and lateral entorhinal cortex (LEC) have all been implicated in representing contextual or task-state information at different levels of abstraction. These regions are known to exhibit attractor-like dynamics and to provide inputs to the hippocampus. Thus, an attractor network spanning multiple regions could arise, with different areas stabilizing distinct components of the contextual representation, depending on the timescale of memory, task demands, or sensory features.

      We used the Amari–Hopfield network as a functional abstraction to explain such multi-regional interactions underlying context representation, rather than to provide a one-to-one mapping onto a specific brain region. How region-specific attractor dynamics jointly contribute to maintaining global contextual information and enabling context switches in response to prediction errors remains an important direction for future research.

      Methods:

      (10) "... agents move through discrete environmental states characterized by distinct external stimuli.": How is this exactly implemented? What is the neural representation of these states, xi? What is the difference to a "landmark"?

      We appreciate the reviewer’s thoughtful question regarding the implementation and neural representation of environmental states. In our model, each environmental state is represented as a binary stimulus pattern provided to the stimulus-domain neurons in Context Selector. Specifically, for each state, we constructed a pattern in which half of the neurons are set to 1 and the other half to 0. We chose this design because, in the Amari–Hopfield model, memory performance is maximized when stored patterns contain approximately equal proportions of 0 and 1. For clarity, we have added an illustration of these stimulus patterns in the revised Figure 2D.

      Regarding the reviewer’s question about landmarks: in our framework, a landmark denotes an environmental state for which the contextual state is uniquely determined, regardless of the preceding transition history. For simplicity in this study, we designated the initial environmental state in each task (S0 or S1) as the landmark. Importantly, in our implementation, landmarks do not differ from other states in terms of their stimulus pattern; their special role arises solely from the task structure, not from additional sensory properties.

      In real environments, what constitutes a landmark likely varies depending on stimulus saliency and the agent’s prior experience. Determining how landmarks should be optimally defined or learned is an interesting direction for future work.

      (11) How are different contexts represented for the same stimulus xi^stim?

      We added an example of neural activity in X in Figure 2D, illustrating the distinction between the stimulus domain and the context domain. While the activity in the stimulus domain depends on the external stimulus, the contextual domain consists of uncorrelated random neural states. We exploit a key property of the Amari–Hopfield network to associate each contextual state with a given external stimulus.
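      The following is a small Amari–Hopfield sketch of this property. It assumes ±1 units split equally between a stimulus domain and a context domain (1,200 units in total, as in our model), outer-product storage, and synchronous updates of the context domain while the stimulus domain is clamped. Two contexts that share one stimulus are stored, and which one is retrieved depends on the initial context-domain state, which stands in for the hippocampal cue. All names and the 50/50 split are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_STIM, N_CTX = 600, 600  # assumed split of the 1,200 units in X


def balanced(n):
    """Random +/-1 pattern with exactly half of the units at +1."""
    p = -np.ones(n)
    p[rng.choice(n, n // 2, replace=False)] = 1.0
    return p


def store(patterns):
    """Outer-product (Hebbian) weights for +/-1 patterns, no self-connections."""
    P = np.array(patterns)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W


def recall(W, stim, ctx_init, max_steps=20):
    """Clamp the stimulus domain and update the context domain synchronously
    until it stops changing (or max_steps is reached)."""
    x = np.concatenate([stim, ctx_init])
    for _ in range(max_steps):
        new_ctx = np.where(W[N_STIM:] @ x >= 0, 1.0, -1.0)
        if np.array_equal(new_ctx, x[N_STIM:]):
            break
        x[N_STIM:] = new_ctx
    return x[N_STIM:].copy()


# Two contexts (e.g. X2-alpha and X2-beta) stored for the same stimulus S2.
s2 = balanced(N_STIM)
ctx_alpha, ctx_beta = balanced(N_CTX), balanced(N_CTX)
W = store([np.concatenate([s2, ctx_alpha]), np.concatenate([s2, ctx_beta])])

# A cue resembling ctx_alpha (20% of its units flipped) retrieves ctx_alpha.
cue = ctx_alpha.copy()
cue[rng.choice(N_CTX, N_CTX // 5, replace=False)] *= -1
retrieved = recall(W, s2, cue)
```

      The stimulus domain alone cannot disambiguate the two contexts, because it drives both stored patterns equally. The context-domain initial state breaks the tie.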

      (12) "...and its stimulus domain ??stim becomes identical to ??xistim ." Does that mean every stimulus is an attractor in the context net? How can that work with only 1200 neurons? Is that realistic for real-life environments? Neuron numbers would need to increase dramatically.

      As you mentioned, we assigned each stimulus to a corresponding attractor in the Context selector (X). An Amari–Hopfield network with 1,200 neurons can store approximately 10–20 attractors, which is sufficient to solve the tasks considered in this study. We adopted the Amari–Hopfield network for its simplicity and conceptual clarity; however, in biological neural systems, it is not necessary to construct such rigid attractors for every stimulus. For example, modality-specific neural projections exist in the brain and are sometimes sufficient to form loose attractor states across different stimuli. In addition, the prefrontal cortex is known to support working memory, which may also serve as a form of contextual representation incorporating recent history. Thus, we propose that multiple brain regions cooperate to implement the Context selector.

      (13) How are WHX and WHH initialized?

      Thank you for pointing this out.

      We set all initial synaptic weights W to 0 and added the following text to the Methods section.

      “Note that the initial synaptic weights of $W^{HX}$ and $W^{XH}$ are all 0.”

      (14) It is unclear why the hippocampus separates into state and transition neurons. Why cannot one pattern serve both purposes?

      Thank you for asking about this important point.

      We use two kinds of hippocampal neurons because state-coding neurons represent the current contextual state, whereas transition-coding neurons predict the next contextual state given the current one. This separation enables H to predict multiple scenarios under the current contextual state and to choose the sequence most suitable for the environment.

      We rewrote the following sentences in the manuscript.

      In the Results section:

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons.”

      In the Methods section:

      “The state-coding neurons receive input from $X$ and represent the current contextual state, while the transition-coding neurons send output to $X$ and predict the next contextual state after an action, i.e., $T(X_{k+1} \mid X_k, a_{k,k+1})$.”

      (15) "the agents execute actions according to this sequence." How are the actions defined? Are they part of the state?

      We thank the reviewer for raising this important point. In our model, an action is defined as the transition from a given environmental state to the next environmental state. To avoid ambiguity, we have added a formal mathematical definition of actions for each task in the revised manuscript. In our framework, the transition-coding neurons in Sequence Composer (H) predict the upcoming environmental state, and thus the hippocampal sequence intrinsically contains the representation of an action. Consequently, the sequence generated before actions functions as the agent’s internal action planning process.

      (16) "Because the input source for the state-coding neuron and the transition coding neuron differ (the former is selected from ??, while the latter is selected from ??), the same hippocampal neuron could occasionally be used for both state-coding and transition-coding across different contextual states. This is evident when an excessive number of contextual states are prepared, especially in the SZ condition. This phenomenon degrades state estimation at X (eq.3)." I have no idea what you want to convey here, .... and how is state estimation related to Equation 3?

      We appreciate the reviewer’s feedback and agree that our original explanation was unclear. Our intention was to clarify why context estimation deteriorates specifically in the SZ condition.

      In our model, state-coding neurons in the hippocampus represent the current contextual state, and transition-coding neurons predict the next contextual state given the current contextual state. Under normal conditions, these two sets of neurons remain sufficiently distinct, allowing accurate prediction of the upcoming contextual state, which is conveyed to X. However, when an excessively large number of contextual states are stored in the SZ condition, representations in the hippocampus begin to overlap. As a result, some hippocampal neurons are inadvertently recruited for both state-coding and transition-coding across different contextual states. This overlap disrupts H’s ability to accurately predict the next contextual state.

      This degraded prediction directly affects the state-estimation process in X (Eq.3), because Eq.3 relies on receiving an accurate predicted next state from H. When this signal becomes ambiguous, X may converge to an incorrect contextual state, potentially mimicking hallucination-like inference errors.

      We have rewritten the relevant passage in the manuscript to clarify this mechanism as follows.

      “When the number of contextual states increases - particularly in the SZ condition - representational overlap arises between hippocampal state-coding and transition-coding neurons.

      This overlap makes the prediction of the next contextual state by the transition-coding neurons unreliable. The degraded prediction from H, in turn, corrupts the initial condition for context selection in X (Eq. 3), leading to hallucination-like behavior.”
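      The following toy illustration shows how dual-role neurons accumulate as contextual states multiply. It assumes, hypothetically, that each contextual state recruits k random H neurons for state coding and another k for transition coding, with H = 300 neurons as in the model. `dual_role_fraction` and k = 10 are illustrative choices, not model parameters.

```python
import numpy as np


def dual_role_fraction(n_states, n_neurons=300, k=10, seed=0):
    """Fraction of H neurons used for both state coding and transition coding
    when each contextual state recruits k random neurons for each role."""
    rng = np.random.default_rng(seed)
    state_units, transition_units = set(), set()
    for _ in range(n_states):
        state_units.update(rng.choice(n_neurons, k, replace=False).tolist())
        transition_units.update(rng.choice(n_neurons, k, replace=False).tolist())
    return len(state_units & transition_units) / n_neurons


few = dual_role_fraction(n_states=2)
many = dual_role_fraction(n_states=20)  # e.g. an SZ-like excess of contexts
```

      Under this assumption the dual-role fraction rises steeply with the number of stored contexts. Consistent with the mechanism described above, the predictions sent from H to X become less reliable as a result.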

      (17) The figures hardly show simulated activity. Consider displaying more neuronal simulations to help the reader grasp the workings of the model.

      Thank you for your suggestion. We now show the neural activity of X and H in Figures 2D and 2E, respectively, to provide an overview of our model.

      (18) Figure 5: What is the "Hopfield count"?

      Thank you for pointing this out. The definition of the Hopfield count was ambiguous. We added an explicit explanation of “context selection” and its possible outcomes (correct association, hallucination-like, and default contexts) in Fig. S1. To clarify our claim, we replaced the count-based measure with the probability of selecting hallucination-like and default contexts during context selection. Accordingly, we removed the term “Hopfield count” and revised the caption of Figure 5 as follows.

      “The result of context selection (see Figure S1). The probability of wrong stimulus reconstruction (hallucination-like effects) is plotted in red, and the probability of default context usage due to failures in context reconstruction (see Materials and Methods) is plotted in blue.”

      (19) Figure 6: Consider moving this upfront.

      Thank you for the suggestion. We moved Figure 6 to Figure S1 and introduced it earlier in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I was a bit confused about the implementation, which may not be autonomous, meaning there are numerous stages that require intervention from outside the X-H network (see Figure 6). It seems that the X network might wait to converge before providing input to H, rather than having the entire network evolve in parallel. There are also aspects to the implementation that seem rather ad hoc, such as the "no-good indicator".

      Thank you for the thoughtful comments. We would like to clarify several points regarding the implementation and its biological motivation.

      First, regarding the concern that the X–H interaction may not be fully autonomous:

      In our framework, the convergence time of the X module under external sensory input is assumed to be on the order of several hundred milliseconds, consistent with the timescale of stimulus-evoked cortical population dynamics observed in biological systems. Especially when hippocampal input is present, X does not need to explore the full attractor landscape. Instead, it quickly settles into an attractor located near the hippocampal cue, which substantially shortens the convergence time.

      Second, although our current implementation proceeds in an algorithmically sequential manner for clarity, we do not intend to imply that the brain performs these steps sequentially. Biologically, the states of X and H are expected to co-evolve and mutually constrain each other through recurrent interactions. The sequential algorithm in the model is therefore a practical choice for implementation, not a theoretical claim about strict temporal ordering in the neural system.

      Finally, the “no-good indicator” is introduced to transiently suppress hippocampal sequences and thereby accelerate switching behavior. It is most consistent with biological findings on D2-expressing neurons in the hippocampus. We added the following text:

      About the no-good indicator

      “The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025)”

      Besides the hippocampus, similar mechanisms (temporary suppression of recently visited or low-value attractor states) have been proposed in the biological decision-making and working-memory literature, providing conceptual support for the no-good indicator in our model.

      After exposure to a new context, a new memory/context is stored in the X network. As the storage of a new memory requires synaptic plasticity, this step would presumably take a significant amount of time in an animal.

      Thank you for raising this important point. We agree that the formation of a new memory or context requires synaptic changes, and it is well established that processes such as tagging during wakefulness and consolidation during sleep take considerable time. However, once a context has been learned, switching between contexts can be achieved just by moving between attractors in the X network. This mechanism allows for rapid, context-dependent behavior without requiring new synaptic modifications each time. Our study focuses on this aspect of fast context-dependent switching rather than the initial memory formation.

      My understanding is that the Amari-Hopfield network should be evolving in continuous time and not be binary. But there were no time constants mentioned, and the equations were not provided, and it seems that the elements of X were binary units, rather than analog. This should be clarified.

      Thank you for the comment.

      Although there are models with continuous firing rates and continuous time (Ramsauer et al., 2021), the original Amari-Hopfield model uses binary neurons updated in discrete time steps. As explained in our responses to comments (5) and (6) from Reviewer 1, we consider only a discrete-time environment, for which the timescale is arbitrary. At each environmental state, where the current contextual state is selected, the Amari-Hopfield network typically converges within about ten iterations.

      We added the following sentence to the manuscript.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli.”
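      As a minimal sketch of binary, discrete-time Amari-Hopfield dynamics (not the authors' implementation), the following stores random ±1 patterns with a Hebbian rule and recalls one of them from a noisy cue using synchronous sign updates, stopping once the state no longer changes. The network size, number of patterns, cue noise, and synchronous update schedule are illustrative assumptions; the iteration count in this toy setting is not meant to reproduce the roughly ten iterations reported for the model.

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian weights W = (1/N) * sum_mu xi_mu xi_mu^T, with self-connections removed."""
    n_units = patterns.shape[1]
    w = patterns.T @ patterns / n_units
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, cue, max_iter=100):
    """Synchronous binary (+1/-1) updates in discrete time until a fixed point is reached."""
    state = cue.copy()
    for step in range(1, max_iter + 1):
        new_state = np.where(w @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            return state, step  # no unit changed: converged
        state = new_state
    return state, max_iter      # cap guards against two-state cycles

rng = np.random.default_rng(0)
n_units, n_patterns = 200, 5                     # hypothetical sizes
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
w = train_hebbian(patterns)

# Noisy sensory cue: flip 15% of the units of stored pattern 0.
cue = patterns[0].copy()
flip = rng.choice(n_units, size=30, replace=False)
cue[flip] *= -1

state, n_iter = recall(w, cue)
overlap = state @ patterns[0] / n_units          # 1.0 means perfect recall
print(f"converged after {n_iter} iterations, overlap = {overlap:.2f}")
```

Synchronous updates are used here for brevity. Classical asynchronous updates (one unit at a time) guarantee convergence to a fixed point, whereas synchronous updates can in principle fall into two-state cycles, which is why the loop is capped at `max_iter`.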

      Figure 3 is aimed at replicating the lap cell finding of Sun et al, 2020. In panel E, a comparison is made between the data and the model. Are the cells in the model the entire population of H neurons (state and transition), or just a subset? Does the absence of the "ghosts" (the weaker off diagonal responses seen in the experimental data) imply that the network is not encoding that it is in the same location, but a different lap? Why is there not any true sequentiality (i.e., why do all H units go on at once)?

      Thank you for your insightful comments. Throughout this study, we used 300 neurons for the Sequence composer (H); however, for simplicity, we constrained the model such that only a single H neuron was active at each time point. As a result, most other neurons remained silent. Accordingly, in Fig. 3E, we display only neurons with firing activity, and silent neurons are not shown.

      As you correctly inferred, hippocampal neurons in our model encode lap identity rather than the same physical location across laps. This design choice reflects our focus on hippocampal neurons representing contextual states, rather than place-coding neurons, as only the former contributes directly to contextual behavior in our framework. As shown in Fig. 3E, hippocampal neurons exhibit clear sequential activity with “episode-like” representations corresponding to individual laps. Nevertheless, we believe that incorporating a mixture of context-coding neurons and place-coding neurons is an important direction for future work, as illustrated in Fig. S3.

      We revised the caption of Fig. 3E as follows.

      “E, The comparison of (Left) lap cells in the hippocampus in the 4-lap task (Sun et al., 2020) and (Right) our results of active neurons in the H module.”

      Typo "but also makeS predictions".

      Thank you for pointing this out. We have corrected it.

    1. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches, which require processing at each state transition, our model uses sequential activity (i.e., a transition model) to predict long-term environmental changes by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior, reminiscent of that observed in biological systems, can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing the model to dynamically separate prediction errors caused by sensory noise from those caused by contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms for psychiatric disorder-like behaviors, as illustrated in Figure 5.

      Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model, namely the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H): X sends its estimate of the current context to H, and H returns to X a prediction of upcoming states based on the corresponding episode. This loop enables both estimation of the current context and, when needed, its reselection.

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of the episodes used for inference and control. Because our model forms and stores episodes whose length depends on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting the context enables rapid switching between these timescales. This flexibility offers a potential solution to a known challenge in MBRL, the assignment of credit across variable timescales, without requiring explicit optimization. To better illustrate this feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. Author response:

      We thank the reviewers for their excellent and thoughtful comments and suggestions, along with their strong support of the work. We agree with the general feedback that there is opportunity for further mechanistic dissection of the data from a variety of interesting angles. This was a fascinating project to work on because of all of the possible directions, and we attempted to highlight a diversity of compelling findings. We wish we had time to devote to answering more of the open mechanistic questions, but, given competing priorities, we are unfortunately unable to do them justice at this time. At the suggestion of a reviewer, we have made results available through MaveDB (accession numbers urn:mavedb:00001270-a and urn:mavedb:00001271-a) as a way to empower others to explore more.

    1. Author response:

      We thank the editors and reviewers for their careful reading of our manuscript and for their insightful comments. We appreciate the opportunity to clarify several aspects of the derivations and experimental design, and we will revise the manuscript accordingly. Below we provide responses to the major weaknesses raised by the reviewers.

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      Thank you for pointing this out. We agree that the derivation of the main error term should be presented more explicitly to facilitate peer review. In the revised manuscript, we will explicitly cite the relevant equation numbers from the references to make each step of the argument easier to follow. We will also revise the text to more clearly discuss the assumption on the noise covariance matrix.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Thank you for this helpful suggestion. We agree that the practical implementation of the experimental design should be explained more clearly. In the revised manuscript, we will provide a more explicit description of how the input perturbations are constructed in each iteration. To more clearly explain how many times and for how long the system is stimulated, we will clarify the stopping criterion used in the iterative procedure and the time length of the external inputs. As shown in Eq. (8), the estimation error scales approximately as 1/T, so longer measurements improve accuracy. For clearer guidance, we will add additional explanations on the relation between the stimulation time and estimation accuracy, as well as on the role of iterative input design.
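      To illustrate the approximately 1/T scaling in a generic setting, the sketch below simulates a hypothetical two-dimensional linear system x[t+1] = A x[t] + B u[t] + w[t] driven by random Gaussian input perturbations, estimates A by ordinary least squares with B treated as known, and averages the squared error over repeated runs. The system matrices, noise level, and estimator are assumptions chosen for illustration; this is not the manuscript's Eq. (8) or its iterative input-design procedure.

```python
import numpy as np

def simulate(a, b, t_len, rng, input_scale=1.0, noise_scale=0.1):
    """Simulate x[t+1] = A x[t] + B u[t] + w[t] with i.i.d. Gaussian inputs and noise."""
    n, m = b.shape
    u = input_scale * rng.standard_normal((t_len, m))   # random input perturbations
    w = noise_scale * rng.standard_normal((t_len, n))   # process noise
    x = np.zeros((t_len + 1, n))
    for t in range(t_len):
        x[t + 1] = a @ x[t] + b @ u[t] + w[t]
    return x, u

def estimate_a(x, u, b):
    """Ordinary least-squares estimate of A, treating B as known."""
    target = x[1:] - u @ b.T                 # remove the known input contribution
    a_hat_t, *_ = np.linalg.lstsq(x[:-1], target, rcond=None)
    return a_hat_t.T                         # lstsq solves X @ A^T = target

rng = np.random.default_rng(1)
a_true = np.array([[0.9, 0.1], [-0.2, 0.8]])  # hypothetical stable dynamics
b = np.eye(2)                                 # hypothetical known input matrix

mean_sq_error = {}
for t_len in (100, 1000, 10000):
    errs = []
    for _ in range(20):                       # Monte Carlo repetitions
        x, u = simulate(a_true, b, t_len, rng)
        errs.append(np.sum((estimate_a(x, u, b) - a_true) ** 2))
    mean_sq_error[t_len] = np.mean(errs)
    print(f"T = {t_len:>5}: mean squared error = {mean_sq_error[t_len]:.2e}")
```

In this toy setting, multiplying T by 10 should reduce the mean squared error by roughly a factor of 10, up to Monte Carlo variability.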

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

      We thank the reviewer for raising this important point. We agree that it is important to understand how sensitive the proposed method is to model mismatch. While our current theoretical analysis assumes linear dynamics with Gaussian noise for analytical tractability, real systems may deviate from these assumptions in several ways, including nonlinear dynamics, temporally correlated noise, or imperfect knowledge of the input matrix B. To address this concern, we will add simulation experiments to examine the robustness of our method under several types of model misspecification. These simulations will provide practical guidance on how deviations from the assumed model affect estimation performance. We will include these results and discuss their implications in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The reviewers' main concerns converged on a potential alternative interpretation: that top-down modulation of the visual cortex may contribute to the NC connectivity we observed. In this revision, we address that point with new analyses in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice viewed passive visual stimulation (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas they were spontaneously locomoting in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network were necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution of the top-down input was increased to more than one third of the stimulus contribution could top-down modulation generate cross-stimulus stable NC connectivity (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation does not play a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.
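      For readers less familiar with these measures, the following sketch computes a noise correlation in one common way (the Pearson correlation of trial-to-trial residuals after subtracting each stimulus's mean response, pooled across stimuli) and shows how a component linearly predicted by pupil size can be regressed out first, in the spirit of point (1). The synthetic data, function names, and linear removal of the pupil component are assumptions for illustration and are not the analysis pipeline used for Fig. S8.

```python
import numpy as np

def stimulus_residuals(resp, stim):
    """Subtract each stimulus condition's mean response, leaving trial-to-trial 'noise'."""
    resid = resp.astype(float).copy()
    for s in np.unique(stim):
        idx = stim == s
        resid[idx] -= resid[idx].mean(axis=0)
    return resid

def regress_out(resid, pupil):
    """Remove the part of each neuron's residual that is linearly predicted by pupil size."""
    design = np.column_stack([np.ones_like(pupil), pupil])
    coef, *_ = np.linalg.lstsq(design, resid, rcond=None)
    return resid - design @ coef

def noise_correlation(resid):
    """Pearson correlation of the pooled residuals of a neuron pair."""
    return np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]

# Synthetic pair: stimulus tuning + shared (network-like) noise + pupil-linked input.
rng = np.random.default_rng(3)
n_trials, n_stim = 2000, 8
stim = rng.integers(n_stim, size=n_trials)
pupil = rng.standard_normal(n_trials)
shared = rng.standard_normal(n_trials)
tuning = rng.uniform(0, 5, size=(n_stim, 2))
resp = (tuning[stim] + 0.5 * shared[:, None] + 0.5 * pupil[:, None]
        + rng.standard_normal((n_trials, 2)))

resid = stimulus_residuals(resp, stim)
nc_before = noise_correlation(resid)
nc_after = noise_correlation(regress_out(resid, pupil))
print(f"NC before = {nc_before:.2f}, after removing pupil component = {nc_after:.2f}")
```

In this synthetic example, the `shared` term stands in for network-generated covariability, so the correlation that survives pupil removal comes from that term.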

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and raised a concern that the noise correlations could be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem to be addressed (unless it is addressed indirectly and I have missed it) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not respond significantly to the same set of stimuli, if at all. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in visual responses explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavioral studies are outside the scope of the current manuscript, we have included additional analysis and discussion of whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model, they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight into network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support the manuscript's main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.
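      The mechanism proposed in comment (1a) can be illustrated with a toy simulation in which two neurons have no connectivity and no shared input other than a single trial-by-trial multiplicative gain. The von Mises-like tuning curves, the Gaussian gain and private noise, and all parameter values are assumptions chosen for illustration; the point is only that a purely multiplicative shared gain yields a larger noise correlation for a co-tuned pair than for an orthogonally tuned pair.

```python
import numpy as np

rng = np.random.default_rng(2)
orientations = np.deg2rad(np.arange(0, 180, 22.5))   # 8 hypothetical grating orientations
n_reps = 500                                          # trials per orientation

def tuning(pref, theta, kappa=2.0, peak=5.0):
    """Von Mises-like orientation tuning curve with a 180-degree period."""
    return peak * np.exp(kappa * (np.cos(2 * (theta - pref)) - 1))

def simulate_pair(pref_a, pref_b, gain_sd=0.3, noise_sd=1.0):
    """Two unconnected neurons sharing only a multiplicative trial-by-trial gain."""
    stim = np.repeat(np.arange(len(orientations)), n_reps)
    gain = 1 + gain_sd * rng.standard_normal(stim.size)   # shared gain, mean 1
    ra = gain * tuning(pref_a, orientations[stim]) + noise_sd * rng.standard_normal(stim.size)
    rb = gain * tuning(pref_b, orientations[stim]) + noise_sd * rng.standard_normal(stim.size)
    return np.column_stack([ra, rb]), stim

def noise_correlation(resp, stim):
    """Correlation of residuals after removing each stimulus's mean, pooled across stimuli."""
    resid = resp.copy()
    for s in np.unique(stim):
        idx = stim == s
        resid[idx] -= resid[idx].mean(axis=0)
    return np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]

nc_cotuned = noise_correlation(*simulate_pair(0.0, 0.0))
nc_orthogonal = noise_correlation(*simulate_pair(0.0, np.pi / 2))
print(f"co-tuned NC = {nc_cotuned:.2f}, orthogonal NC = {nc_orthogonal:.2f}")
```

Because each neuron's residual under stimulus s is approximately (g - 1) f_i(s) plus private noise, the shared covariance scales with the product f_a(s) f_b(s), which is large only when both neurons respond to the same stimuli.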

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we elaborate on one additional analysis we performed, in case it is of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This result seemed counterintuitive, but it can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates in the non-human primate recordings used to develop the method (Goris et al. 2014 Nat Neurosci). We therefore did not pursue the model further, and mention it here only in case the outcome is of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels', the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. Following the authors' claims, it would be important to establish whether membership in the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints on network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion; we now clearly state in the Abstract and Introduction that the study was carried out in mice.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion; we have clarified this in the Fig. 4 legend.

      Each HVA has an SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; de Vries et al., 2020). Each red marker in Fig. 4C is a single V1-HVA pair (blue markers are within-area pairs) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in that SFTF group for the V1-HVA pair, divided by the total number of high NC pairs for that V1-HVA pair (summed over all SFTF groups). The trend is that HVAs with a bias towards a particular SFTF group also have more high NC pairs in that SFTF group, which is consistent with the model on the right. This is not circular, because an HVA could have an SFTF bias and still have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could produce a similar effect, but that outcome is still consistent with the model: the number of high NC pairs (not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 
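      To make the distinction between the two quantities concrete, the sketch below computes, for one V1-HVA pair, the share of high-NC pairs falling in each tuning group (the Fig. 4C-style quantity) separately from the mean NC of those high-NC pairs (the Fig. 4D-style quantity). It is a minimal illustration under stated assumptions, not the authors' analysis code: the input format (in-group neuron pairs labeled by their shared tuning group), the function name `high_nc_summary`, and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict


def high_nc_summary(pairs, threshold):
    """Summarize high-NC pairs for a single V1-HVA area pair (hypothetical helper).

    pairs: iterable of (group, nc) tuples, where `group` is the tuning group
           shared by both neurons of an in-group pair and `nc` is their noise
           correlation.
    threshold: NC value above which a pair counts as "high" (assumed cutoff).

    Returns two dicts keyed by group:
      fraction -- high-NC pairs in the group / all high-NC pairs (a count-based measure)
      mean_nc  -- average NC of the group's high-NC pairs (a magnitude-based measure)
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for group, nc in pairs:
        if nc > threshold:
            counts[group] += 1
            sums[group] += nc

    total = sum(counts.values())
    # If no pair exceeds the threshold, both comprehensions yield empty dicts.
    fraction = {g: counts[g] / total for g in counts}
    mean_nc = {g: sums[g] / counts[g] for g in counts}
    return fraction, mean_nc
```

      The two outputs can dissociate: a group can dominate `fraction` while having the same `mean_nc` as other groups, which is the pattern the response describes (SFTF biases tracking the number, not the magnitude, of high NC pairs).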

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but its value should be spelled out more clearly. This section from the end of the discussion, for example, feels like it abdicates that responsibility: "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependence of correlations on tuning similarity is not an artifact of how the data were acquired are, in my view, missing; if so, it is crucial that this be addressed.

      At each step of data analysis, we performed control analyses to assess the fidelity of the conclusions: for example, for spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experiments as units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand why correlation is the best way to quantify the consistency of class centers across subsets of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in the Methods.

      Sorry for the confusion; we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
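      A minimal sketch of this kind of consistency check is given below, assuming each GMM fit yields a list of class-center vectors of equal length (45 in the manuscript). Each center from one fit is matched to the most correlated center from a second fit, and that correlation is reported. The function names are hypothetical, and the greedy matching shown here does not enforce a one-to-one assignment; the authors' matching procedure may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def match_centers(centers_a, centers_b):
    """Match each class center of fit A to its most correlated center in fit B.

    Returns a list of (index_in_a, index_in_b, correlation) tuples.
    Greedy: two centers in A may map to the same center in B.
    """
    matches = []
    for i, ca in enumerate(centers_a):
        scores = [pearson(ca, cb) for cb in centers_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches
```

      One property relevant to the reviewer's question: Pearson correlation is unaffected by an overall offset or positive rescaling of a center vector, so it compares the shape of the centers, whereas mean squared error would also penalize differences in scale and offset.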

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections' which I presume mean correlations is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion; we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion; we have added an explanation of this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Sections on statistical testing, the computation of explained variance, etc., are missing. I feel many analyses are not thoroughly described.

      Sorry for the confusion; we have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
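      For reference, the conventional definitions can be written out as below: signal correlation is the Pearson correlation between two neurons' trial-averaged responses across stimuli, and noise correlation is the Pearson correlation of their trial-to-trial residuals after each stimulus mean is subtracted. This is a minimal sketch of those standard definitions, not the authors' pipeline (which computed NCs from inferred spike trains); common variants z-score responses within each stimulus before pooling. The function names and the nested-list input format are assumptions.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def signal_and_noise_correlation(resp_a, resp_b):
    """Signal and noise correlation for one neuron pair.

    resp_x[s][t] is the response of neuron x to stimulus s on trial t; both
    neurons are assumed to be recorded on the same trials, in the same order.
    """
    # Signal correlation: similarity of the trial-averaged tuning curves.
    mean_a = [mean(trials) for trials in resp_a]
    mean_b = [mean(trials) for trials in resp_b]
    sc = pearson(mean_a, mean_b)

    # Noise correlation: covariation of residuals around each stimulus mean,
    # pooled across all stimuli and trials.
    resid_a = [r - m for trials, m in zip(resp_a, mean_a) for r in trials]
    resid_b = [r - m for trials, m in zip(resp_b, mean_b) for r in trials]
    nc = pearson(resid_a, resid_b)
    return sc, nc
```

      The two measures are computed from different parts of the same data, so a pair can share tuning (high SC) while its trial-to-trial fluctuations are uncorrelated or even anti-correlated.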

      The number of visual stimulation trials is not stated in the Methods; it is only stated in the figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of the visual cortex can be used to delineate visual areas.

      It is true that areas can be mapped using the GCaMP signals, but that is not our preferred approach. Using intrinsic imaging to define the boundaries between V1 and HVAs has been a well-refined routine in our lab for over a decade and is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure regardless of which reporters the mice express (and their expression pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on co-tuned neurons could be addressed using pupil and/or face recordings: does pupil size or facial motion predict NCs, and if these are factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly, further analyses could be done to strengthen support for the claim that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised the Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the role of transcriptional and translational controls of gene expression in dorsal root ganglia and lumbar spinal cord in neuropathic pain in mice. Using ribosome profiling (Ribo-seq) and translating ribosome affinity purification (TRAP), they show changes in transcriptomic and translational gene expression at the peripheral and central levels rapidly after nerve injury. While translational changes in gene expression remained elevated for more than two months in both DRGs and the spinal cord, transcriptomic regulation was absent in the spinal cord long after the onset of neuropathy. Disrupting mRNA translation in dorsal horn neurons using antisense oligonucleotides reduced mechanical withdrawal threshold and facial expression of pain. Using fluorescent noncanonical amino acid tagging (FUNCAT), the authors further show that de novo protein expression primarily occurs in inhibitory neurons in the superficial dorsal horn after nerve injury. Accordingly, a selective increase in translational control of gene expression in spinal inhibitory neurons, or a subset of mainly inhibitory neurons expressing parvalbumin (PV), using transgenic mice, led to a decrease in the excitability of PV neurons and mechanical allodynia. In contrast, decreasing the translational control of spinal PV neurons prevented the alteration of the electrophysiological properties of the PV cells induced by nerve injury.

      Strengths:

      This is a well-written article that uncovers a previously unappreciated role of gene expression control in PV neurons, which seems to play an important part in the loss of inhibitory control of spinal circuits typically seen after peripheral nerve injury. The conclusions are generally well supported by the data.

      Weaknesses:

      The study would benefit from further clarifications in the methods section and a deeper analysis of gene expression changes in mRNA expression and ribosomal footprint observed after nerve injury.

      We have improved the description of the methods and clarified the rationale underlying the presentation of gene expression changes. We have also added lists of the top differentially expressed genes at both the translational and transcriptional levels to Figure 1, and improved the description of the datasets in the Supplementary Materials.

      Antisense oligonucleotides used to reduce translation by disrupting eIF4E expression were administered i.c.v. It is unknown whether the authors controlled for locomotor deficits, which might confound the interpretation of behavioral results. A more local route would have been preferable to avoid targeting brain regions, which could potentially affect behavior.

      Thank you for raising this important point. We used i.c.v. administration to specifically target the central nervous system (CNS) without affecting the peripheral nervous system, as this is the recommended approach for selectively targeting the CNS using ASOs. Intraspinal administration of ASOs (into the spinal cord parenchyma) at an effective dose for long-term effects is not feasible. Intrathecal administration is possible but would result in exposure of the DRGs to the injected ASO and therefore would not be specific to the CNS.

      To rule out potential locomotor deficits, we now subjected mice to the rotarod and open field tests to assess motor function. We found no differences between eIF4E-ASO– and control-ASO– injected mice (Fig. 2J, K).

      In the revised version of the manuscript, we now better explain the rationale for i.c.v. injection. Moreover, we discuss the potential supraspinal effects of eIF4E-ASO in the Limitations section, while also describing the lack of motor phenotypes in the rotarod/open field tests.

      Only female mice were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology, but both sexes were used for behavior experiments.

      Our manuscript involves various complicated techniques and analyses. Due to limited resources, we therefore opted to use only females for expensive and labor-intensive experiments, such as Ribo-Seq, TRAP, FUNCAT, and electrophysiology, while using both sexes for behavioral studies.

      We now clearly acknowledge this limitation in the revised manuscript.

      The conditional KO of 4E-BP1 using transgenic animals should be total in the targeted cells. However, only a partial reduction is reported in Figure S2 in GAD2, PV, Vglut2, or Tac1 cells. Again, proper methods for quantification of fluorescence in these experiments are lacking.

      We apologize for the oversight; we have now updated the description of the methods for IHC signal quantification. Although genetic ablation is indeed expected to result in a complete loss of signal, in practice, previous studies employing IHC, but not Western blotting, for 4E-BP1 have also shown only a partial reduction in signal. This is likely because the 4E-BP1 antibody partially detects other epitopes. Using the same antibody, we and others have shown complete elimination of the band corresponding to 4E-BP1 in spinal cord and DRG tissue (e.g., PMID: 26678009).

      The elegant knockdown of eIF4E using AAV-mediated shRNAmir shows a recovery of the electrophysiological intrinsic properties of PV neurons after injury. It is unclear if such manipulation would be sufficient to reverse mechanical allodynia in vivo.

      Thank you for this concern, which was also raised by other reviewers. We have now performed two additional experiments, which revealed that suppressing the mTORC1–eIF4E axis in spinal PV neurons (using AAVs expressing eIF4E-shRNA in spinal PV neurons [Fig. 6A] and transgenic mice expressing non-phosphorylatable 4E-BP1 in PV neurons [Fig. 6B]) is not sufficient to alleviate neuropathic pain. These new findings need to be reconciled with our other results showing that eIF4E downregulation in PV neurons prevents the SNI-induced reduction in their excitability, and that ASO-mediated suppression of eIF4E, which affects all cell types, alleviates neuropathic pain.

      Together, these results suggest that targeting translational control in PV neurons is sufficient to reverse SNI-induced reduction in PV neuron excitability, but is not sufficient to prevent behavioral phenotypes, which likely require changes in other cell types and/or additional pathways, as well as other alterations within PV neurons. We have now included these new results in the revised manuscript (Fig. 6A and Fig. 6B) and revised the text accordingly. These changes include toning down the role of translational control in PV neurons after SNI in driving behavioral hypersensitivity.

      Reviewer #2 (Public review):

      Summary:

      I reviewed the manuscript titled "Translational Control in the Spinal Cord Regulates Gene Expression and Pain Hypersensitivity in the Chronic Phase of Neuropathic Pain." This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      Strengths:

      Innovation (conceptual and technical levels), data support the conclusions.

      Weakness:

      Confusion about the sex of the animals. It is unclear whether eIF4E ASO affects translation and which cells. It is not determined that modulating translation in PV<sup>+</sup> neurons impacts neuropathic pain behaviors.

      We thank the reviewer for their thoughtful comments. In the revised version of the manuscript, we better explain that both sexes were used for behavioral experiments, whereas only females were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology experiments.

      ASOs are not known to be intrinsically cell-type-specific; therefore, we do not expect differential effects on excitatory versus inhibitory neurons. We demonstrated that eIF4E-ASO reduces the levels of eIF4E, a key translation initiation factor that is rate-limiting for cap-dependent translation.

      Moreover, in the revised manuscript we included two additional experiments (Fig. 6A and Fig. 6B) showing that decreased eIF4E-dependent translation in PV neurons is not sufficient to alleviate neuropathic pain, despite its effects on excitability measures. We have updated the manuscript to reflect these important new findings.

      Reviewer #3 (Public review):

      Summary:

      This study provides evidence for translational changes in inhibitory spinal dorsal horn neurons following chronic nerve injury. Gene expression changes have been widely studied in the context of pain induction and provided key insights into the adaptation of the nervous system in the early phases of chronic pain. Whereas this is interesting biologically, most patients will arrive in the clinic beyond the acute phase of their injury, thus limiting the translational relevance of these studies. Recent studies have extended this work to highlight the difference between acute and chronic pain states, potentially explaining the cascading factors leading to chronic pain, and hopefully how to prevent this in vulnerable populations. The present study suggests that translational changes within spinal inhibitory populations could underlie long-term chronic pain, leading to decreased inhibition and heightened pain thresholds.

      Strengths:

      The approaches used and the broad outcomes of the manuscript are interesting and could be an exciting development in the field. The authors are using approaches more common in molecular biology and extending these into neuroscientific research, getting into the detail of how pathology could impact gene expression differentially across the course of an injury. This could open up new areas of research to selectively target not only defined populations but additionally help alleviate pain symptoms once an injury has already reached the maintenance phase. There is an opportunity to delve into what must be a very large data set and learn more about what genes are differentially translated and how this could affect circuit function.

      Weaknesses:

      Whereas the authors approach a key question in pain chronicity, the manuscript falls a little short of providing any conclusive data. The manuscript was in some areas very difficult to follow. Terminology was not always consistent or clear, and the flow of the manuscript could use some attention to highlight key areas. Whereas the overall message is clear in the summary, this would not necessarily be the case when reading the manuscript alone.

      To improve the clarity and flow of the manuscript, we made changes to the text, including the addition of intermediate summaries and further explanations of terms and experiments.

      The study claims to show that translational control mechanisms in the spinal cord play a role in mediating neuropathic pain hypersensitivity, but the studies presented do not fully support this statement. The authors instead provide some correlation between translation and behavioural reflex excitability (namely vfh and Hargreaves).

      It is difficult to fully interpret the work, as there are a number of inconsistencies, namely the range of timings pre- and post-injury, lack of controls for manipulations, the use of shmiRNA versus lineage deletions, and lack of detailed somatosensory testing. It is not completely clear how this work could be translatable as is, without a deeper understanding of how translational control affects circuit function and whether all of this is necessarily bad for the system, or whether this is a positive homeostatic adaptation to the hyperexcitability of the circuit following injury.

      A large portion of the work is focussed on showing an inhibitory-selective change in translation following chronic nerve injury. The evidence for this is, however, lacking. Statistics to show that translational effects are restricted to inhibitory subpopulations are inadequate. The authors' choice of transgenic lines is not clear and seems to rely on availability rather than hypothesis.

      Although we agree with some of the criticism, we have reservations regarding other points raised by the reviewer. To address several of the concerns, we added new experiments (Fig. 2J, 2K, 6A, and 6B). We also made changes to the text to improve readability and to better explain the rationale for the study and our focus on inhibitory neurons.

      For example, we clarify that we do not state that changes in mRNA translation in the spinal cord during the chronic phase of neuropathic pain occur exclusively in inhibitory neurons. Although we observe changes in general protein synthesis, assessed using FUNCAT, in inhibitory but not excitatory neurons after SNI, alterations in the translation of specific transcripts, assessed using the TRAP approach, are observed in both excitatory and inhibitory neurons.

      The second part of the paper focuses on inhibitory neurons because these neurons demonstrate larger translational changes. We now clearly indicate that alterations in excitatory neurons are also likely important during the chronic phase of SNI. This conclusion is further supported by newly added results (Fig. 6A and Fig. 6B), showing that targeting eIF4E-dependent translation in spinal PV neurons using two different approaches is not sufficient to reverse pain hypersensitivity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Analysis of gene expression in Figure 1 lacks clarity, and the data do not effectively guide the reader toward their intended purpose. A list of the most dysregulated genes at the transcriptional level, the translational level, or both, would help the reader fully appreciate the outcome of this analysis. Similarly, what is the message conveyed by Figures 4 D-G?

      As requested, we have now included the top 10 upregulated and top 10 downregulated genes at both the translational and transcriptional levels in Figure 1. We also expanded the main text and figure legends to clarify that Supplementary Figure 1 includes volcano plots for all conditions, and that Supplementary Table 1 contains the complete datasets. In addition, we expanded the figure legends to explain the organization of the data in Supplementary Table 1. Finally, we provide pathway analyses of translationally regulated genes in the spinal cord, as this condition is the primary focus of the study.

      Figure 4D–G shows the top 15 translationally upregulated and downregulated genes in inhibitory neurons at days 4 (D) and 60 (E), and in Tac1+ excitatory neurons at days 4 (F) and 60 (G) (four conditions in total) after SNI. These panels convey that translational regulation of specific transcripts occurs in both inhibitory and excitatory neurons. Panel 4H further demonstrates that, although translational changes are observed in both neuronal populations, a greater number of genes are altered in inhibitory neurons. We have improved the readability and flow of this section to better convey this message.

      Details about how AHA was quantified in Figure 3 are missing. It is unclear how and where the cells were selected for quantification. Objective criteria for expression/no expression of AHA in the cells are not indicated. Additionally, the signal seems to have somehow been normalized over images from the contralateral side. It is difficult to understand what the bar graphs actually represent in panel C. One would interpret them as percentages of excitatory/inhibitory cells expressing AHA.

      We apologize for the lack of clarity. We have now expanded the description of the analyses in the figure legend and in the Methods to better explain the results shown in Fig. 3. Imaged cells were selected based on specific criteria, such as laminar location and cell type. In panel C (the anisomycin experiment), values were normalized to the control group. In all other panels, no normalization was applied, and the values represent the AHA integrated density on maximum-intensity projection images (averaged per mouse). We also describe the number of sections and cells per mouse, as well as other technical details, as requested.
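      To make the quantification described above concrete, here is a minimal sketch of the per-mouse AHA measure. It rests on several assumptions that are ours, not the authors': a z-stack is given as a list of 2D intensity arrays, each selected cell is a set of (row, col) pixel coordinates, and "integrated density" is taken to be the sum of pixel values inside the cell ROI (as in ImageJ's RawIntDen). All function names are hypothetical, and this is not the image-analysis pipeline used in the study.

```python
from statistics import mean


def max_projection(stack):
    """Maximum-intensity projection of a z-stack (list of equally sized 2D lists)."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)] for r in range(rows)]


def integrated_density(image, roi):
    """Sum of pixel intensities inside one cell ROI, given as (row, col) tuples."""
    return sum(image[r][c] for r, c in roi)


def mouse_mean(sections):
    """Average integrated density over all imaged cells from one mouse.

    sections: list of (z_stack, [roi, roi, ...]) pairs, one pair per imaged section.
    """
    values = []
    for stack, rois in sections:
        mip = max_projection(stack)
        values.extend(integrated_density(mip, roi) for roi in rois)
    return mean(values)


def normalize_to_control(group_values, control_values):
    """Express per-mouse values relative to the control-group mean (control mean = 1.0).

    Only needed for an anisomycin-style comparison; other panels use raw values.
    """
    reference = mean(control_values)
    return [value / reference for value in group_values]
```

      Averaging per mouse before any group comparison keeps the mouse, rather than the individual cell, as the unit of replication.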

      In addition, a few minor changes should be made:

      (1) Rephrase Introduction: "Peripheral nerve injury can cause neuropathic pain, a chronic pain condition [...]." Neuropathic pain is not necessarily chronic.

      This sentence was reworded to read “Peripheral nerve injury may result in neuropathic pain, a debilitating condition with limited effective treatment options”.

      (2) Host species for secondary anti-mouse antibodies are provided but not for the anti-rabbit (donkey?). Also, check for consistency in the Methods section. The Methods (page 21) mention two secondary antibodies and an apparent third antibody named "anti-HRP-conjugated antibody." Please provide information about this antibody, or remove it.

      Thank you for flagging this; the inadvertent repetition of “anti-HRP-conjugated antibody” was removed.

      (3) Provide primary antibody hosts on page 22.

      The hosts of all primary and secondary antibodies are now provided.

      (4) Define PBST on page 21 and PBS-T on page 22.

      We defined PBST in the revised manuscript (0.2% Triton X-100 in PBS).

      (5) Specify the filter sets used for fluorescent microscopy.

      We specified the filter sets used for fluorescent microscopy.

      (6) Change the legend to 50% withdrawal threshold for vF behavior tests.

      We addressed this by making the requested change in all relevant legends.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) The authors need to show that eIF4E ASO (Figure 2) reduces translation in both inhibitory and excitatory neurons.

      ASOs are not intrinsically cell-type specific, as they do not contain promoters or regulatory elements and act wherever they enter cells and engage RNase H1. However, differences in ASO effects across cell types can arise from variability in uptake, intracellular trafficking, RNase H activity, or target mRNA expression levels.

      In our study, we used eIF4E-ASO as a general approach to demonstrate that eIF4E-dependent translation contributes to SNI-induced hypersensitivity, particularly at the chronic phase. We show a marked reduction in eIF4E levels in the spinal cord of eIF4E-ASO–injected mice compared with controls. We do not claim that the effects of eIF4E-ASO are mediated by a specific cell type; rather, they may involve excitatory neurons, inhibitory neurons, and non-neuronal cells, such as microglia and astrocytes, among others.

      Notably, while eIF4E can promote general translation during development, in adult mice it predominantly regulates cap-dependent translation of specific mRNAs without having a major effect on overall protein synthesis. In our case, the partial reduction in eIF4E is unlikely to substantially affect general translation, as assessed by AHA incorporation; detecting its transcript-specific translational effects would instead require TRAP or Ribo-Seq. We now better explain the rationale for the eIF4E-ASO experiment and clearly state that the effects observed cannot be attributed to a specific cell type.

      In addition, our new results showing that inhibition of eIF4E-dependent translation in PV neurons is not sufficient to alleviate SNI-induced mechanical hypersensitivity suggest that translational changes in other neuronal and/or non-neuronal cell types contribute to hypersensitivity. This important point is now more clearly explained in the revised manuscript, and the role of PV neurons is toned down throughout the paper.

      (2) In Figure 5, it is necessary to show the effect of eIF4E-shRNA in PV+ neurons on neuropathic behaviors (von Frey and MGS).

      To address this important concern, we performed two new experiments, both of which showed that inhibiting the mTORC1–eIF4E axis in parvalbumin neurons is not sufficient to alleviate neuropathic pain. First, we injected PV-Cre mice with AAV-eIF4E-shRNAmir and a scrambled control. We found that downregulating eIF4E in spinal PV neurons has no effect on SNI-induced mechanical hypersensitivity. We used a second, complementary approach to validate this finding. Specifically, we generated transgenic mice in which a non-phosphorylatable form of 4E-BP1 is expressed in PV neurons. Because non-phosphorylatable 4E-BP1 acts as a translational suppressor of eIF4E, this approach is functionally similar to eIF4E deletion.

      Altogether, our findings indicate that cell-type–non-specific suppression of eIF4E using ASOs is sufficient to alleviate neuropathic pain, particularly at the chronic phase. In contrast, while activation of eIF4E-dependent translation in PV neurons (via 4E-BP1 deletion) induces pain hypersensitivity, suppression of eIF4E-dependent translation in PV neurons prevents the SNI-induced decrease in PV neuron excitability but does not alleviate pain hypersensitivity. Thus, increased eIF4E-dependent translation in PV neurons is sufficient to induce pain hypersensitivity, but targeting this pathway in PV neurons alone is not sufficient to reverse neuropathic pain.

      Potential explanations for these findings include: (1) the presence of other important mechanisms in PV neurons (e.g., changes in synaptic transmission) that are translation independent; (2) the insufficiency of correcting reduced PV neuron excitability to alleviate hypersensitivity; and (3) an essential role for mRNA translation in other neuronal and/or non-neuronal cell types in neuropathic pain. We have updated the manuscript to include these potential explanations in the Discussion section.

      Moderate:

      (1) In Figure 2, MGS should be performed at earlier time points as well.

      We performed MGS when von Frey testing, which is less noisy and less labor intensive in our hands, suggested altered phenotypes.

      (2) In Figure 4B, the gene markers are different in Gad2+ and Tac1+ cells. Please show the 12 markers for both cell types.

      We now better explain the selection of the markers.

      (3) In Figure 5, MGS should be performed to test if the effect is limited to mechanical sensation/reactivity or extends to nociception. Additionally, do these mice exhibit altered locomotion and grip strength?

      As described above, we added experiments involving downregulation of eIF4E and expression of a mutant non-phosphorylatable 4E-BP1 in PV neurons. We performed von Frey testing, which showed no effect of suppressing the mTORC1–eIF4E axis on mechanical hypersensitivity under these conditions. Given these negative results, we did not proceed with mouse grimace scale (MGS) analysis.

      (4) In Figure S2E, the reduction of eIF4E does not appear to be specific to GFP+ cells.

      We have now replaced the representative images in this figure.

      (5) Can chronic neuropathic pain be reduced by enhancing 4E-BP1 specifically in PV+ neurons?

      We added the experiment proposed by the reviewer in Fig. 6B. We found that enhancing 4E-BP1 activity, by expressing a non-phosphorylatable form of 4E-BP1 in PV neurons, is not sufficient to alleviate neuropathic pain hypersensitivity.

      (6) Why did the authors not use PainFace for the MGS?

      We began using manual, blinded MGS scoring, as originally described by Mogil and colleagues in 2010 (PMID: 20453868), for this project before PainFace became available around 2019 (e.g., Tuttle and Zylka) and in later versions (e.g., PMID: 39024163). For consistency, we therefore continued using the same approach throughout the experiments.

      (7) In Figures 2A-C, the labeling of the bar graphs seems incorrect: is it 4E-BP1 or eIF4E immunoreactivity?

      Thank you very much for noticing this; we have corrected the mistake.

      (8) In Figure 1, present the data by sex.

      We performed sequencing analyses only in females. This decision was based on the large number of mice and experimental conditions required for both Ribo-Seq (n = 15 mice per replicate, 3 replicates per condition, and 2 time points for SNI/Sham, ~180 mice total) and TRAP (n = 3 mice per replicate, 3 replicates per condition, 2 time points, and 2 genotypes [Tac1 and GAD2] for SNI/Sham), as well as the high cost of sequencing. Behavioral experiments were performed in both sexes. This information is clearly indicated in the Methods section, and we have now also included it in the Limitations section of the paper.
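      The Ribo-Seq total quoted above follows from multiplying the design factors (mice per replicate × replicates per condition × time points × SNI/Sham). The TRAP line applies the same multiplication to the TRAP factors (mice per replicate × replicates × time points × two genotypes × SNI/Sham). That line is our own extrapolation, assuming the SNI/Sham factor also applies, because the response does not state a TRAP total.

```latex
\begin{aligned}
N_{\text{Ribo-Seq}} &= 15 \times 3 \times 2 \times 2 = 180 \\
N_{\text{TRAP}}     &= 3 \times 3 \times 2 \times 2 \times 2 = 72
\end{aligned}
```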

      (9) While the methods state that all behavioral testing was done with equal numbers of male and female mice, it seems that several experiments were done only in females. In the absence of a strong justification, all experiments should be conducted in both sexes.

      As explained above, due to the very large number of mice required for some experiments and the high cost of sample processing and sequencing, only behavioral experiments were performed in both sexes. We now clearly describe the sex of the animals used in each experiment in the figure legends.

      Minor:

      (1) In Figure 3, the legend is confusing and lacks labels.

      We expanded the Fig. 3 legend and added labels, as requested.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript needs to be made clearer and more specific. As it stands, the logic and flow are difficult to follow. Figure legends are not always indicative of the figure and are inconsistent.

      Regarding timelines:

      The logic of the different timelines is not clear. Either explain why different times post-injury were chosen between experiments or keep them consistent. It seems a key message here is that the timing is important. It therefore follows that the authors should be strict about this in their own experiments. Figure 1: 4 and 63 days. Figure 2: Day 3 and weeks 8 and 12. Figure 3: Days 4 and 60. Figure 4: Days 4 and 60. Figure 5: 6 weeks. Figure S1: 4 and 60. Clarifying why these timings were used in each case and showing at the transcript level that these are most appropriate would be needed.

      We thank the reviewer for carefully reviewing our manuscript. We focused on early versus late time points. For the sequencing experiments, we performed Ribo-seq at day 4 for the early time point and day 63 for the late time point, whereas TRAP analyses (and FUNCAT) were performed at day 4 for the early time point and day 60 for the late time point. These differences (day 60 versus day 63) were due to logistical issues related to sample collection. In our view, there are no major biological differences between day 60 and day 63 for the late time points, particularly because we do not perform direct comparisons across different experiments.

      In other experiments, we used several time points (e.g., day 3, as well as 6, 8, and 12 weeks) either to follow the development of phenotypes or based on previous publications regarding the timing of specific effects. We now acknowledge the potential limitation of using slightly different time points in the Limitations section of the paper.

      Regarding the use of inhibitory and excitatory markers: The comparisons they made between subpopulations seem a little random; for one, the number of Tac1-positive cells in the dorsal horn is not equal to that of PV, and so the comparison seems inappropriate.

      The number of cells from each subpopulation should not affect the number of DEGs. Because these analyses were performed on bulk mRNA rather than at the single-cell level, the comparisons are made between SNI and control groups within each subpopulation. Thus, the number of differentially translated genes is determined per cell type, not per individual cell.

      The lack of any semblance of variability or statistics with regard to gene changes makes it difficult to assess whether these comparisons were justified experimentally. Pax2 is a developmentally regulated transcription factor, with reduced levels in the adult. Using Pax2- NeuN+ to label excitatory interneurons is therefore not appropriate for comparison. A more appropriate comparison would be to use vGluT2 and GAD67. Similarly, the use of the GAD2Cre seems a poor choice. This is a restricted population of interneurons that have been suggested to have specific roles in presynaptic inhibition. If the authors were interested in this subpopulation for that reason, then they should state so.

      Pax2 is commonly used as a marker of inhibitory neurons in the spinal cord (e.g., PMID: 36323322) because, in the adult dorsal horn, Pax2 protein remains expressed in nearly all inhibitory neurons, including both GABAergic (GAD65/67+) and glycinergic (GlyT2+) neurons. VGluT2 marks the terminals of IB4-binding peripheral sensory neurons as well as those of spinal excitatory interneurons in lamina II of the dorsal horn, which complicates the analyses. We attempted to use Lmx1b as an excitatory marker (Pax2 for inhibitory and Lmx1b for excitatory neurons) but could not obtain a specific and robust signal with several commercial antibodies (we have no access to non-commercial Pax2 antibody).

      Regarding Cre lines, Gad2-Cre has been used extensively to target GABAergic neurons in the spinal cord. Although it is not expressed in purely glycinergic neurons, it is expressed in GABAergic and mixed GABA/glycine interneurons. Gad2-Cre expression is more restricted to superficial dorsal laminae I–III, which are relevant to pain processing, whereas Gad1-Cre may also capture neurons with low-level GABAergic expression in deep laminae as well as ventral horn inhibitory neurons. The developmental profiles also differ: Gad1-Cre is expressed earlier, at embryonic stages of inhibitory neuron development, whereas GAD2 is expressed later, in post-mitotic and mature inhibitory neurons. Because of these considerations (higher specificity for the dorsal horn and later developmental expression), we used the Gad2-Cre mouse line in our experiments.

      Regarding cKO experiments:

      It is unclear whether the deletion of Eif4ebp (which is not "ablation" as stated in the manuscript) has had any effect on the PV/GAD2 cells themselves seeing as this deletion would be a lineage deletion. One would imagine that altering transcription in such a population from early development would affect a host of neuronal and circuit properties, such as connectivity, dendritic branching, etc. The authors should show that the circuit properties were not broadly changed, not least as PV is expressed throughout the nervous system and in muscles. This could in itself explain the hypersensitivity described in their results. Experimenters should repeat the AAV shRNAmir experiments in non-injured animals, and not just control animals with the scrambled sh.

      We agree with the concerns related to potential developmental effects. Although it is nearly impossible to reliably and comprehensively demonstrate that circuit properties were not altered in our cKO mice, our manuscript presents several lines of evidence supporting a role for translational control in specific cell types in the regulation of gene expression and nociception independent of developmental effects. First, our translational gene expression analyses were performed in adult WT mice and reflect SNI-induced changes in gene expression at the translational level, assessed using complementary approaches. In addition, the effects of eIF4E ASO delivered to adult animals support a role for translational control in the regulation of SNI-induced pain hypersensitivity at later stages.

      Moreover, downregulation of eIF4E in PV neurons using an AAV-based approach in adult mice affects their SNI-induced excitability, further supporting a role for translational mechanisms in regulating PV neuron plasticity after peripheral nerve injury in adulthood. To acknowledge the potential developmental effects associated with 4E-BP1 deletion using Tac1-Cre, Gad2-Cre, and PV-Cre mouse lines (with PV-Cre beginning expression postnatally), we have included an explicit limitation statement in the Discussion of the revised manuscript.

      We also thank the reviewer for highlighting the distinction between deletion and ablation, and we have corrected this terminology in the revised manuscript.

      Regarding pain:

      A large sticking point within the study is the lack of clarity of the populations they are targeting. Many of the populations mentioned are not expressed solely in the dorsal somatosensory horn and instead are also expressed in the ventral motor horn. This is particularly important with regard to the sensory tests they are performing, which rely on reflex responses. It seems these results, although interesting, are not proof of a pain effect, but rather showing changes in vfh-behaviour. To show this is a pain-specific event, and not just correlative or reflexive, the authors should perform further behavioural tests beyond vfh, Hargreaves, and the grimace scale, such as low threshold touch, rotarod, etc. How much of this effect is due to changes in reflex excitability? Would the authors expect similar results for all neuropathic models but not for chronic inflammatory states for example? Western Blot analysis at the moment is for the whole cord, which could imply changes in the ventral or intermediate horn, it could help strengthen the study to show that these changes are selective to the dorsal cord.

      We have now added a new experiment showing that eIF4E-ASO has no effect on motor function in the rotarod and open field tests (Fig. 2J, K). In addition, the eIF4E-ASO experiment included in the original submission reflects supraspinal behavior, as assessed by MGS. Overall, our study includes numerous experiments and datasets. While we agree with some of the reviewer’s concerns, the extensive additional work requested, including additional neuropathic and inflammatory pain models, further assays of supraspinal behavior, Western blot analyses restricted to the dorsal horn, additional Cre lines and markers, and other analyses, is not feasible within the scope of the current manuscript.

      Notably, in the revised manuscript, we have added new experiments (Fig. 2J, 2K, 6A, 6B) that we believe address the most critical concerns raised by the reviewers, and we have revised the text to more clearly acknowledge the limitations of the study.

      Regarding patch clamp studies:

      An increase in rheobase alone in the PV cells would not in itself account for the changes seen in behaviour, seeing as the authors are suggesting this is a selective effect for von Frey and not radiant heat, for example. The authors should therefore show a change in mechanically-evoked firing of PV/GAD2 cells either by dorsal root stimulation in slice, or by cfos or equivalent marker of activation following sensory stimulation. The title of this figure is also misleading- it is not clear how there is any proof of promotion of plasticity in the experiments shown.

      In the original submission, in addition to an increase in rheobase, we also demonstrated decreased spiking activity in response to a range of stimulating currents (Fig. 4). We agree that assessing mechanically evoked responses of PV neurons would be informative; however, such studies are beyond the scope of the current manuscript.

      To address the final concern, we modified the title of Fig. 5 and the related text. Moreover, the newly added data showing that inhibition of translation in PV neurons does not alleviate SNI-induced hypersensitivity prompted us to tone down, throughout the manuscript, the link between translational changes in PV neurons and pain hypersensitivity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful and constructive comments on our manuscript. We have carefully addressed all points raised, and believe the manuscript is substantially improved as a result. In particular, we have performed:

      - Comprehensive spatial analysis of stable mutants. Following Recommendations for the authors comment #1, we performed spatial analysis by binning the anterior-posterior axis into 200 µm strata. This analysis validates our initial conclusions and reveals striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants.

      - Substantially enhanced the statistical rigour of the screen analysis. We have implemented stratified Kolmogorov-Smirnov tests (within-experiment testing, then combined via Fisher's method) alongside linear mixed models to control for batch effects. In the revised manuscript, we now focus on three hypertrophy genes – foxp1b, txnipa and mmp14b – which are robustly validated by both methods.

      - Normalisation of adipose area to body size. To address concerns about developmental delay (Recommendations for the authors #2), we now normalise adipose area to standard length. With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity (updated from our original analysis), while the hypertrophic LD morphology remains highly significant - demonstrating the phenotype is independent of body size and not a developmental delay.

      - Revised title. As suggested by Recommendations for the authors comment #6, we have changed the title to: "A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish"

      - Extensive code and analysis availability. We now provide all code and extensive analysis pipelines in interactive HTML documents at https://github.com/jeminchin/zebrafish_adipose_morphology_screen
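      The combination step in the stratified Kolmogorov-Smirnov analysis listed above (within-experiment tests combined via Fisher's method) can be sketched in a few lines. This is a generic illustration, not the authors' pipeline, which lives in the linked repository. It assumes the per-experiment p-values are independent and already computed, for example with scipy.stats.ks_2samp on each experiment separately. The function name is hypothetical. It avoids SciPy by using the closed-form chi-square survival function for even degrees of freedom.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the joint null hypothesis. For even degrees of
    freedom the survival function has the closed form
        P(chi2_{2k} > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total
```

      For example, two experiments that each give p = 0.05 combine to p ≈ 0.0175. Testing within each experiment first and then combining keeps between-batch differences out of the individual comparisons, which is the stated rationale for the stratified design.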

      Joint Public Review:

      We thank the reviewers for their thoughtful assessment of our work and their recognition of the rigorous experimental design, statistical approaches, and the utility of both the identified genes and screening pipeline for the field. We address their concerns below.

      Weakness:

      Distinguishing developmental patterning from adipose tissue plasticity

      We appreciate this important distinction and agree that separating developmental from adaptive effects is a key challenge in the field. We would like to make several points in response:

      First, we acknowledge this limitation in our discussion and have now expanded this section to more explicitly address the interpretive boundaries of our approach. Our screening platform was intentionally designed to capture the outcome of genetic perturbation across development and early adaptation, as these processes are inherently intertwined during the establishment of adipose tissue.

      Second, regarding the suggested analysis of lipid droplet size along the AP axis in response to HFD: we have now performed this analysis and include it as new Fig. 6 and new Supplemental Figs. 8 & 9. These data validate our initial conclusions and reveal striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants. Further, these data provide additional resolution on regional responses to dietary challenge.

      Third, we note that our stable mutant validation experiments (Figure 6) do begin to disentangle these effects by examining both baseline and HFD-challenged conditions in animals with constitutive genetic loss. However, we agree that definitive separation would require temporally controlled genetic manipulation, which we now acknowledge as an important future direction.

      Lack of tissue-specific manipulations

      We agree that tissue-specific approaches would strengthen mechanistic conclusions and have acknowledged this limitation in our revised discussion. The current study was designed as a discovery-focused screen to identify candidate regulators, with the understanding that mechanistic dissection would require follow-up studies employing tissue-specific tools.

      We note that adipocyte-specific Cre/lox or Gal4-UAS approaches in zebrafish are feasible and represent an important next phase of investigation for the most promising candidates identified here, rather than a requirement for the current screening study. We have added text explicitly framing our findings as establishing genetic associations that warrant future tissue-autonomous investigation.

      Recommendations for the authors: 

      (1) Analysis: In Figure 6, the authors state that foxp1b mutants "fail to undergo further hypertrophic remodeling in response to a high-fat diet (HFD)." Foxp1b mutant juveniles are already hypertrophic before the high-fat diet. After a high-fat diet, these mutants reach mean lipid droplet diameters similar to WT, approximately 65 µm, which the authors state earlier in the manuscript are "a potential upper limit of LD growth at this developmental stage." The authors should perform additional analysis of their existing data. Specifically, determine lipid droplet size by binning the AP axis as shown in Figure 3. The rationale is that lipid droplet size differences in response to HFD may be more evident when not considering the anterior populations of lipid droplets that have already reached maximum steady state size for this juvenile stage. This would not require any new experiments, just reanalyzing data similar to how they did in Figure 3.

      We thank the reviewer for this excellent suggestion. We have performed the requested spatial analysis by binning the AP axis into 200 µm strata (Figure 3 approach). These data can be found in new Fig. 6H-M and new Supplemental Figs. 8 & 9. This new analysis verifies our initial conclusions and also reveals several striking spatiotemporal dynamics.
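      As a rough illustration of the binning step, the sketch below assigns each lipid droplet to a 200 µm stratum and averages LD diameter per stratum for one fish. It assumes AP positions are measured in µm from the most anterior point of the SAT, so that stratum 1 is the most anterior (oldest) region. The function names are hypothetical, and the authors' actual code is in the linked repository.

```python
from collections import defaultdict
from statistics import mean

STRATUM_WIDTH_UM = 200.0  # stratum width stated in the response


def stratum_index(ap_position_um, width_um=STRATUM_WIDTH_UM):
    """1-based stratum for an AP position (µm from the assumed anterior origin)."""
    if ap_position_um < 0:
        raise ValueError("AP position must be non-negative")
    return int(ap_position_um // width_um) + 1


def mean_diameter_by_stratum(droplets, width_um=STRATUM_WIDTH_UM):
    """Mean LD diameter per stratum for one fish.

    droplets: iterable of (ap_position_um, diameter_um) pairs.
    Returns a dict {stratum: mean diameter}, ordered by stratum.
    """
    bins = defaultdict(list)
    for position, diameter in droplets:
        bins[stratum_index(position, width_um)].append(diameter)
    return {stratum: mean(diameters) for stratum, diameters in sorted(bins.items())}
```

      Per-fish, per-stratum means of this kind could then feed a model with genotype, diet and stratum as fixed effects and fish as a random effect. That modelling step is not reproduced here.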

      (i) Baseline hypertrophy in foxp1b mutants across AP strata

      In support of our initial conclusion that foxp1b mutants have larger LDs at baseline, the spatial analysis confirms that on a control diet (baseline), foxp1b mutants have significantly larger LDs than WT across strata 1-5 (new Fig. 6I), ranging from +22.2 µm in stratum 1 to +17.8 µm in stratum 5 (all FDR-adjusted p < 0.05, linear mixed effects model). Extended analysis across all 15 strata is shown in Supplemental Figs. 8 & 9. By contrast, and also in support of our initial conclusion, foxp1a mutants showed no baseline hypertrophy on the control diet (all strata p > 0.10, Supplemental Fig. 8).
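      The response does not name the FDR procedure used for the per-stratum p-values. Assuming it was the Benjamini-Hochberg method (a common default, e.g. method="fdr_bh" in statsmodels' multipletests), the adjustment can be sketched as below. This covers only the correction step applied to p-values that have already been computed; fitting the linear mixed effects model itself is not reproduced, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      For instance, raw p-values of 0.01, 0.04, 0.03 and 0.20 across four strata become about 0.04, 0.053, 0.053 and 0.20 after adjustment, so only the first would remain below 0.05.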

      (ii) foxp1b mutants show a profoundly blunted hypertrophic response to HFD

      Using paired analysis (same fish on both control diet and after 14 days of high-fat diet) with a linear mixed effects model, we quantified the effect of HFD across all strata:

      (A) Anterior/oldest strata (1-6): In WT fish, HFD increases LD diameter by +25.1-28.1 µm (+52-58%, p < 0.0001), whereas in foxp1b mutants HFD increases LD diameter by only +7.5-11.7 µm (+12-19%, p < 0.003). Therefore, in the oldest, most anterior regions, which contain the largest LDs, the hypertrophic response of foxp1b mutants to HFD is ~57% weaker than that of WT fish.

      (B) Posterior/newer strata (7-15): In WT fish, HFD produces significant increases in LD diameter of +17.7-23.7 µm (p < 0.024). In foxp1b mutants, however, there is no significant hypertrophic response at all (p > 0.068), and hypertrophic effect sizes decline from +6.8 µm (stratum 7) to +0.4 µm (stratum 15).

      (C) Overall effect: Averaged across all strata, WT LDs show a +24.4 µm increase with HFD (p < 0.0001), whereas foxp1b mutant LDs show only a +7.7 µm increase (p = 0.020). Therefore, foxp1b mutants show a 68% reduction in hypertrophic growth in response to HFD compared to WT (Fig. 6K).
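      The 68% figure in (C) can be recovered from the two averaged effect sizes, assuming it is defined as the relative shortfall of the foxp1b HFD response against the WT HFD response. Here Δ denotes the mean HFD-induced change in LD diameter averaged across strata.

```latex
1 - \frac{\Delta_{\text{foxp1b}}}{\Delta_{\text{WT}}}
  = 1 - \frac{7.7\ \mu\text{m}}{24.4\ \mu\text{m}}
  \approx 1 - 0.316 = 0.684 \approx 68\%
```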

      The consequence of these spatial dynamics is that WT SAT LDs, which start 22 µm smaller than those of foxp1b mutants on a control diet, undergo massive hypertrophy across all regions/strata in response to HFD. Meanwhile, foxp1b mutant LDs, which start larger than WT LDs, show only a modest, spatially restricted response. This results in convergence of LD size in early/anterior strata, while WT LDs actually surpass foxp1b mutant LDs in late/posterior strata (strata 14-15: WT LDs 14.7 µm larger on HFD, p = 0.028; Supplemental Figs. 8 & 9).

      By contrast, foxp1a mutants retain the capacity for HFD-induced hypertrophy but show a ~35% weaker response than WT (p = 0.023), significantly less severe than the 68% reduction in foxp1b mutants. Interestingly, after HFD, foxp1a mutants show a reduced AP gradient in LD size compared with WT and foxp1b mutants (a uniform +14.4 µm across all strata versus a WT range of +26.4 µm anteriorly to +16.6 µm posteriorly), suggesting that foxp1a may regulate spatial heterogeneity in adaptive responses to HFD (Fig. 6L-M).

      (iii) Developmental ceiling or impaired adaptive capacity?

      The reviewer raises an important question about whether anterior adipose LDs have reached a "developmental ceiling." After conducting the spatial analysis suggested by the Reviewer, we now believe several lines of evidence support an intrinsic defect in HFD-induced hypertrophy in foxp1b mutants, rather than reaching a developmentally determined limit:

      First, foxp1b mutants show reduced responses across ALL strata, not just anterior regions. The attenuation extends throughout the entire AP axis (57% reduction in strata 1-6, complete loss of response in strata 7-15). If anterior adipocytes had simply reached a size ceiling, we would expect normal responses in posterior regions, where cells are smaller, but we do not observe this.

      Second, in posterior/newer regions of SAT (strata 14-15), the hypertrophic response to HFD in foxp1b mutants is so limited that WT LDs actually become larger than foxp1b mutant LDs (+14.7 µm, p = 0.028; Supplemental Fig. 9). This demonstrates that these LD sizes are not developmentally limiting and argues for an intrinsic hypertrophic defect in response to HFD.

      Third, foxp1a mutants provide an important control. These mutants show no baseline hypertrophy (all strata p > 0.10) yet still exhibit blunted hypertrophic responses to HFD (~35% reduction, p = 0.023), proving that reduced HFD responses can occur independently of baseline hypertrophy.

      We have updated the Results and Discussion to reflect these new conclusions. Methods have been updated to include the spatial analysis approach.

      (2) Adipose morphogenesis in WT is a function of standard length, as shown by the authors. At juvenile stages, foxp1 mutants are both smaller and have reduced adipocyte coverage, while adults show normal body length and very subtle adipose phenotypes. Can the authors demonstrate that the observed defects in foxp1 mutant juveniles are bona fide phenotypes rather than a developmental delay?

      We thank the reviewer for this key point. We agree it is critical to distinguish true foxp1b-dependent phenotypes from potential developmental delay. Importantly, our data strongly argue against a simple developmental delay. We show that LD size scales with body size in Fig. 3G, with smaller zebrafish having smaller LDs and larger zebrafish having larger LDs. In contrast to a developmental delay, our data show that foxp1b single and foxp1a;foxp1b double mutants are smaller (reduced standard length) but have larger LDs (Fig. 6E,G). This dissociation between body size and LD size is the opposite of what would be expected from developmental delay.

      To account for the body size difference, we have now normalised adipose area to standard length (Fig. 6F). With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity, whereas foxp1a;foxp1b double mutants remain significantly reduced. This represents a change from our original analysis and we have updated the text accordingly. Critically, despite normalised adipose area showing only a trend in foxp1b singles, the hypertrophic LD morphology remains highly significant (Fig. 6G), demonstrating that the morphological phenotype is robust and independent of overall body size.

      We have clarified this interpretation in the Results and Discussion.

      (3) What was the rationale for selecting one amongst paralogous genes for the screen? For example, why did the authors choose ptenb rather than ptena?

      (4) Point 3 is particularly relevant for the final six genes that resulted in adipose phenotypes. Why did the authors choose not to target both paralogs, given that multi-plexed F0 CRISPR targeting is feasible in zebrafish (PMID: 29974860).

      We answer Points 3 & 4 together here.

      We used the DIOPT (DRSC Integrative Ortholog Prediction Tool) orthology tool to identify the zebrafish paralogue with the highest orthology score to each human gene. This tool integrates predictions from 20 orthology databases to generate a composite score. We selected the paralogue with the highest DIOPT score for each gene. For example, we selected ptenb over ptena because it showed a higher predicted orthology to human PTEN.

      We acknowledge this approach has important limitations, including that orthology scores do not necessarily predict functional equivalence (i.e., the "most orthologous" paralogue may not be the one with the most relevant adipose tissue function in zebrafish). This also means we may have missed genuine hits: testing only one paralogue means we could fail to identify genes where the "less orthologous" paralogue has the relevant adipose function.

      Our findings with Foxp1 paralogues both validate this approach and reveal its limitations. The higher-scoring paralogue foxp1b (DIOPT score = 13/19) showed the more severe phenotype, validating our prioritisation. However, the lower-scoring paralogue foxp1a (DIOPT score = 5/19), which we tested subsequently, showed a distinct but significant phenotype (altered spatial patterning) – a finding that would have been missed had we not pursued secondary validation.

      For future screens where comprehensive hit identification is the goal, multiplexed targeting of all paralogues would be valuable, though this may complicate interpretation of paralogue-specific phenotypes. We have discussed this in the Discussion.

      (5) General framework and limitations: The analysis platform presented in the manuscript cannot separate the developmental effects from adipose tissue plasticity/remodeling. Potential approaches that may help address this concern include: (a) establishing a baseline model to illustrate how WT fish respond to high-fat diet (HFD); (b) showing how mutants with hyperplasticity (opposite effects of foxp1 mutants) respond to HFD; (c) examining whether foxp1 gene expression level changes in response to HFD. However, these approaches (especially a and b) would require extensive experimental work and may be beyond the scope of this study. Without further evidence or data supporting adipose tissue plasticity and remodeling, the authors may want to emphasize in the background and discussion sections how adipose tissue development may affect plasticity and adaptation, and soften the tone of how genes may directly regulate adipose tissue plasticity and adaptation.

      We thank the reviewer for this comment about the relationship between adipose development and plasticity/remodelling. We agree this is an important issue, as we are studying juvenile fish that are still growing. Therefore, when we feed them HFD and see LDs get bigger, is this diet-induced remodelling or just accelerated normal development (i.e., growth that would happen anyway, but occurring faster due to more nutrients)?

      To address the reviewer's specific suggestions:

      (A) Baseline model of WT HFD response: We have now performed detailed spatial analysis of WT responses to HFD (new Fig. 6H-M, Supplemental Figs. 8 & 9). This analysis establishes a comprehensive baseline for hypertrophic responses to HFD in developing adipose tissue. In summary, WT fish show robust, statistically significant and spatially graded hypertrophic responses to HFD across the entire AP axis, with responses ranging from +28.1 µm anteriorly to +17.7 µm posteriorly.

      We agree with the Reviewer that separating developmental from adaptive processes in growing juvenile fish is challenging. Importantly, we believe foxp1a mutants provide compelling genetic evidence that we are studying adaptive responses rather than purely developmental processes. foxp1a mutants have normal baseline LD sizes on control diet (demonstrating that foxp1a is not required for developmental adipose expansion), yet, when challenged with HFD, show significantly reduced hypertrophic expansion and a reduced spatial gradient. This genetic dissociation strongly argues that we are observing adaptive capacity rather than developmental growth rate.

      (B) Hyperplastic mutants:

      We agree that analysis of hyperplastic mutants would provide valuable complementary information about tissue remodelling capacity. However, as the reviewer anticipated, this would require: (1) generating stable lines of the appropriate hyperplastic mutants, (2) conducting paired HFD feeding studies, (3) performing spatial morphometric analysis comparable to our foxp1 studies, and (4) potentially distinguishing hyperplastic vs hypertrophic contributions to expansion. We agree this constitutes substantial additional experimental work beyond the scope of the current manuscript, though it represents an important direction for future studies.

      (C) foxp1 expression changes in HFD:

      Unfortunately, we do not have SAT samples from HFD-treated fish preserved for RNA analysis, and therefore cannot assess whether foxp1 expression levels change in response to dietary challenge. This would be valuable for future studies to determine whether foxp1 genes are dynamically regulated during metabolic adaptation or function as constitutive regulators of adaptive capacity.

      Following the Reviewer's guidance, we have revised throughout the manuscript to more carefully distinguish developmental patterning from metabolic adaptation.

      (6) Title: In the absence of experimental results that can distinguish between developmental effects from adipose tissue plasticity/remodeling, such as those mentioned above, the manuscript title is not accurate and should therefore be revised to be something like "hyperplastic and hypertrophic adipose morphology."

      We have now altered the title as the Reviewer suggested to “A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish”

      Minor:

      (7) In mouse studies, deleting foxp1b in adipose tissue protects mice from diet-induced obesity, while overexpressing foxp1b in adipose tissue promotes diet-induced obesity (Liu et al., Nature Communications, 2019). These overall phenotypes and foxp1b-mediated effects appear to contradict what is observed in the zebrafish model. Can the authors also provide more evidence/discussion on why such a difference occurs between zebrafish and mouse models?

      We thank the reviewer for this important comparison. We believe the apparent contradictions reflect (1) differences in adipose tissue thermogenic capacity, possibly between species but also between functionally distinct depots, and (2) whole-organism versus tissue-specific experimental approaches.

      (1) Different adipose tissue biology: browning-prone vs browning-resistant adipose

      Liu et al. (2019, PMID: 31699980) demonstrated that adipose-specific deletion of Foxp1 in mice increases thermogenesis and browning of SAT, with protection from diet-induced obesity (DIO) and improved insulin sensitivity. Conversely, Foxp1 overexpression impaired adaptive thermogenesis and promoted DIO. Mechanistically, Foxp1 directly represses β3-adrenergic receptor transcription, thereby inhibiting the thermogenic program. Strikingly, mouse Foxp1-deleted adipocytes displayed smaller, multilocular lipid droplets characteristic of brown/beige adipocytes.

      These morphological outcomes initially appear opposite to our zebrafish findings: mouse Foxp1 mutants have smaller adipocytes (due to browning), while zebrafish foxp1b mutants have larger lipid droplets (hypertrophy). We believe this fundamental difference may reflect the propensity of adipose tissue to undergo adaptive thermogenesis.

      While it was recently discovered that zebrafish possess thermogenic epicardial adipose tissue (PMID: 38507414), in general zebrafish adipose is not considered thermogenic, and zebrafish as ectotherms are thought to lack adaptive thermogenesis for thermoregulation. The exact thermogenic potential of zebrafish adipose remains to be fully characterised, but potential differences in thermogenic capacity between mouse and zebrafish adipose may help explain the distinct phenotypic outcomes.

      Importantly, Liu et al. studied mouse inguinal subcutaneous WAT - the depot most prone to browning in rodents. It remains unclear what role Foxp1 plays in browning-resistant mammalian WAT depots, where thermogenic conversion does not readily occur. In such depots, Foxp1 loss might produce phenotypes more similar to our zebrafish findings - dysregulated white adipose function without browning.

      The above hypothesis suggests that browning responses may mask other roles for Foxp1 in WAT. Interestingly, although this was not quantified in the paper, Liu et al.'s Foxp1 overexpression model (Ap2-Foxp1) appeared to reduce adipocyte size despite suppressing Ucp1 expression and reducing lipolysis. These data suggest more complex roles and indicate that Foxp1's control of adipocyte size might extend beyond simply regulating thermogenesis and may involve coordinating the balance between hyperplastic and hypertrophic expansion.

      Furthermore, human subcutaneous WAT is not as prone to browning as mouse inguinal WAT. Human browning occurs primarily in specialised depots (e.g. supraclavicular, deep neck), while the majority of human adipose tissue is constitutive white adipose with limited thermogenic capacity. Therefore, it remains an open question whether FOXP1's primary physiological role in humans relates to thermogenesis regulation (in specialised depots) or white adipose metabolic control (in the majority of adipose tissue). Zebrafish findings examining constitutive WAT function (although the lack of adaptive thermogenesis in zebrafish is presumed at this stage) may be more relevant to human adipose than might initially appear.

      (2) Whole-organism vs tissue-specific effects on metabolic health

      A second apparent contradiction concerns metabolic outcomes: mouse adipose-specific Foxp1 deletion improves metabolic health (Liu et al.), whereas our zebrafish whole-organism foxp1b mutants display metabolic dysfunction (baseline hypertrophy, impaired HFD response, hyperglycaemia and fatty liver). We believe this discrepancy reflects comparison of whole-animal mutants (zebrafish) to tissue-specific deletions (mouse), rather than opposite adipose tissue functions.

      Critically, Foxp1 has established roles in hepatic glucose metabolism. Zou et al. (PMID: 26504089) demonstrated that hepatic Foxp1 inhibits expression of gluconeogenesis genes and decreases hepatic glucose production and fasting blood glucose by competing with Foxo1 for binding of insulin responsive gluconeogenic genes. In line with these observations, we observe fatty liver and hyperglycaemia in foxp1a;foxp1b double mutant zebrafish (data not shown), suggesting that the metabolic dysfunction in our whole-animal mutants may be driven primarily by hepatic Foxp1 loss rather than adipose-specific effects.

      We have expanded on the points raised here in the Discussion.

      (8) Line 522-524: "The major phenotype in foxp1a mutants was impaired adipose expansion following HFD, suggesting failure to respond to diet-induced stress signals". In the presented Figure 6j, foxp1a mutant expands adipose LD size following HFD, similar to the control, which is contradictory to the statement above. Please clarify.

      We thank the reviewer for highlighting this apparent inconsistency and apologise for imprecise wording. These measurements are actually consistent but refer to different scales of analysis.

      Tissue level (Supplementary Fig. 7): foxp1a mutants show significantly reduced total adipose expansion (based on whole-animal Nile Red images) compared to wild-type fish on HFD; this is what we refer to as "impaired adipose expansion."

      Cellular level (Fig. 6L-M): At the individual adipocyte level, foxp1a mutants show statistically significant increases in LD diameter following HFD. However, the magnitude is reduced by ~35% compared to wild-type (mutants: +14.4 µm; WT: +22.2 µm; p = 0.023).

      We have revised the text to more precisely state "reduced adipose expansion" rather than "impaired expansion" to avoid implying complete failure to respond.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss-of-function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss-of-function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b, since the apicobasally oriented (vertical) CPs would express actin in this same orientation but the OS would not. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First, let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism, which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immunogold TEM protocol and show co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected because we prioritized the highest possible resolution, which required small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the western blot are similar to our previous results (Piedade 2020), which we believe arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures, as this was a monolayer culture. We did not observe any differences in total cell number between the different transfections. As such, we do not feel proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All of our CP measurements (actin, cdhr1a or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1 or PNA), which served as a landmark to ensure accurate measurement and assignment to the correct cell type. Our new Figure 7 now includes cone OS counterstaining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1 or PNA) to ensure accurate measurement and assignment to the correct cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA followed by Tukey's post hoc test for multiple pairwise comparisons. This new analysis did not substantially change which comparisons reached statistical significance.
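      The reviewer's concern about non-independent measurements can be illustrated with a minimal sketch, assuming each OS or CP length is tagged with its genotype/time-point group and a fish identifier: measurements are first collapsed to one mean per fish, so that n equals the number of fish, and only those fish-level means enter a one-way ANOVA. The helper names and toy numbers below are hypothetical and are not data from this study. P-values and Tukey's pairwise comparisons would additionally require the F and studentized-range distributions (for example, scipy.stats.f_oneway and scipy.stats.tukey_hsd in SciPy), and a mixed-effects model with fish as a random effect is a common alternative.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(records):
    """Collapse (group, fish_id, length_um) records to one mean length per fish.

    Returns {group: [mean of fish 1, mean of fish 2, ...]}, so each group's
    sample size is the number of fish, not the number of OS/CPs measured.
    """
    by_fish = defaultdict(list)
    for group, fish_id, length_um in records:
        by_fish[(group, fish_id)].append(length_um)
    per_group = defaultdict(list)
    for (group, _fish_id), lengths in by_fish.items():
        per_group[group].append(mean(lengths))
    return dict(per_group)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic, treating each fish-level mean as independent."""
    values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
    ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data (hypothetical): two OS lengths per fish, three fish per group.
records = [
    ("WT", "w1", 9.5), ("WT", "w1", 10.5),
    ("WT", "w2", 10.5), ("WT", "w2", 11.5),
    ("WT", "w3", 11.5), ("WT", "w3", 12.5),
    ("mutant", "m1", 6.5), ("mutant", "m1", 7.5),
    ("mutant", "m2", 7.5), ("mutant", "m2", 8.5),
    ("mutant", "m3", 8.5), ("mutant", "m3", 9.5),
]
fish_means = per_fish_means(records)
print(fish_means)  # {'WT': [10.0, 11.0, 12.0], 'mutant': [7.0, 8.0, 9.0]}
print(one_way_anova_f(fish_means))  # 13.5, with df = (1, 4)
```

      Pooling all 12 individual measurements instead would give the ANOVA 10 residual degrees of freedom rather than 4, overstating the effective sample size that the reviewer cautions against.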

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using a previous protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript, we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree; this point has been addressed in our revised manuscript. Additionally, we have also included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the Discussion accordingly.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely oppose PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First, we thank the reviewer for taking the time to evaluate our work comprehensively and for the constructive criticism, which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used to quantify our data (please see our response to Reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we propose that it is important for anchoring the CP to the OS, particularly in cones, whereas in rods it may regulate the release of newly formed discs (as previously proposed in mice). We plan to test both hypotheses directly in future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially as this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First, we thank the reviewer for taking the time to evaluate our work comprehensively and for the constructive criticism, which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immunogold TEM protocol and provide data showing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol, which differed from that of Miles et al. (2021). We also note that pcdh15b localization similar to that in our images has been shown in other species (monkey and frog). We therefore believe our protocol reveals the correct localization pattern, which may be lost or obscured by the procedure used in Miles et al. (2021).

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As discussed in our responses to Reviewers 1 and 2, all methods, quantification, and statistics are clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As discussed in our responses to Reviewers 1 and 2, all methods, quantification, and statistics are clearly described in the revised manuscript. We have also clarified that CP measurements were made using a counterstain for the cone or rod OS, so that the measured actin signal was only CP-associated. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the short length of rod CPs most likely reflects their interaction with newly forming discs rather than with mature discs enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references reporting that rod CPs are shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counterstain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added text to the Discussion noting that it is not possible to assess rod-cone dystrophy in the pcdh15b model, as the mutation is lethal by 15 dpf, before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin), and myo7aa all have either no expression or very low expression in photoreceptors, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      We have changed the pcdh15b color to blue.

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdhr1b in cdhr1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bona fide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on co-staining for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The Materials and Methods section now specifies that all of our graphs depict the mean.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separated from cones in the zebrafish retina and are therefore easily identified by their location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations were also identified, including ones involving the PARP inhibitor rucaparib and the fatty acid synthase inhibitor TVB-2640. Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports their findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive, thus, the current work does not attempt to elucidate the mechanism of their drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      We appreciate your positive remarks on the use of NetBox, GSEA, and human curation for predicting anti-resistance effects of second drugs. Regarding the weaknesses you identified:

      Mechanistic Insight: We agree that our current work interprets findings using prior published knowledge and does not attempt to infer detailed mechanisms of drug resistance of the nominated drug combinations. Our primary goal with this study was to establish a robust, unbiased proteomic and computational pipeline for proposing anti-resistance drug combinations, rather than to fully characterize the downstream molecular effects for each combination or to prove causation. To get closer to mechanistic insight, meaning detailed hypotheses of causative interactions, one would need to investigate anti-resistance effects in other pre-clinical materials as a crucial next step for the most promising combinations identified. This was out of scope for us. We assume the proposed combinations are useful for focussed follow-up in the community.

      Discovery Phase on a Single Cell Line: Our discovery phase was focused solely on the OVSAHO cell line due to its resemblance to surgical ovarian cancer samples. Including additional cell lines in the initial proteomic-response discovery phase plausibly would have enhanced the generalizability. But this was not done due to resource constraints. However, we did perform more extensive validation of the effect of drug combinations on proliferation in several cell lines to explore broader applicability.

      2D Culture Limitations: We are fully aware of the limitations of 2D cell culture models, especially in the context of ovarian cancer, where in clinical reality interactions with the microenvironment and other effects can have significant roles in therapeutic resistance. We also recognize that in lab experiments 2D culture does not fully recapitulate the complexities of 3D tumors, PDX models, or primary patient tumors. We have added citations to the relevant literature (including the reference you provided), and have emphasized in the Discussion that our findings serve as a strong foundation for future experimental tests (validation) in more physiologically relevant experimental model systems.

      Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data were then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat, at the proteomics level, analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly), the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for the combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript.

      Thank you for your summary and positive comments. Regarding the weaknesses you identified:

      Initial Analysis in a Single Cell Line: We concur with your assessment that performing the initial analysis in a single HGSC cell line (OVSAHO) is a limitation. As mentioned in our response to Reviewer #1, resource limitations caused this decision, and we acknowledge that a broader initial screen would have strengthened generalizability. We added this limitation in the discussion section, emphasizing use of diverse cell lines in the initial protein response profiling as an area for future work.

      Challenges in Predicting Drug Combinations and Variability: We appreciate the observation regarding the challenges in predicting the effect of drug combinations and the variability of antiproliferative effects observed in different HGSC cell lines (Table 2). As with any predictive method, our computational-experimental pipeline is not guaranteed to identify additive or synergistic interactions with absolute certainty; rather, it generates data-informed hypotheses to be considered in the presence of other available observations. We now emphasize in the Discussion that while our computational pipeline provides plausible anti-resistance candidates, the precise results (extent of additivity or synergy) differ in different cell lines. This underscores that experimental validation across diverse physiological models, such as PDXs or organoids (not just additional cell lines), is an essential criterion of validity of the generated hypotheses. We also underscore the (obvious) challenge of the ultimate translation of pre-clinical experiments to therapeutic effects in humans.

      In revision, we have clarified in detail the expectation of predicted synergy implied by the reviewer’s comment, “the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect”. This reflects a misunderstanding of our goals. The predictions are for drug effects that are anti-resistant, such that the proteomic response to one drug is counteracted by the second drug. The predicted effect is not synergy. Indeed, a useful anti-resistance effect does not require synergy, because additivity is sufficient: if cells are resistant to the original drug, the second drug plausibly still has an antiproliferative effect, as it targets the cellular processes that are increased in activity (upregulated) in response to the first drug. We therefore removed the red synergy color from Table 2 to avoid the potential conclusion that a drug combination without synergy has no benefit. In fact, additive drug combination effects are in themselves beneficial. For clarity on this point, we added coloring in Table 2 to highlight the small number of combinations that did not work well, in that the combination was clearly antagonistic, using a combination index (CI) cutoff of CI >= 2.0; we clarify this point in the Discussion.
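      To make the additivity-versus-antagonism distinction concrete, the sketch below computes a combination index in the widely used Chou-Talalay (Loewe additivity) form, CI = d1/Dx1 + d2/Dx2, where CI < 1, CI = 1, and CI > 1 conventionally indicate synergy, additivity, and antagonism, and flags only combinations with CI >= 2.0 as clearly antagonistic. The CI formulation, the function names, and the doses are assumptions for illustration; the study's own CI calculation may differ.

```python
# Minimal sketch, assuming a Chou-Talalay (Loewe additivity) combination index;
# the exact CI formulation used in the study is not stated in this response.


def combination_index(d1, d2, dx1, dx2):
    """CI for a drug pair reaching a given effect level x.

    d1, d2   -- doses of drug 1 and drug 2 used together to reach effect x
    dx1, dx2 -- doses of each drug alone needed to reach the same effect x
    """
    if dx1 <= 0 or dx2 <= 0:
        raise ValueError("single-agent doses must be positive")
    return d1 / dx1 + d2 / dx2


def classify(ci, antagonism_cutoff=2.0):
    """Flag only clearly antagonistic combinations (CI >= cutoff)."""
    if ci >= antagonism_cutoff:
        return "clearly antagonistic"
    if ci < 1.0:
        return "synergistic"
    # 1.0 <= CI < cutoff: additive or mildly antagonistic, not flagged
    return "not flagged"


# Hypothetical doses, for illustration only
ci = combination_index(d1=1.0, d2=2.0, dx1=4.0, dx2=4.0)
print(ci, classify(ci))
```

      Under this scheme an additive pair (CI near 1) is not flagged, which matches the point that additivity alone can still be beneficial against resistance.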

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2b. This figure would be more impactful if presented as an upset plot with the same Venn diagram embedded. I am not sure Figure 2C accurately supports the statement: "Frequently affected proteins generally had expression level changes in the same direction across all drug perturbations (Figure 2c), indicating a potential general stress response. ". It would be beneficial if the authors could present the data in a way that shows the number of genes with similar directional groupings. Likewise, the color scheme for this figure is hard to interpret as grey is the most negative value and values are preselected for absolute fold-change. Please consider colors with a stronger contrast.

      Authors should consider uploading MS files to the PRIDE or MASSIVE repository.

      We have addressed these very useful suggestions. We have edited Figure 2b to include the requested upset plot. It serves to illustrate the intersection of proteins responding to different perturbation conditions; due to figure space constraints, we limit the figure to entries with counts of at least 15. We have added the number of proteins with consistent directional changes in the figure 2c caption and the text.
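      An upset plot counts exclusive intersections: each protein is assigned to the exact set of drug conditions in which it responds, and each such combination is counted once. The counting behind it, including a minimum count of 15, can be sketched as follows, assuming per-condition sets of responding protein identifiers; the condition and protein names are hypothetical, and plotting is left to a dedicated upset-plot library.

```python
from collections import Counter


def exclusive_intersections(responders, min_count=15):
    """Count proteins by the exact combination of conditions they respond to.

    responders -- dict mapping condition name -> set of responding protein IDs
    Returns {frozenset(conditions): count} for combinations with count >= min_count.
    """
    membership = {}
    for condition, proteins in responders.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(condition)
    # Each protein contributes to exactly one (exclusive) intersection
    counts = Counter(frozenset(conds) for conds in membership.values())
    return {combo: n for combo, n in counts.items() if n >= min_count}


# Hypothetical toy data; real inputs would be proteins passing the fold-change filter
toy = {"drug_A": {"P1", "P2", "P3"}, "drug_B": {"P2", "P3", "P4"}}
print(exclusive_intersections(toy, min_count=2))
```

      Because every protein lands in exactly one intersection, the bar heights of such a plot sum to the number of responding proteins before the count filter is applied.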

      For Figure 2c, we have edited the color bar legend to better reflect the colors that appear in the heatmap.

      We have added our mass-spectrometry drug-response dataset to the ProteomeXchange Consortium via PRIDE with accession number PXD066316.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy. The work is well presented, and the results are thorough and convincing.

      We are grateful to this reviewer for his/her thoughtful assessment and supportive feedback. In response, we have addressed each comment and incorporated the necessary revisions into the manuscript.

      Weaknesses:

      I will comment mostly on the MD results due to my expertise.

      The Methods description is quite precise, but is missing some important details:

      (1) Version of GROMACS used.

      We used GROMACS version 2023.2 (single-precision). All subsequent MD simulation procedures mentioned below have been consolidated and described in detail in the Supporting Information (SI).

      (2) The barostat used.

      Pressure coupling was applied using the C-rescale barostat (τ<sub>p</sub> = 5.0 ps, ref<sub>p</sub> = 1.0 bar).

      (3) pH at which the system is simulated.

      No explicit pH was defined during system construction. Proteins were modeled using standard protonation states as assigned by GROMACS preprocessing tools, corresponding to physiological, near-neutral pH (~ 7.0).

      (4) The pulling is quite fast (but maybe it is not a problem)

      The relatively high pulling velocity (1 nm/ns) was selected to enable efficient screening across a large number of designed proteins (211 candidates), while maintaining reasonable computational cost/time. Given the intrinsic orders-of-magnitude difference between simulation and experimental pulling rates, SMD results were used as a comparative screening tool, rather than for direct quantitative comparison with AFM data.

      (5) What was the value for the harmonic restraint potential? 1000 is mentioned for the pulling potential, but it is not clear if the same value is used for the restraint, too, during pulling.

      All positional restraints used in the simulations, including those applied during equilibration as well as the harmonic restraint on the N-terminus and the pulling umbrella restraint during SMD, employed the same force constant (k = 1000 kJ·mol<sup>–1</sup>·nm<sup>–2</sup>). We have clarified this point in the revised Methods section.
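      The pulling setup in points (4) and (5) can be illustrated with a short sketch of constant-velocity harmonic pulling, in which the spring's reference point moves at 1 nm/ns and the force is k times the lag of the pulled group behind that reference. This is an illustration under stated assumptions (a single pulled group moving along z, with the reference starting at the group's initial position), not the authors' GROMACS setup, and the function names are hypothetical. For reference, GROMACS specifies the pulling rate in nm/ps (`pull-coord1-rate`), so 1 nm/ns corresponds to 0.001.

```python
# Minimal sketch of constant-velocity (moving-reference) harmonic pulling,
# using the force constant and pulling speed quoted in the response.
K_PULL = 1000.0            # spring constant, kJ mol^-1 nm^-2
V_PULL = 1.0               # reference velocity, nm/ns
KJ_MOL_NM_TO_PN = 1.66054  # 1 kJ mol^-1 nm^-1 is about 1.66 pN per molecule


def pull_force(t_ns, z_nm, z0_nm=0.0, k=K_PULL, v=V_PULL):
    """Spring force (kJ mol^-1 nm^-1) on the pulled group at time t.

    The spring's reference point starts at z0 and moves at speed v;
    z is the current position of the pulled group along the pulling axis.
    """
    return k * (z0_nm + v * t_ns - z_nm)


def to_pn(force_kj_mol_nm):
    """Convert a per-molecule force from kJ mol^-1 nm^-1 to piconewtons."""
    return force_kj_mol_nm * KJ_MOL_NM_TO_PN


# Hypothetical snapshot: after 2 ns the pulled terminus lags the reference by 0.2 nm
f = pull_force(t_ns=2.0, z_nm=1.8)
print(round(f, 1), "kJ/mol/nm =", round(to_pn(f)), "pN")
```

      Because the simulated pulling speed is orders of magnitude faster than AFM pulling, forces from this kind of calculation are useful for ranking designs against one another rather than for direct comparison with AFM rupture forces, as the response notes.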

      (6) The box dimensions.

      Rectangular simulation boxes were used throughout. For equilibrium MD simulations, the box dimensions in each direction were set based on the maximum extent of the protein along that axis, with a minimum distance of 1.2 nm between the protein surface and the box boundary on all sides. For SMD simulations, the same box dimensions were applied in the x and y directions. Along the pulling (z) direction, the box length was extended to accommodate the theoretical stretching length, defined as the initial N–C terminal distance plus 0.36 nm per stretched residue, while maintaining a 1.2 nm buffer at both ends (2.4 nm total). These details have now been clarified in the revised Supporting Information.

      From this last point, a possible criticism arises: Do the unfolded proteins really still stay far enough away from themselves to not influence the result?

      We analyzed the minimum atomic distance between each protein and its periodic images to assess potential artifacts from periodic boundary conditions. For all simulation stages used in screening and statistical analysis, the minimum protein–image separation remained above 1.0 nm for the majority of the simulation time, exceeding the nonbonded interaction cutoff and minimizing cross-boundary interactions. As shown in Author response image 1 for SpecAI89 (left), this separation during SMD simulations is consistently well above the threshold, indicating that the chosen box dimensions are appropriate. In the very late stages of annealing MD, highly unstable proteins may exhibit large conformational fluctuations and transient boundary proximity (right); however, these regimes are associated with large RMSD deviations and are excluded from analysis. Notably, the mechanically relevant unfolding events occur near the center of the simulation box and proceed along the pulling axis in SMD simulations, making boundary effects unlikely to influence the unfolding process or the relative mechanostability ranking.

      Author response image 1.

      Analysis of the minimum atomic distance between the protein and its periodic images under periodic boundary conditions. Left: SpecAI89 during SMD simulations, showing that the minimum protein–image distance remains above 1.0 nm for the majority of the simulation time. Right: WT during AMD simulations, where transient proximity to the periodic boundary is observed at very late stages due to large conformational fluctuations.
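      The protein–image distance analysed above can be illustrated with a brute-force sketch for a rectangular box. It assumes the molecule is whole (not split across the boundary) and fits inside the box, so only the 26 neighbouring image cells need checking; in practice GROMACS provides this analysis through `gmx mindist -pi`, and the function below is a hypothetical stand-in, not the authors' analysis script.

```python
import itertools
import math


def min_image_separation(coords, box):
    """Smallest distance (nm) between any atom and any atom of a periodic image.

    coords: list of (x, y, z) positions of one whole molecule, in nm.
    box: (Lx, Ly, Lz) edge lengths of a rectangular box, in nm.
    Only the 26 nearest image cells are checked (assumes the molecule fits in the box).
    """
    best = math.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the primary copy; we only want self-image contacts
        ox, oy, oz = (s * edge for s, edge in zip(shift, box))
        for xi, yi, zi in coords:
            for xj, yj, zj in coords:
                d = math.dist((xi, yi, zi), (xj + ox, yj + oy, zj + oz))
                best = min(best, d)
    return best


# Hypothetical two-atom "protein" in a 5 nm cubic box.
print(min_image_separation([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], (5.0, 5.0, 5.0)))
```

      The cost is O(26·N²) in the number of atoms, which is fine for illustration but is why a compiled tool is preferable for full trajectories.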

      Additionally, no time series are shown for the equilibration phases (e.g., RMSD evolution over time), which would empower the reader to judge the equilibration of the system before either steered MD or annealing MD is performed.

      We thank the reviewer for this suggestion. To assess equilibration, we analyzed the backbone RMSD evolution during the equilibration phase. Using SpecAI89 as a representative example (Author response image 2), the protein backbone RMSD converges rapidly and reaches a stable plateau within approximately 5 ps. The plateau is maintained throughout the subsequent 125 ps equilibration period, indicating that the system is well equilibrated prior to both steered MD and annealing MD simulations.

      Author response image 2.

      The backbone RMSD of SpecAI89 over time during simulation
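      One simple way to make "reaches a stable plateau" operational is to find the earliest time after which every RMSD value stays within a tolerance of the late-trajectory mean. The criterion, the 0.02 nm tolerance, and the use of the final half of the trace as the reference are all hypothetical choices for illustration, not the procedure used in the manuscript.

```python
def plateau_onset(times_ps, rmsd_nm, tolerance_nm=0.02, tail_fraction=0.5):
    """Earliest time from which RMSD stays within `tolerance_nm` of the plateau value.

    The plateau value is the mean RMSD over the final `tail_fraction` of the trace.
    Returns None if even the last frame lies outside the band.
    """
    n = len(rmsd_nm)
    tail = rmsd_nm[int(n * (1 - tail_fraction)):]
    plateau = sum(tail) / len(tail)
    onset = None
    # Walk backwards; stop at the first frame (from the end) that leaves the band.
    for i in range(n - 1, -1, -1):
        if abs(rmsd_nm[i] - plateau) <= tolerance_nm:
            onset = i
        else:
            break
    return None if onset is None else times_ps[onset]


# Hypothetical trace sampled every 1 ps.
print(plateau_onset(list(range(11)), [0.0, 0.05, 0.09] + [0.10] * 8))
```

      Because consecutive MD frames are correlated, such a check is only a rough diagnostic; a longer plateau relative to the onset time gives more confidence than the onset alone.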

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S2, only one copy (or the average of the three copies; it is not clear from the caption) is shown; it would be better to show the individual traces for each repeat. Additionally, only the plot for the forces is shown, and not, similarly to the AMD, the RMSD plot. This could be a stylistic choice, but it just reports on how much force was applied and not on how the protein responded to the force. Moreover, horizontal lines at the maximum value reached by the force could be added in order to directly see the difference in force applied, since it is then remarked on.

      Figure S2 originally showed a representative single SMD trajectory, as the force–extension peak positions vary between independent simulations and averaging the force traces would obscure the characteristic force peaks. In the revised Supplementary Information, we have now added the force–extension traces from the other two independent SMD repeats for each construct (New Figure S2). In addition, horizontal lines indicating the maximum force reached in each trajectory have been included to facilitate direct comparison of force differences between designs.

      (2) In Figure S3 the plots have different y-axes. Maybe it could be valuable to modify it so that in figures b, c, and d the spectrin result is in the background (perhaps in gray), so that the y-axis is not changed, retaining the information included in this plot, while one could still compare directly to the spectrin result. With a 0 to 1 nm y-axis part of the spectrin run will be hidden, but in any case, plot a can be used to see the full behavior. Similarly to S2, the repeats (if any) could be shown.

      We have revised Figure S3 as suggested. The y-axis is now unified to 0–1.2 nm across all panels. For panels b–d, the natural spectrin trajectory is displayed in light gray in the background for direct comparison. Additionally, three independent MD replicates are now presented for each construct to demonstrate reproducibility.

      Finally, minor remarks that could nevertheless improve the paper:

      (3) In Figure S7, a bimodal distribution model for the number of events could be used to fit the data better.

      We thank the reviewer for this detailed suggestion. Following this advice, we fitted the force-event data in Figure S7 with a bimodal Gaussian distribution model. Indeed, a bimodal fit describes the data in Figure S7 panel f better (Author response image 3). The two peaks are centered at F<sub>1</sub> = 190 ± 4 pN and F<sub>2</sub> = 380 ± 6 pN. Interestingly, the first, major peak coincides with the previously fitted value. The second peak lies at twice this force, which we speculate may arise from two molecules being stretched simultaneously, for reasons we cannot determine. Because the second peak contains very few events and the first peak reproduces the original value (190 pN), we have decided not to change the unfolding force reported in the manuscript. We nevertheless thank the reviewer for this insightful comment.

      Author response image 3.

      The bimodal fit for the unfolding force of SpecAI88-49E102K-6H149H shows the same 190 pN unfolding force for the first peak as the previous fit.
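      A bimodal Gaussian fit of the kind described above can be obtained without binning by expectation-maximization (EM) on the raw unfolding forces. The sketch below is a generic two-component EM, initialised at the smallest and largest forces; whether the authors fitted histograms or raw events is not stated, so this is an assumed alternative, and the data in the usage line are hypothetical.

```python
import math


def _normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_bimodal_gaussian(forces, n_iter=200):
    """Two-component 1-D Gaussian mixture fitted by EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean (forces in pN).
    """
    lo, hi = min(forces), max(forces)
    spread = (hi - lo) / 4 or 1.0
    params = [[0.5, lo, spread], [0.5, hi, spread]]  # start the two peaks at the extremes
    for _ in range(n_iter):
        # E-step: responsibility of each component for each force value.
        resp = []
        for x in forces:
            p = [w * _normal_pdf(x, m, s) for w, m, s in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation of each component.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            if nk == 0:
                continue  # component captured no events; leave it unchanged
            mean = sum(r * x for r, x in zip(rk, forces)) / nk
            var = sum(r * (x - mean) ** 2 for r, x in zip(rk, forces)) / nk
            params[k] = [nk / len(forces), mean, max(math.sqrt(var), 1e-6)]
    return sorted((tuple(p) for p in params), key=lambda p: p[1])


# Hypothetical events: a main population near 190 pN and a few near 380 pN.
print(fit_bimodal_gaussian([180, 185, 190, 195, 200] * 4 + [370, 380, 390]))
```

      The fitted weight of the second component directly quantifies how rare the double-force events are, which supports keeping the single-peak value as the reported unfolding force.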

      (4) The colors in the video are not very intuitive, as the spectrin is shown initially in light blue, but becomes grey in the variants, where light blue is reserved for the additional helix. A counter of elapsed time and/or force/temperature applied could help the readers orient. Maybe it could be useful to produce a video with spectrin and the three variants all shown together?

      We thank the reviewer for this comment. The videos have been revised accordingly to improve clarity and consistency. In all cases, the original protein scaffold is now shown in gray, while the additional helix in the designed variants is highlighted in blue. Real-time annotations have been added to aid interpretation: the instantaneous temperature is displayed during AMD simulations, and time is shown during SMD simulations. In addition, for ease of comparison, the AMD and SMD results of all four proteins are each compiled into a single combined video, allowing their behaviors to be viewed side by side.

      Reviewer #2 (Public review):

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.
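      The hydropathy-score filter in stage I can be pictured as a cheap first pass applied before the expensive structure-prediction and SMD steps. The exact score and cutoff used in the manuscript are not given here; this sketch assumes the Kyte–Doolittle grand average of hydropathy (GRAVY) and a hypothetical threshold of 0.0, with sequences above it discarded as overly hydrophobic.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def hydropathy_filter(candidates, max_gravy=0.0):
    """Keep designs whose GRAVY does not exceed a (hypothetical) cutoff."""
    return [seq for seq in candidates if gravy(seq) <= max_gravy]


# Hypothetical toy sequences; real designs would come from ProteinMPNN.
print(hydropathy_filter(["KKKK", "LLLL", "AKEL"]))
```

      Ordering filters from cheapest to most expensive is what keeps a hierarchical pipeline tractable: each later stage only sees the survivors of the earlier ones.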

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      We are thankful for the reviewer’s diligent evaluation and positive remarks. His/her concluding remarks, which encourage our future work at the intersection of AI-protein design and AFM-SMSF, are especially appreciated. All comments have been incorporated into our revisions.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      This is an insightful comment. Indeed, a direct comparison between three-helix bundles of the same architecture would be the most straightforward, with a clear reference point. We will take this advice and pursue it in future work.

      In our case, a substantial fraction of the hydrophobic region is relatively shallow and partially solvent-exposed in the wild-type R15 α-helical bundle. So, the added fourth helix provides a new hydrophobic packing interface, increasing core burial, packing density, and strengthening the internal load-bearing network. Consistent with this design rationale, rSASA analysis shows that the designed proteins exhibit a higher degree of hydrophobic core burial compared to the wild-type R15. Specifically, the fraction of residues with rSASA < 0.2 exceeds 30% in the designs, compared to 23% in the natural spectrin repeat.
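      The burial metric quoted above (fraction of residues with rSASA < 0.2) can be computed from per-residue SASA values and a table of maximum accessible surface areas per residue type. This is a minimal sketch: the SASA values would come from a structure or trajectory tool, the maximum-ASA values from a published reference table, and all numbers in the usage line are hypothetical.

```python
def relative_sasa(sasa, max_sasa):
    """Per-residue rSASA: observed SASA divided by that residue type's maximum SASA."""
    return [s / m for s, m in zip(sasa, max_sasa)]


def buried_fraction(rsasa, cutoff=0.2):
    """Fraction of residues counted as buried (rSASA below the cutoff)."""
    return sum(1 for r in rsasa if r < cutoff) / len(rsasa)


# Hypothetical four-residue example (SASA and maximum SASA in Å^2).
r = relative_sasa([20.0, 90.0, 10.0, 150.0], [200.0, 200.0, 200.0, 200.0])
print(buried_fraction(r))
```

      A protein with more than 30% of residues below the cutoff, as reported for the designs, therefore has a larger buried core fraction than one at 23%, independent of chain length.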

      While the authors have shown experimentally that stage II constructs have increased the mechanical stability by AFM, they did not show that these same constructs have increased the thermal and chemical stabilities. Since the effects of salt bridges on stability are highly context dependent (orientation, local environment, exposed vs buried, etc.), it is difficult to assess the magnitude of the effect that this change had on other types of stabilities.

      We agree that the effects of salt bridges are highly context-dependent and that different dimensions of stability do not always correlate. Following your suggestion, we evaluated the thermal and chemical stabilities of the Stage II constructs. The experimental results (now added as Figure S9) show that the Stage II designs maintain high thermal stability and, to varying extents, resistance to chemical denaturation: their thermal stability remains as high as that of the Stage I designs, whereas their resistance to chemical denaturation is slightly reduced. We have added these results to the manuscript accordingly.

      The three constructs chosen are 60-70% identical to each other, either suggesting overconstrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore these possible design principles further.

      Yes, the observed sequence convergence likely arises from a combination of intrinsic physical constraints of the protein architecture and the applied design and screening criteria. In particular, the tightly packed hydrophobic core imposes strong constraints on side-chain size, packing complementarity, and the alignment of heptad-like motifs reminiscent of coiled-coil organization, which collectively reduce the accessible sequence space. In addition, the strong selection pressure imposed by foldability and stability filters further promotes convergence toward similar solutions. We agree with the reviewer that this represents an important direction for future work.
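      The 60–70% identity figure refers to pairwise comparisons among the selected designs. For sequences that are already aligned to the same length (an assumption; designs of different lengths would first need a proper alignment), pairwise identity is simply the share of matching positions, as in this sketch with hypothetical names and sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pairwise_identities(designs: dict) -> dict:
    """Percent identity for every unordered pair of named designs."""
    return {
        (n1, n2): percent_identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(designs.items(), 2)
    }


# Hypothetical aligned toy sequences.
print(pairwise_identities({"d1": "ACDEFGHIKL", "d2": "ACDEFGHIKV", "d3": "ACDQFGHRKL"}))
```

      Tracking this matrix during design would show directly whether the filters are collapsing the sequence pool, which is one way to separate filter-driven from physics-driven convergence.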

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Yes, steered MD can become computationally expensive, particularly as the number of designs increases or as protein size grows. Considering the vast pool created by AI, SMD in this work was applied to a relatively small, high-confidence subset of candidates after multiple rounds of rapid prescreening, keeping the overall computational cost manageable. In future applications, this step could be further accelerated by integrating machine-learning–based predictors to improve scalability.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that the difference in rSASA between the designs and the natural spectrin repeat is meaningful. It would be helpful to report confidence intervals for the rSASA values of the designs to clarify whether any differences are statistically robust. Even if such differences prove statistically significant, it is not clear that they are large enough to be practically meaningful.

      In our analysis, rSASA values were calculated from equilibrated MD conformations, and the fraction of buried residues (rSASA < 0.2) was consistently higher for all designed proteins that passed the simulation-based screening than for the wild-type spectrin repeat. However, rSASA was used only as a supportive structural descriptor to indicate a trend toward a more compact and better-buried hydrophobic core, rather than as a standalone or decisive metric of stability.

      Protein stability is indeed influenced by multiple factors, including hydrogen bonding, salt bridges, metal coordination, and topology-dependent load-bearing interactions, none of which are captured by rSASA alone. Therefore, we agree with the reviewer that differences in rSASA alone should not be overinterpreted as a quantitative measure of protein stability. For this reason, rSASA was not used as a ranking criterion or a predictor of stability, but only as complementary evidence consistent with the overall design rationale and with the experimentally observed stability enhancements.
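      The reviewer's request for confidence intervals could be addressed with a percentile bootstrap over per-frame burial fractions from the equilibrated trajectory. This sketch is an assumed approach, not an analysis reported in the manuscript; it resamples frames independently, whereas consecutive MD frames are autocorrelated, so thinning the trajectory or using a block bootstrap would give a more honest (wider) interval.

```python
import random


def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the mean of per-frame values.

    Returns (mean, lower, upper). Frames are resampled independently (see caveat above).
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, lower, upper


# Hypothetical per-frame fractions of buried residues (rSASA < 0.2).
print(bootstrap_ci([0.30, 0.32, 0.31, 0.33, 0.29, 0.30, 0.34, 0.31]))
```

      Non-overlapping intervals for a design and the wild type would indicate a statistically robust difference; whether that difference is practically meaningful remains a separate question, as the reviewer notes.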

      The claim "The strong agreement between computational rankings and experimental measurements validates this approach for prioritizing designs based on relative mechanostability, offering a practical pipeline to bridge the gap between in silico design and experimental validation." should be substantiated by a citation or a figure. Since the authors have the experimental AFM data and steered MD data, I suggest adding a Spearman correlation plot of the two.

      Following this comment, we examined the Spearman rank correlation between SMD-derived unfolding forces and experimentally measured AFM forces (Author response image 4). The resulting correlation was modest (ρ = 0.4, p = 0.6), which is not unexpected given (i) the large difference in force and timescales between high-speed SMD simulations and single-molecule AFM experiments, and (ii) the limited number of designs and simulation repeats available.

      Nevertheless, the difference between wild-type spectrin (the first point) and the three SpecAI designs is qualitatively clear. Given the large computational cost, we performed only three simulation repeats for each design to balance accuracy against cost and time. To avoid overinterpretation, we therefore did not include the correlation analysis in the main text and revised the manuscript to soften claims of strong agreement, emphasizing instead the qualitative and comparative role of SMD in the design pipeline.

      Author response image 4.

      Spearman correlation between SMD and AFM unfolding forces for natural spectrin and SpecAI designs. SMD force (x-axis) versus experimental AFM force (y-axis); each point represents one protein.
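      With only four proteins, the Spearman coefficient is very coarse. For untied data it equals 1 − 6Σd²/(n(n² − 1)), where d is the rank difference per protein; for n = 4 this is 1 − Σd²/10, and since Σd² is always even, ρ can only move in steps of 0.2. The sketch below implements that formula for illustration (scipy.stats.spearmanr additionally handles ties and p-values); the input values are hypothetical ranks, not the measured forces.

```python
def _ranks(values):
    """1-based ranks; assumes no ties (reasonable for continuous forces)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(x)
    rx, ry = _ranks(x), _ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical orderings for four proteins (SMD rank vs AFM rank).
print(spearman_rho([1, 2, 3, 4], [2, 3, 1, 4]))
```

      A single swap among the three designs therefore changes ρ substantially, which is consistent with treating the correlation as weak evidence and relying on the qualitative wild-type versus design separation instead.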

      Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategically, chemical modification by incorporating an additional α-helix, site-specific salt bridges, and metal coordination further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physical chemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      We appreciate the positive assessment of our manuscript from this reviewer and his/her support. We have answered all the comments as follows and modified the manuscript accordingly.

      Weaknesses:

      (1) The authors report that appending an additional helix increases the overall stability of the α-helical protein. Could the authors provide a more detailed structural explanation for this? Why does the mechanical stability increase as the number of helices increases? Is there a reported correlation between the number of helices (or the extent of the hydrophobic core) and the stability?

      In multi-helix bundle proteins, tight interhelical packing leads to the formation of a dense hydrophobic core, which substantially enhances overall structural stability. The introduction of an additional helix does not merely increase helix count, but expands the buried hydrophobic interface, improving packing density and cooperative side-chain interactions in the core. This, in turn, strengthens the internal load-bearing network that resists force-induced unfolding.

      From a mechanical perspective, adding a helix also increases topological interlocking among secondary-structure elements, which raises the energetic barrier for unfolding and shifts the unfolding pathway toward more cooperative rupture events, thereby increasing the unfolding force threshold. Consistent with this design principle, pioneering studies have reported a positive correlation between the number of helices (or the extent of the hydrophobic core) in helix bundles and their stability (Lim et al., Structure, 2008, 16:449; Minin et al., J. Am. Chem. Soc., 2017, 139, 16168; Bergues-Pupo et al., Phys. Chem. Chem. Phys., 2018, 20, 29105). Inspired by these works, our AI-protein design study uses the appended helix to reinforce the hydrophobic core rather than simply increasing secondary-structure content.

      (2) The author analyzed both thermal stability and mechanical stability. It would be helpful for the author to discuss the relationship between these two parameters in the context of their design. Since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate.

      We agree this is a crucial distinction. Thermal and chemical stabilities report on the equilibrium free energy (ΔG), while mechanical stability probes the kinetic unfolding barrier (ΔG‡) along a force-dependent pathway. Their inherent difference makes concurrent improvement in all parameters a non-trivial task, which highlights the importance and success of our integrative design approach.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (k<sub>f</sub>) and unfolding (k<sub>u</sub>) rates. It remains unclear whether the observed ultrastability is primarily driven by a drastic decrease in the unfolding rate (k<sub>u</sub>) or if the design also maintains or improves the folding rate (k<sub>f</sub>)?

      We agree with the reviewer that thermodynamic stability is determined by both the folding rate (k<sub>f</sub>) and the unfolding rate (k<sub>u</sub>). In the present study, we did not directly measure folding kinetics, and therefore cannot quantitatively deconvolute the respective contributions of k<sub>f</sub> and k<sub>u</sub> to the observed ultrastability. Based on the design strategy and the experimental observations, we propose that the enhanced stability primarily originates from a substantial reduction in the unfolding rate (k<sub>u</sub>), corresponding to an increased unfolding energy barrier. The reinforcement of the hydrophobic core, the introduction of stabilizing interactions such as salt bridges and metal coordination, and the additional helix that increases topological and packing constraints all raise the energetic cost of disrupting key interactions in the folded state.
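      For a two-state folder, the link between the rate constants and equilibrium stability referred to above can be written explicitly. The relations below are the standard two-state expressions, shown only to make the argument concrete; they assume the designs fold in a two-state manner, which was not tested here. At fixed k<sub>f</sub>, a tenfold drop in k<sub>u</sub> at 298 K raises ΔG<sub>U</sub> by RT ln 10, about 5.7 kJ·mol<sup>–1</sup>.

```latex
K_U = \frac{[\mathrm{U}]}{[\mathrm{N}]} = \frac{k_u}{k_f},
\qquad
\Delta G_U = -RT\ln K_U = RT\ln\frac{k_f}{k_u},
\qquad
\Delta\Delta G_U\big|_{k_u \to k_u/10,\ k_f\ \mathrm{fixed}} = RT\ln 10 \approx 5.7\ \mathrm{kJ\,mol^{-1}}\ (T = 298\ \mathrm{K})
```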

      This interpretation is consistent with the high mechanical unfolding forces observed in both AFM experiments and SMD simulations. In contrast, these stabilizing features are not necessarily expected to accelerate folding and may even modestly increase folding complexity. Addressing folding kinetics explicitly would require dedicated kinetic experiments or simulations, which are beyond the scope of the present work but represent an interesting direction for future studies.

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (k<sub>f</sub> ~10<sup>5</sup> s<sup>–1</sup>) nearly three orders of magnitude faster than its homologues like R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Does the newly designed protein, with its additional fourth helix and site-specific chemical modifications, retain the exceptionally high folding rate of the parent R15?

      We did not directly measure the folding kinetics of the newly designed proteins, and therefore cannot determine whether they retain the exceptionally fast folding rate reported for the parent spectrin repeat R15. While R15 is known for its ultrafast folding behavior, the introduction of an additional fourth helix and site-specific chemical modifications, although beneficial for enhancing stability, may increase the complexity of the folding landscape and do not necessarily guarantee that the folding rate (k<sub>f</sub>) remains comparable to that of R15.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify the used Gaussian function to fit the unfolding force distribution (Figure 3-4). In Figure S8, the Bell-Evans model is used to analyze unfolding force. The authors should explain the choice of fitting methods and ensure consistency.

      The Gaussian fitting used in Figures 3–4 is intended as a descriptive statistical analysis to summarize the unfolding force distributions and to facilitate direct comparison between different designs. This approach provides a robust estimate of the most probable unfolding force and the distribution width, without invoking a specific physical unfolding model, and is commonly used in single-molecule force spectroscopy for comparative purposes.

      In contrast, the Bell-Evans model applied in Figure S8 is a kinetic framework that explicitly accounts for force-loading-rate dependence and is used to extract mechanistic insights into the unfolding process. Therefore, the two fitting approaches serve complementary roles: Gaussian fitting for quantitative comparison and ranking of mechanostability, and Bell-Evans analysis for mechanistic interpretation. We have clarified this distinction and the rationale for using both methods in the revised Supplementary Information to ensure consistency and transparency.
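      The Bell-Evans analysis mentioned above predicts that the most probable unfolding force grows linearly with the logarithm of the loading rate, F* = (k<sub>B</sub>T/Δx) ln(rΔx/(k<sub>0</sub>k<sub>B</sub>T)). A straight-line fit of F* against ln r therefore yields the distance to the transition state Δx (from the slope) and the zero-force unfolding rate k<sub>0</sub> (from the intercept). This sketch assumes k<sub>B</sub>T = 4.11 pN·nm (about 298 K) and uses synthetic data; it is not the authors' fitting code.

```python
import math

KBT_PN_NM = 4.11  # thermal energy at ~298 K in pN*nm (assumed temperature)


def bell_evans_force(loading_rate, dx_nm, k0, kbt=KBT_PN_NM):
    """Most probable unfolding force (pN) at a loading rate (pN/s)."""
    return (kbt / dx_nm) * math.log(loading_rate * dx_nm / (k0 * kbt))


def fit_bell_evans(loading_rates, forces, kbt=KBT_PN_NM):
    """Least-squares fit of F* = slope*ln(r) + intercept; returns (dx_nm, k0_per_s)."""
    xs = [math.log(r) for r in loading_rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(forces) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, forces)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    dx = kbt / slope                                 # slope = kBT / dx
    k0 = dx / kbt * math.exp(-intercept / slope)     # intercept = slope * ln(dx / (k0 kBT))
    return dx, k0


# Synthetic check: forces generated with dx = 0.2 nm, k0 = 1e-3 1/s are recovered.
rates = [1e3, 1e4, 1e5]
print(fit_bell_evans(rates, [bell_evans_force(r, 0.2, 1e-3) for r in rates]))
```

      This makes the complementary roles explicit: a Gaussian fit gives one most probable force per loading rate, and the Bell-Evans fit uses several such values to extract kinetic parameters.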

      (2) The authors utilized steered MD simulation to analyze the mechanical properties via ForceGen (Ni et al., 2024, Sci. Adv. 10, eadl4000). However, the significant discrepancy between the predicted unfolding force (~600 pN) and the experimental value (~50 pN for spectrin, line 376) requires further justification (line 376). Please clarify how the accuracy of these predictions can be established. Specifically, do the MD simulations successfully capture the relative ranking or trends in stability across the different designed variants?

      We agree with the reviewer that there is a substantial discrepancy between the absolute unfolding forces predicted by SMD simulations (~ 600 pN) and those measured experimentally by AFM (~ 50 pN for spectrin). This difference primarily arises from the orders-of-magnitude mismatch in loading rates between simulations and experiments. In our SMD simulations, the pulling velocity (~10<sup>9</sup> nm/s) is several orders of magnitude higher than that used in AFM experiments (~10<sup>3</sup> nm/s), which systematically elevates the apparent unfolding force. In addition to loading-rate effects, limitations in force-field accuracy, finite system size, and restricted conformational sampling further contribute to deviations in absolute force values. As a result, the unfolding forces obtained from SMD are not intended to provide quantitative agreement with experimental measurements or absolute mechanical stability.

      Instead, SMD is employed here as a comparative screening tool to assess relative mechanostability across different designed variants under identical simulation conditions. Despite the limited number of repeats imposed by computational cost, the simulations consistently distinguish candidates with markedly different mechanical responses. Importantly, the variants identified by SMD as more mechanically stable were subsequently confirmed experimentally to exhibit enhanced mechanostability relative to the wild-type spectrin repeat. Therefore, while SMD does not yield quantitatively accurate unfolding forces, it successfully captures relative stability trends and provides a practical and effective means for prioritizing designs prior to experimental validation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compared the participants’ reaction times before and during connected practice, revealing a significant reduction in reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s d values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons respectively. Note that these changes did not alter the conclusions of the study.
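      The "smallest effect size across the relevant tests" rule can be made concrete. The response does not state which Cohen's d variant was used; this sketch assumes the paired-samples form d<sub>z</sub> = mean(difference) / SD(difference), suited to within-participant comparisons, and reports the minimum absolute value across several comparisons. Function names and data are hypothetical.

```python
import statistics


def cohens_dz(before, after):
    """Paired-samples effect size: mean of the differences over their sample SD."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def smallest_effect(pairs):
    """Most conservative |d_z| across several (before, after) comparisons."""
    return min(abs(cohens_dz(before, after)) for before, after in pairs)


# Hypothetical vigor scores for three participants under two comparisons.
print(smallest_effect([([1, 2, 3], [2, 4, 5]), ([1.0, 1.1, 0.9], [1.2, 1.2, 1.3])]))
```

      Reporting the minimum rather than the mean effect size means a single weak comparison dominates the summary, which is what makes the rule conservative.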

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an overrepresentation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values of VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of participants’ vigor throughout the experiment.

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior differ. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, as would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction errors remain small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of condition on reaction time for both the fast and slow groups (in both cases: W₂ > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertainty term associated with the slow partner should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Conceptually, and numerically as reported in the sensitivity analysis, this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partners in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their nonparametric equivalent for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes of vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which appeared slightly improved in a restricted number of connected conditions and now appears mostly unaffected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Figs. S.6H,I suggest slightly slower movements for the fast partners, although the difference is not significant. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time, as reported by Reed and Peshkin (2008) [12], and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507) that the percentage of inter-individual variability in vigor is estimated from sums of squares, used as estimates of inter- and intra-individual variability.
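      For illustration, a minimal sketch of one way such a sum-of-squares decomposition can be computed is given below. It assumes a complete participants × conditions table of vigor scores (e.g. NF1, VL and VH) and uses a simple one-way split: the inter-individual part is the spread of participant means around the grand mean, and everything else (including any systematic condition effect) counts as intra-individual. The function name and toy values are hypothetical and are not taken from the manuscript’s Methods.

```python
def inter_individual_share(vigor):
    """Split total variability of vigor scores into inter- and
    intra-individual parts using sums of squares (one-way split).

    vigor: list of rows, one per participant; each row holds that
        participant's vigor score in each condition (e.g. NF1, VL, VH).
    Returns (percent_inter, percent_intra).
    """
    n_conditions = len(vigor[0])
    all_scores = [v for row in vigor for v in row]
    grand_mean = sum(all_scores) / len(all_scores)

    # Total variability of all scores around the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_scores)

    # Inter-individual: spread of each participant's mean around the grand
    # mean, weighted by the number of conditions that mean summarises
    ss_between = sum(
        n_conditions * (sum(row) / n_conditions - grand_mean) ** 2
        for row in vigor
    )

    # Intra-individual: remaining spread of scores around each participant's mean
    ss_within = ss_total - ss_between
    return 100.0 * ss_between / ss_total, 100.0 * ss_within / ss_total


# Toy example (hypothetical values): two participants, two conditions
print(inter_individual_share([[1.0, 1.2], [0.8, 1.0]]))  # roughly (50, 50)
```

      A full two-way decomposition (participant, condition, residual) would separate the condition effect from genuine within-participant fluctuations; the one-way version is shown only because it is the simplest reading of the description above.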

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed as the absolute value of the interaction torque, averaged over time across the entire movement.
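      The time-averaged absolute torque described above can be sketched as follows, assuming the torque is sampled at known (possibly irregular) time stamps between movement onset and offset. The function and argument names are illustrative only. A trapezoidal integral of |τ(t)| divided by the movement duration closely approximates the plain sample mean of |τ| for densely and uniformly sampled signals.

```python
def mean_abs_interaction_torque(times, torques):
    """Time-average of |tau(t)| over one movement.

    times: increasing sample times (s), from movement onset to offset.
    torques: interaction torque (N·m) at each sample time.
    """
    if len(times) != len(torques) or len(times) < 2:
        raise ValueError("need matching time and torque samples (at least 2)")

    abs_tau = [abs(tau) for tau in torques]

    # Trapezoidal integral of |tau| over the whole movement
    integral = sum(
        0.5 * (abs_tau[k] + abs_tau[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )

    # Divide by movement duration to obtain the average over time
    duration = times[-1] - times[0]
    return integral / duration


# Hypothetical samples: torque ramping linearly from 0 to 1 N·m over 1 s
print(mean_abs_interaction_torque([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.5
```

      With dense sampling, the small overestimate introduced when the torque changes sign between two consecutive samples is negligible.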

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61, see p.5, l.132–133).

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Therefore, Figs. 3E and S6B display the movement time of all individual participants, together with the corresponding 20 individual regression lines, as in Fig. 2B. The solid black line represents the average across all individuals; the averaged behaviors of dyads are not included. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18,S.19), and the fast partner’s cost of time (Fig. S20). This analysis reveals the predominant roles of the estimated slow partner movement patterns in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was not well chosen given the context of the paper, so we have replaced it with “null-field” condition. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), concomitantly with improved movement velocity. We have also expanded the Discussion of the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. Therefore, we have removed the abbreviated names of the models and, where possible without impacting readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have therefore modified Fig. 1A (p.2) to avoid any potential confusion by using the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead the reader into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.
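      To illustrate why a common NF1 reference places the viscous-condition scores below those of NF1, the sketch below assumes a ratio-based vigor score: the population-average NF1 duration divided by the participant’s duration, averaged over targets. This definition and the target labels are assumptions made for illustration only; the exact formula is the one given in the manuscript’s Methods.

```python
def vigor_score(individual_durations, reference_durations):
    """Ratio-based vigor score (assumed definition, for illustration).

    individual_durations: {target: mean movement duration (s)} for one
        participant in the condition of interest (e.g. NF1, VL or VH).
    reference_durations: {target: population-average duration (s)} in the
        NF1 condition, used as the common reference for every condition.
    Returns the average over targets of reference / individual duration:
    1 means population-average pace, > 1 faster, < 1 slower.
    """
    ratios = [
        reference_durations[target] / individual_durations[target]
        for target in reference_durations
    ]
    return sum(ratios) / len(ratios)


# Hypothetical NF1 population averages for two targets
nf1_reference = {"near": 0.5, "far": 0.8}

# A participant matching the NF1 average scores 1 in NF1...
print(vigor_score(nf1_reference, nf1_reference))  # 1.0

# ...but about 0.83 when viscosity lengthens every movement by 20 %
viscous = {target: 1.2 * d for target, d in nf1_reference.items()}
print(vigor_score(viscous, nf1_reference))
```

      Under this assumed definition, a uniform slowing in the viscous conditions shifts all scores down by the same factor while leaving the ranking of participants intact, which is why a 45° identity line would not be the natural reference in these plots.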

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor to the same vigor of another individual, so it doesn’t appear stable to me in such conditions but adaptable.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.

      That said, and as pointed out by the reviewer, (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and its possible consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward- and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, Aug. 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, Feb. 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, Sep. 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? Evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, Dec. 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, May 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor: Neuroeconomics of Movement Control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Ganesh, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, p. a009282, Mar. 2012.

    1. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former, neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference, an approximation strategy for fitting FLMMs (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor–signal associations evolves across an ordered functional domain, such as within-trial timepoints. Standard scalar-outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets), and they yield regression coefficient estimates that are interpretable only with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables, and it produces hypothesis tests in clustered settings. cFLMM can also be applied with covariates defined through the same flexible representations used in encoding models; this is a modeling choice rather than a characteristic of either method.

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by timepoint-specific effects, so the first step of our estimation strategy fits separate pointwise mixed models, one per within-trial timepoint. However, the models from all within-trial timepoints are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provide further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.
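
      To illustrate the distinction between pointwise fitting and the subsequent combination step, a simplified sketch on simulated data. It is an assumption-laden analogue, not fastFMM's code or API: ordinary least squares replaces the per-timepoint mixed models, a moving average replaces the penalized smoothing step, and the third (inference/CI) step is omitted entirely. The function names are hypothetical.

```python
import numpy as np

def pointwise_concurrent_fit(Y, X):
    """Step-1 analogue: at each timepoint s, regress Y[:, s] on [1, X[:, s]].

    Y, X: arrays of shape (n_trials, n_timepoints). Random effects are omitted.
    Returns an array of shape (2, n_timepoints): intercept and slope curves.
    """
    n, T = Y.shape
    beta = np.empty((2, T))
    for s in range(T):
        design = np.column_stack([np.ones(n), X[:, s]])
        beta[:, s] = np.linalg.lstsq(design, Y[:, s], rcond=None)[0]
    return beta

def smooth_curves(beta, window=5):
    """Step-2 stand-in: edge-padded moving average across timepoints.

    fastFMM smooths with penalized splines; a moving average is a simplification.
    """
    pad = window // 2
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(np.pad(row, pad, mode="edge"), kernel, mode="valid")
                      for row in beta])

# Simulated data: the true slope curve varies across within-trial timepoints.
rng = np.random.default_rng(0)
n_trials, n_time = 200, 50
s_grid = np.linspace(0, 1, n_time)
true_slope = np.sin(np.pi * s_grid)
X = rng.normal(size=(n_trials, n_time))  # functional covariate X_i(s)
Y = 0.5 + true_slope * X + rng.normal(scale=0.5, size=(n_trials, n_time))

beta_raw = pointwise_concurrent_fit(Y, X)
beta_smooth = smooth_curves(beta_raw)
```

      Even in this stripped-down version, each timepoint's estimate borrows strength from its neighbours in the second step, which is why the final curve is not simply a stack of independent fits.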

      Other functional regression estimation strategies fit the entire model jointly in a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). These methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and by Reviewer 3 (3A) apply basis expansions to the covariates, and the resulting models do not capture how signal–covariate associations evolve across a functional domain. Although the first stage of the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those obtained from functional regression strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.
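
      For comparison, a minimal sketch of the kind of convolutional encoding model described in 2C, on simulated data. The Gaussian-bump basis, its width, and all names are our illustrative assumptions (raised-cosine or spline bases are common alternatives), and this is not the pipeline of Engelhard et al. (2019). The point it shows is that the fitted weights are expressed in basis-function units; only the product `basis @ weights` maps back to a response as a function of lag.

```python
import numpy as np

def gaussian_basis(n_lags, n_basis, width):
    """Hypothetical temporal basis: n_basis Gaussian bumps tiling n_lags samples."""
    lags = np.arange(n_lags)[:, None]
    centers = np.linspace(0, n_lags - 1, n_basis)[None, :]
    return np.exp(-0.5 * ((lags - centers) / width) ** 2)  # shape (n_lags, n_basis)

def encoding_design(events, basis):
    """Convolve the event series with each basis function (causal, truncated)."""
    T = len(events)
    return np.column_stack([np.convolve(events, basis[:, k])[:T]
                            for k in range(basis.shape[1])])

rng = np.random.default_rng(1)
T, n_lags = 2000, 30
basis = gaussian_basis(n_lags, n_basis=10, width=3.0)
events = (rng.random(T) < 0.02).astype(float)   # discrete behavioural events
true_kernel = np.exp(-np.arange(n_lags) / 8.0)  # simulated event response
signal = np.convolve(events, true_kernel)[:T] + rng.normal(scale=0.1, size=T)

design = encoding_design(events, basis)
weights = np.linalg.lstsq(design, signal, rcond=None)[0]  # one weight per basis function
est_kernel = basis @ weights                              # response as a function of lag
```

      Individual entries of `weights` have no direct reading as a difference in expected signal between conditions, which is the interpretability gap described above.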

      In our paper, we provide examples of cFLMM models with simple signal–covariate relationships. The coefficient estimates quantify the expected change in signal given a one-unit change in the original predictors. Let 𝑌(𝑠) be the outcome and 𝑋(𝑠) be some covariate at within-trial timepoint 𝑠. For brevity, we suppress subject/trial indices and random effects in the following notation. The coefficient at timepoint 𝑠 can then be expressed as the contrast

      $$\mathbb{E}[Y(s) \mid X(s) = 1] - \mathbb{E}[Y(s) \mid X(s) = 0].$$

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      $$\mathbb{E}[Y(s_1) \mid X(s_2) = 1] - \mathbb{E}[Y(s_1) \mid X(s_2) = 0]$$

      for all pairs of timepoints 𝑠₁, 𝑠₂. While simple lagged or offset outcome–predictor associations can be incorporated as covariates in cFLMM, the approach does not capture this contrast for all pairs of within-trial timepoints 𝑠₁, 𝑠₂. Encoding models also do not target this estimand. Instead, a full function-on-function regression could estimate it. We can incorporate this topic into our revision, and it may be a future line of inquiry.
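
      For reference, a sketch of the model forms being compared, with subject/trial indices and random effects suppressed as above. The function-on-function form is the standard linear specification and is our illustrative assumption rather than a model fit in the manuscript; the lag 𝛿 is hypothetical.

```latex
% Concurrent model (fixed-effect part of a cFLMM): X enters only at the same timepoint s
\mathbb{E}[Y(s) \mid X] = \beta_0(s) + \beta_1(s)\, X(s)

% Concurrent model with one fixed, pre-specified lag \delta added as a covariate
\mathbb{E}[Y(s) \mid X] = \beta_0(s) + \beta_1(s)\, X(s) + \beta_2(s)\, X(s - \delta)

% Linear function-on-function regression: X at every timepoint s_2 can contribute
% to Y at s_1 through the coefficient surface \beta_1(s_1, s_2)
\mathbb{E}[Y(s_1) \mid X] = \beta_0(s_1) + \int_{\mathcal{S}} \beta_1(s_1, s_2)\, X(s_2)\, ds_2
```

      With a binary 𝑋(𝑠), the concurrent coefficient 𝛽₁(𝑠) is the timepoint-specific contrast defined earlier in this response, whereas only the surface 𝛽₁(𝑠₁, 𝑠₂) describes how the covariate at one timepoint relates to the signal at every other timepoint.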

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, and then received a reward of either water or strawberry milkshake in the reward delivery zone (Machen et al., 2026). Mice received different rewards across sessions but the same reward on all trials within a given session. This design complicated the analysis, as the reward type produced prominent differences in average latency (water: 3.3 seconds; milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in the reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide an ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2025). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question: it does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing, as it is unclear how to propagate uncertainty from the time-warping model into final hypothesis tests.

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude for mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion more clear and thank Reviewer 2 for their feedback on interpretation.

      Reviewer 3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the Common response sufficiently contrasts the settings in which each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that this was unclear. This point was raised by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate how the signal–reward association changes as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” or “fast” means “mouse with below- or above-average latency”. Please refer to our response to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed-effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because it captures the difference in mean signal between strawberry milkshake and water. Equations (7) and (8) in the “Materials and methods” section, sub-section “Variable trial length analysis”, provide additional details on the fixed-effect coefficients. To address this confusion, we will convert the two 1 x 4 sub-plots of 3B and 3C into two 2 x 2 sub-plots to avoid unintended direct comparisons.
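
      Why a reward-type coefficient reads as a contrast can be checked numerically. The sketch below uses simulated data (all sizes, names, and the effect curve are hypothetical) and ordinary least squares without random effects: with a 0/1 indicator (1 = milkshake, 0 = water) and an intercept, the pointwise coefficient on the indicator equals the milkshake-minus-water difference in mean signal at every timepoint. For simplicity the indicator is constant within trial; because each timepoint is fit separately, the same algebra holds when an indicator varies across timepoints.

```python
import numpy as np

# Hypothetical simulation: 40 trials x 25 timepoints; reward = 1 (milkshake) or 0 (water).
rng = np.random.default_rng(2)
n_trials, n_time = 40, 25
reward = np.repeat([0.0, 1.0], n_trials // 2)
effect_curve = np.linspace(0.0, 1.0, n_time)  # assumed milkshake-minus-water effect
signal = effect_curve * reward[:, None] + rng.normal(size=(n_trials, n_time))

# Regress signal[:, s] on [1, reward] at every timepoint s in one call.
design = np.column_stack([np.ones(n_trials), reward])
coef = np.linalg.lstsq(design, signal, rcond=None)[0]  # shape (2, n_time)

# Dummy coding: intercept = water mean, reward coefficient = milkshake minus water.
water_mean = signal[reward == 0].mean(axis=0)
mean_diff = signal[reward == 1].mean(axis=0) - water_mean
```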

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the Common responses), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
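
      A simplified sketch of both points, using the regressors x trials x timepoints layout mentioned in 3E. It rests on stated assumptions: simulated data, hypothetical names, ordinary least squares instead of mixed models, and no smoothing or inference, so it mirrors only the first fitting step and not fastFMM's interface. A scalar covariate (here a trial number) is repeated along the time axis next to a functional covariate. Fitting the scalar covariate alone in this concurrent layout reproduces the non-concurrent fit exactly.

```python
import numpy as np

def fit_pointwise(Y, covariates):
    """Regress Y[:, s] on [1, covariates[:, :, s]] separately at each timepoint s.

    Y: (n_trials, n_time); covariates: (n_trials, n_covariates, n_time).
    A scalar covariate is simply repeated along the last (time) axis.
    """
    n, p, T = covariates.shape
    beta = np.empty((p + 1, T))
    for s in range(T):
        design = np.column_stack([np.ones(n), covariates[:, :, s]])
        beta[:, s] = np.linalg.lstsq(design, Y[:, s], rcond=None)[0]
    return beta

rng = np.random.default_rng(3)
n_trials, n_time = 60, 20
trial_number = np.arange(n_trials, dtype=float)  # scalar, constant within trial
velocity = rng.normal(size=(n_trials, n_time))   # hypothetical functional covariate
Y = 0.01 * trial_number[:, None] + 0.8 * velocity + rng.normal(size=(n_trials, n_time))

# One constant and one functional covariate stacked into a 3D array.
covs = np.stack([np.repeat(trial_number[:, None], n_time, axis=1), velocity], axis=1)
beta_mixed = fit_pointwise(Y, covs)

# Non-concurrent fit with only the scalar covariate vs. the same covariate
# repeated across timepoints in the concurrent layout.
design_nc = np.column_stack([np.ones(n_trials), trial_number])
beta_nc = np.linalg.lstsq(design_nc, Y, rcond=None)[0]
beta_scalar_only = fit_pointwise(Y, trial_number[:, None, None].repeat(n_time, axis=2))
```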

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a nonconcurrent model, which both underscores the unmet need for cFLMMs and the difficulty in fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the Common responses clarify how cFLMM compares to existing approaches and fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials within the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C and 3A, the Common responses will be incorporated in the revision.

      Per 2B and 2C, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2024).

      When discussing Machen et al. (2025) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per 2C, the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. doi: 10.1080/10618600.2021.1950006, PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025 Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. arXiv. 2025 Jun; arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. Author response:

      Reviewer #1:

      We appreciate the reviewer’s suggestions. In the revision, we will clarify which results are new and better position this work relative to our earlier publication. We will also expand the discussion of the functional implications of polymerase clustering and its cell-cycle dynamics.

      Regarding the condensate interpretation, we agree that the current evidence is suggestive but not definitive. In the revised manuscript, we will clarify how our measurements relate to commonly used criteria for condensate assemblies and revise the text to avoid overstating this interpretation. We will also add quantification to additional figures and revise the model diagram to more accurately reflect the conclusions supported by the data.

      Reviewer #2:

      We thank the reviewer for the positive assessment of the imaging quality. We agree that the manuscript would benefit from a broader discussion of possible models for the observed polymerase foci. In the revision, we will expand the discussion to include alternative interpretations, such as scaffolded assemblies as suggested by Reviewer 3, and further clarify the properties of the RNA Pol II and RNA Pol III foci.

      Reviewer #3:

      We thank the reviewer for the positive evaluation of the study and the helpful suggestions. We agree that the current evidence is indicative but not sufficient to definitively demonstrate condensate formation. In the revision, we will revise the language and discuss alternative interpretations, including scaffolded assemblies. We will also provide additional quantifications for the relevant figures.

      Overall, we appreciate the reviewers’ suggestions and believe that the planned revisions will improve the clarity and impact of the manuscript.

    1. Author response:

      Reviewer #1:

      We appreciate the reviewer’s insightful suggestions. In the revised manuscript, we will provide quantitative analysis of Western blot data throughout the study to improve data robustness and reproducibility. In addition, we will expand the “Discussion” section to address the following points raised by Reviewer #1: (1) potential mechanisms underlying the regulation of LAMP1 transcript levels by NINJ2; (2) whether Ninjurin1 may play a similar role in regulating lysosomal membrane permeabilization (LMP); and (3) the potential clinical implications of our findings, particularly in relation to cancer progression and therapeutic targeting.

      Reviewer #2:

      We thank the reviewer for the insightful and constructive suggestions, which would further deepen the mechanistic understanding of the NINJ2-LAMP1 pathway and its role in ferroptosis regulation. To address the reviewer’s concerns, we will clarify the interpretation of our findings, add quantitative analyses where appropriate, and expand the Discussion to acknowledge these important mechanistic questions and future research directions. Specifically, we will revise the Statistical Analysis section to clearly describe the statistical methods used, including whether corrections for multiple comparisons were applied where appropriate. We will further discuss the potential interaction domain(s) between NINJ2 and LAMP1. We will also discuss the potential role of NCOA4, a central mediator of ferritinophagy, in the NINJ2-FTH1-LAMP1 pathway. Finally, we will include a schematic model summarizing the proposed NINJ2-LAMP1-iron-ferroptosis axis to better illustrate the working model of our study.

    1. Author response:

      Reviewer 1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size (HLA allele calls from 366 individuals, 102 unrelated) and 219 individuals with HLA data, whole genome SNP data, and involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling–this would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes should not be relevant in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty in modeling selection at the HLA, a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this: demographic processes affect all regions of the genome, so if the HLA region maintains elevated IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes but also the consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS computational biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Obtaining evidence for the influence of MHC on mate choice in humans is challenging, as social structures and norms often confound studies of human populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine whether MHC diversity is increased when mates are freely chosen. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no difference in heterozygous dissimilarity between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven the MHC evolution, but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see the paper from Croy et al. 2020, which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.
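      The population-level benchmark described above can be sketched as follows, assuming random mating and Hardy-Weinberg proportions: a child of two random unrelated parents draws each of its two alleles independently from the allele pool, so its probability of being homozygous at a locus is the sum of squared allele frequencies. The allele labels below are hypothetical placeholders, not Himba data; the same function applies unchanged to multi-locus haplotype labels.

```python
from collections import Counter


def expected_homozygosity(alleles):
    """Probability that a child of two random, unrelated parents is homozygous.

    Assumes random mating and Hardy-Weinberg proportions, so the child's two
    alleles are independent draws with frequencies p_i and P(homozygous) is
    the sum of p_i squared. `alleles` is a flat list of observed allele (or
    haplotype) labels, two per unrelated individual.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical example: frequencies 0.5, 0.25, 0.25.
print(expected_homozygosity(["A*01", "A*01", "A*02", "A*03"]))
```

      With small samples the plug-in sum of squared frequencies slightly overestimates the true value; the unbiased estimator (n / (n - 1)) * (sum of p_i squared - 1 / n) is a common correction if the unrelated set is small.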

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset: some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this explicitly in Methods section (n), Statistical Models.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in Methods section (n), ‘Statistical Models’. No pairs were found to be closer than 3rd degree relatives. No arranged marriages were related at 3rd degree or closer; 1 love match marriage and 2 informal partnerships discovered through pedigree analysis were found to be 3rd degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision, but we restricted random couples to have an age gap within the range observed in actual, known partnerships (the woman is at most 16 years older than the man and at most 53 years younger than the man). We included this criterion to make sure random couples represented the best approximation of background, realistic partners. Our age gap criterion was quite permissive due to the large range observed in our actual pairs, and we do not expect it to have significantly impacted our results.
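      A minimal sketch of how such a background set could be enumerated, assuming each individual has a known sex and age and that pairs related more closely than 3rd degree are supplied as a precomputed set (3rd-degree relatives stay in the pool, as described in the response to point 4). The `Person` type, the function name and the data layout are hypothetical; only the age window (woman up to 16 years older, up to 53 years younger) comes from the text above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Person:
    pid: str   # hypothetical individual ID
    sex: str   # "F" or "M"
    age: int   # age in years


def background_pairs(people, related_closer_than_3rd,
                     max_woman_older=16, max_woman_younger=53):
    """Enumerate all opposite-sex (woman, man) pairs passing the age and kinship filters.

    `related_closer_than_3rd` is a set of frozenset({pid_a, pid_b}) for pairs
    related more closely than 3rd degree; those pairs are excluded.
    """
    women = [p for p in people if p.sex == "F"]
    men = [p for p in people if p.sex == "M"]
    pairs = []
    for w, m in product(women, men):
        # gap > 0: the man is older; gap < 0: the woman is older.
        gap = m.age - w.age
        if not (-max_woman_older <= gap <= max_woman_younger):
            continue
        if frozenset((w.pid, m.pid)) in related_closer_than_3rd:
            continue
        pairs.append((w.pid, m.pid))
    return pairs


people = [Person("W1", "F", 20), Person("W2", "F", 50),
          Person("M1", "M", 30), Person("M2", "M", 80)]
print(background_pairs(people, {frozenset(("W2", "M2"))}))
```

      Here W1-M2 fails the window (the woman is 60 years younger), W2-M1 fails it (the woman is 20 years older) and W2-M2 is removed as close kin, leaving only W1-M1.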

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate HLA similarity and log odds of homozygous offspring analyses for class I and class II loci in our revision.
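      The couple-level quantity behind a class-split analysis can be sketched as follows, assuming each parent transmits either of its two alleles with probability 1/2: the chance that a child is homozygous at one locus is the number of matching mother-father allele pairs divided by 4. Summing that probability over the loci of one class gives the expected number of homozygous loci in the child, which holds by linearity of expectation even with linkage between loci. The locus lists, allele labels and function names are hypothetical; the assumed split of the 8 loci, and the field resolution at which alleles are compared, may differ from the manuscript, and the log-odds transform is not shown because its handling of probabilities of 0 or 1 is not specified here.

```python
# Hypothetical assignment of the 8 typed loci to HLA classes.
CLASS_I = ("A", "B", "C")
CLASS_II = ("DRB1", "DQA1", "DQB1", "DPA1", "DPB1")


def p_child_homozygous(mother_alleles, father_alleles):
    """P(child inherits identical alleles from both parents) at one locus.

    Each parent transmits each of its two alleles with probability 1/2, so
    every (maternal, paternal) allele combination has probability 1/4.
    """
    matches = sum(m == f for m in mother_alleles for f in father_alleles)
    return matches / 4


def expected_homozygous_loci(mother, father, loci):
    """Expected number of homozygous loci in a child, over the loci typed in both parents."""
    return sum(p_child_homozygous(mother[locus], father[locus])
               for locus in loci if locus in mother and locus in father)


mother = {"A": ("A*01", "A*02"), "DRB1": ("DRB1*13", "DRB1*13")}
father = {"A": ("A*01", "A*03"), "DRB1": ("DRB1*13", "DRB1*15")}
print(expected_homozygous_loci(mother, father, CLASS_I),
      expected_homozygous_loci(mother, father, CLASS_II))
```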

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020), and will add an explicit limitations section incorporating suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample used for our allelic diversity estimates is comparable in size to those of previous studies in African populations (Figure 1), and our dataset is uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals—a combination that enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic with how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) It appears that the accuracy of the estimated gaze angle must be well under the size of the gaze cone (+/- 10 degrees), but I can't find any direct estimate of the accuracy, even a ballpark figure. Lines 219-233 describe performance for viewing images and video on a monitor; would it be possible to reconstruct the point of gaze on the monitor while images and video are shown, in order to evaluate the accuracy of the system for where the marmoset is looking? Would you see eye position traces that would show fixation clusters around those images or videos with stationary points on the monitor, much like that seen for head-fixed animals looking at faces on a screen (Mitchell et al, 2014)? If so, what is the typical spread of those clusters during fixations on an image, both in terms of the precision by RMS error during a fixation epoch and the spread around the images at different locations (accuracy of projection)? For example, if gaze clusters were always above the displayed images, one would have an idea that the face plane is slightly offset above the true gaze direction. It is not completely clear how well the face plane and corresponding gaze cone do in describing gaze direction in space, but the monitor stimuli could be used as an initial validation of it.

      We thank the reviewer for this important suggestion regarding the quantitative validation of gaze accuracy. We agree that, when animals view stimuli presented on a monitor, the estimated gaze direction can be evaluated by examining the spatial distribution of gaze–monitor intersection points relative to stimulus locations.

      To address this, we generated a new figure (Fig. S2A) analyzing gaze behavior following the onset of video stimuli presented at different locations on the monitor. Specifically, we selected video clips in which human annotators verified that the marmosets were looking at the monitor. Consistent with prior work in head-fixed marmosets (Mitchell et al., 2014), we observe clustering of gaze–monitor intersection centers within and around the corresponding stimulus locations after stimulus onset. These clusters provide an empirical validation that the estimated gaze direction aligns with stimulus position in space.

      Importantly, unlike the head-fixed preparation used in Mitchell et al. (2014), marmosets in our study were freely moving. As a result, they do not exhibit prolonged, stationary fixations on the monitor, and fixation clusters are therefore more diffuse. This increased spread reflects natural head and body motion rather than limitations of the gaze estimation method itself. Despite this, gaze intersection points remain spatially localized to the vicinity of the presented stimuli across different monitor locations.

      We did observe small offsets in some gaze clusters relative to stimulus centers; however, these offsets were not systematic across stimulus locations or animals. Crucially, there was no consistent bias (e.g., clusters appearing uniformly above or below stimuli) that would indicate a systematic misalignment of the face plane or gaze cone relative to true gaze direction. Together, these observations support the conclusion that the face-plane-based gaze cone provides an accurate estimate of gaze direction in space, with precision well within the ±10° aperture of the gaze cone.

      While the freely moving component of the behavior precludes direct estimation of fixation RMS error comparable to head-fixed paradigms, the observed stimulus-locked clustering serves as an initial validation of both the accuracy and practical utility of our approach under naturalistic conditions.

      (2) A second major comment is about clarity in the writing of the results and discussion. At the end of the manuscript, a major takeaway is the difference between familiar and unfamiliar dyads, that males show more interest in viewing females including unfamiliar females, but for familiar females, this distinction is also associated with being likely to look at them if they look at the male, and then to engage in joint gaze with them after looking at them, which indicates more of a social interaction than simply monitoring them when they are unfamiliar. Those aspects of the results could be emphasized more in the topic sentences of paragraphs presenting data to support those features of the gaze data (at present is buried at the ends of results paragraphs and back in the discussion).

      We thank the reviewer for this insightful suggestion. We have restructured the Results and Discussion sections to lead with the primary social takeaways rather than technical descriptions (Tracked changes in Word). Specifically, we now emphasize the distinction between "social monitoring" (characteristic of unfamiliar dyads) and "active social coordination" (characteristic of familiar dyads).

      (1) Topic Sentences: We revised the topic sentences of all Results paragraphs to immediately highlight the findings regarding male interest and the influence of familiarity on reciprocation.

      (2) Conceptual Framework: We added a conceptual distinction in the Discussion, explaining that while unfamiliar marmosets maintain high social attention through "peripheral monitoring" and proximity-dependent joint gaze, familiar pairs exhibit sophisticated, distance-independent coordination and gaze reciprocation.

      (3) Clarification of Male Interest: We explicitly stated that while male interest in females is high regardless of familiarity, it manifests as persistent monitoring in unfamiliar pairs versus a more aware, reciprocal state in familiar pairs.

      Minor comments:

      (1) Methods:

      a) Lines 522-539: Are the 200 continuous frames used to validate the model containing two marmosets sufficient to test how well it generalizes to animals outside the training set? Does the reported RMSE vary for animals inside versus outside the training set? To what extent does the RMSE, in image pixels, translate into accuracy in estimating the gaze direction, for example, as assessed by the estimation error when marmosets look at images or video on the monitor?

      To address the reviewer’s concern regarding generalization and the translation of pixel RMSE to angular accuracy, we emphasize that the six facial features selected are prominent, high-contrast features across the species. Consequently, we observed that the RMSE remained consistent for marmosets both inside and outside the training set. To quantify how pixel-level tracking error translates into gaze estimation accuracy, we performed a sensitivity analysis. We simulated landmark (i.e., feature) jitter by sampling perturbations from circular distributions based on our empirical data (2.4 pixels for eyes; 2.1 pixels for the central blaze). Our results, illustrated in Author response image 1, show that 90% of the resulting head gaze deviations fall within 10°, which is consistent with the angular threshold used for our gaze cone model. This confirms that the reported RMSE provides sufficient precision for reliable gaze estimation.
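
      A minimal Monte Carlo sketch of this kind of sensitivity analysis is given below; it is an illustration, not the authors' code. The landmark coordinates are hypothetical (a face turned about 45° from the camera, in pixel-equivalent units), jitter is assumed to be drawn uniformly within a disk of the stated radius and applied only in the image x-y plane, and the facial normal is assumed to be the cross product of the eye-to-eye and eye-to-blaze vectors. The deviation is the angle between the unperturbed and perturbed normals, computed with arctan2 so that it stays accurate for very small angles.

```python
import numpy as np

# Hypothetical 3D landmarks (pixel-equivalent units) for a face turned ~45 degrees
# away from the camera; these are illustrative values, not data from the study.
EXAMPLE_LANDMARKS = (
    np.array([0.0, 0.0, 0.0]),      # left eye
    np.array([28.0, 0.0, -28.0]),   # right eye
    np.array([14.0, 25.0, -14.0]),  # central blaze
)


def face_normal(left_eye, right_eye, blaze):
    """Unit normal of the plane through the two eyes and the central blaze."""
    n = np.cross(right_eye - left_eye, blaze - left_eye)
    return n / np.linalg.norm(n)


def angle_between_deg(u, v):
    """Angle in degrees between two vectors, numerically stable for tiny angles."""
    return np.degrees(np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v)))


def disk_offsets(rng, radii):
    """One 2D offset per landmark, drawn uniformly inside a disk of the given radius."""
    radii = np.asarray(radii, dtype=float)
    r = radii * np.sqrt(rng.uniform(0.0, 1.0, radii.shape))
    theta = rng.uniform(0.0, 2.0 * np.pi, radii.shape)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)


def gaze_deviation_deg(landmarks, eye_radius=2.4, blaze_radius=2.1,
                       n_trials=10_000, seed=0):
    """Distribution of head-gaze deviation (degrees) under landmark jitter."""
    rng = np.random.default_rng(seed)
    points = np.array(landmarks, dtype=float)  # shape (3, 3): eyes and blaze
    reference = face_normal(*points)
    radii = [eye_radius, eye_radius, blaze_radius]
    deviations = np.empty(n_trials)
    for i in range(n_trials):
        jittered = points.copy()
        jittered[:, :2] += disk_offsets(rng, radii)  # jitter in image x-y only
        deviations[i] = angle_between_deg(reference, face_normal(*jittered))
    return deviations


if __name__ == "__main__":
    dev = gaze_deviation_deg(EXAMPLE_LANDMARKS)
    print(f"fraction of deviations below 10 deg: {np.mean(dev < 10.0):.2f}")
```

      With real data, the hypothetical landmarks would be replaced by triangulated facial features, and the fraction below the cone threshold would be read off the resulting distribution.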

      Author response image 1.

      Probability distribution of gaze angular deviation under circular perturbation. The histogram (blue) represents the change in reconstructed gaze angle (degrees) following stochastic perturbation of facial features. To simulate real-world variance, noise was sampled from circular distributions with radii of 2.4 pixels (eyes) and 2.1 pixels (central blaze). The red curve represents an exponential fit to the empirical data (y = a·e^(bx), a = 0.9591, b = 0.1813). Approximately 90% of the reconstructed gaze deviations remain below 10°, indicating the model’s local stability under pixel-level coordinate jitter.
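
      The caption does not state how the exponential curve was fitted. One simple option, sketched below under that caveat, is ordinary least squares on log(y). Note that this weights residuals in log space, so it can give different parameters from a direct nonlinear least-squares fit; the function name is hypothetical.

```python
import numpy as np


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by linear least squares on log(y).

    Requires strictly positive y. Residuals are minimized in log space,
    which differs from a direct nonlinear least-squares fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0.0):
        raise ValueError("log-linear fit requires strictly positive y")
    b, log_a = np.polyfit(x, np.log(y), 1)  # returns [slope, intercept]
    return float(np.exp(log_a)), float(b)
```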

      b) Lines 542-543: Is there any difference in direction accuracy (in the ground-truth validation) between a rigid model fit to the six facial points and the plane defined by the two eyes and central blaze? How does the "semi-rigid" set of six points (mentioned also in lines 201-203) constrain the fit of the three points (two eyes and central blaze) that define the normal plane for the gaze cone?

      We thank the reviewer for the opportunity to clarify our geometric model. The plane used to define the gaze cone's origin was indeed determined by the two eyes and the central blaze. However, a plane defined by only three points was insufficient to determine a unique gaze direction, as the normal vector was ambiguous (it could point forward through the face or backward through the head).

      To resolve this, we utilized the relative positions of the two ear tufts. Because the tufts are anatomically situated behind the eyes and blaze, these additional points provide the necessary spatial context to orient the gaze vector correctly. In our validation, we found that including the mouth point does not alter the angular accuracy compared with a 3-point fit, supporting that the facial features are correctly identified.
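
      The sign disambiguation described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. It assumes the ear tufts lie behind the eye-blaze plane and uses the vector from the facial centroid to the tuft midpoint as the "backward" reference; the mouth point is not used here.

```python
import numpy as np


def gaze_direction(left_eye, right_eye, blaze, left_tuft, right_tuft):
    """Forward-pointing unit normal of the eye-eye-blaze plane.

    The cross product alone has an arbitrary sign (it depends on landmark
    order), so the normal is flipped whenever it points towards the ear
    tufts, which are assumed to lie behind the facial plane.
    """
    le, re, bz, lt, rt = (np.asarray(p, dtype=float)
                          for p in (left_eye, right_eye, blaze, left_tuft, right_tuft))
    n = np.cross(re - le, bz - le)
    n = n / np.linalg.norm(n)
    face_centre = (le + re + bz) / 3.0
    backward = (lt + rt) / 2.0 - face_centre  # from the face towards the tufts
    if np.dot(n, backward) > 0.0:
        n = -n
    return n
```

      Because the polarity is set by the tufts rather than by landmark order, swapping the left and right labels yields the same forward vector.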

      We use the term 'semi-rigid' to describe the six-point constellation because their relative spatial configurations remain stable across individuals and expressions, imposing a biological constraint on the model. This prevents unphysical warping of the face frame during 3D reconstruction and ensures the gaze cone remains anchored to the animal's true midline.

      (2) Results:

      a) Lines 203-205: What is the distinction between gaze orientation (defined by facial plane, 3D vector) and gaze direction (defined by ear tufts) ... is gaze direction in the 2D x-y plane? Why are two measures needed or different? It does not appear gaze orientation is used further in the manuscript and perhaps could be omitted.

      We appreciate the reviewer’s comment regarding the terminology. We have replaced all instances of ‘gaze orientation’ with ‘gaze direction’ to ensure consistency throughout the manuscript.

      To clarify, both terms referred to the same 3D unit vector. The ear tufts were not used to define a separate 2D measure; rather, they served as posterior anatomical anchors to resolve the 3D polarity of the normal vector (ensuring the vector points 'forward' from the face rather than 'backward'). Gaze direction was calculated in 3D space and was not restricted to a 2D x-y plane. We have clarified this in the revised Methods section (Lines 203–205) to avoid further ambiguity.

      b) Line 215-216: why is head-gaze velocity put in normalized units instead of degrees visual angle per second? How was the normalization performed (lines 549-557)? It would be simpler to see velocity as an angular speed (degrees angle per second) rather than a change in norms.

      We thank the reviewer for this suggestion. We agree that the expression is misleading.

      (1) We have replaced "face norm" with "face normal vector" (N) throughout the manuscript to clarify that we are referring to the 3D unit vector perpendicular to the facial plane.

      (2) Lines 224-225 and the corresponding Methods section (Lines 599-609) have been updated to reflect this change in units and terminology.

      We chose to use the change in the face normal vector in normalized units for our primary calculations because it allows for efficient spatiotemporal smoothing and is computationally robust at the very low thresholds required for our stability analysis. However, to address the reviewer's concern regarding interpretability, we have verified that our threshold of 0.05 normalized units corresponds to an angular velocity of 2.87° per frame (frame duration 33 ms). Since we are operating at very small angular changes, the Euclidean distance between unit vectors is a near-linear proxy for the angular displacement in radians.
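
      The conversion can be checked directly. For unit vectors separated by an angle θ, the Euclidean distance is d = 2 sin(θ/2), so θ = 2 arcsin(d/2), which is approximately d radians for small θ. The short sketch below assumes the 33 ms frame duration quoted above; the function names are illustrative.

```python
import math

FRAME_S = 0.033  # frame duration quoted in the response (33 ms)


def chord_to_angle_deg(chord):
    """Angle (degrees) between two unit vectors whose Euclidean distance is chord.

    For unit vectors |u - v| = 2 sin(theta / 2), so theta = 2 asin(chord / 2).
    """
    return math.degrees(2.0 * math.asin(chord / 2.0))


def threshold_in_deg_per_s(chord_per_frame, frame_s=FRAME_S):
    """Convert a per-frame chord threshold into an angular speed in deg/s."""
    return chord_to_angle_deg(chord_per_frame) / frame_s


print(round(chord_to_angle_deg(0.05), 2))  # 2.87
```

      Under the 33 ms assumption, the 0.05 threshold corresponds to roughly 86.8° per second, and the exact arcsine value differs from the small-angle approximation (0.05 rad ≈ 2.865°) only in the fourth significant figure.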

      c) Lines 215-216: How do raw gaze traces appear over time ... are there gaze saccades and then stable fixations, or does it vary continuously? A plot of the gaze trace might be useful besides just showing velocity with a threshold, to evaluate to what extent stable fixation vs shifts are distinct.

      Author response image 2.

      Time course of head-gaze angular velocity and stability thresholding. The plot illustrates the temporal dynamics of the face normal vector velocity used to define stable gaze states. The blue trace represents the raw gaze velocity calculated in normalised units. The red dashed line denotes the empirical cut-off threshold of 0.05 units per frame.

      To clarify the temporal dynamics of marmoset head movements, we have provided a representative time course of head gaze velocity as shown in Author response image 2. The data clearly show a "saccade-and-fixate" pattern: large, distinct spikes in velocity (representing rapid head redirections) are separated by periods of relative stability.

      While minor high-frequency fluctuations in the raw trace (blue) may be attributed to facial feature detection noise, they remain significantly below our stability threshold (red dashed line). By applying this threshold, we successfully isolated biologically relevant "stable fixations" from "head saccades," ensuring that our subsequent social gaze analysis is based on periods of intentional head gaze direction.
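
      The thresholding step can be sketched as below: velocity is the Euclidean distance between consecutive unit face normals, and each frame-to-frame step is labelled as fixation or saccade against the 0.05 cut-off. This is a simplified illustration; the smoothing mentioned in the response is not specified and is deliberately left out, and the function names are hypothetical.

```python
import numpy as np


def head_gaze_velocity(normals):
    """Frame-to-frame change of the unit face normal vector.

    normals: array of shape (T, 3), one unit vector per video frame.
    Returns T - 1 values in normalised units per frame (Euclidean distance
    between consecutive unit vectors). No smoothing is applied here.
    """
    normals = np.asarray(normals, dtype=float)
    return np.linalg.norm(np.diff(normals, axis=0), axis=1)


def label_steps(velocity, threshold=0.05):
    """Label each step 'fixation' (below threshold) or 'saccade' (at or above)."""
    velocity = np.asarray(velocity, dtype=float)
    return np.where(velocity < threshold, "fixation", "saccade")
```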

      d) Lines 237-286: The writing in this section does not emphasize the main results. There seem to be three takeaway points that could be emphasized better in the topic sentences of each of the paragraphs: i) Marmosets tended to spend most of their time on either end of the elongated box, not in the middle, ii) Males spent more time near the front of the box near the other animal than females, iii) Familiar pairs spent more time closer to each other.

      To address this comment, we have reorganized this section to lead with the three key behavioral findings:

      (1) We now state clearly in the topic sentence that marmosets preferred the ends of the arena over the middle.

      (2) We have highlighted the finding that males spend significantly more time near the inner edge (closer to the partner) than females, irrespective of familiarity.

      (3) We emphasized that familiar pairs maintain closer and more dynamic social distances over time, whereas unfamiliar pairs tend to move further apart as a session progresses.

      e) Line 303: It would be useful to see time traces of head velocity of each member of the pair and categorization over time of the gaze event types. A stable epoch must be brief on the order of 100-200ms. It is unclear how distinct the stable fixation epochs are from the moments when the gaze is shifting. Also, the state transition analysis treats each stable epoch like one event, and then following a gaze movement by either of the pair, the state is defined again, is that correct?

      We defined stable epochs as continuous periods where the face normal vector velocity remained below 0.05 normalized units for both animals. This ensures that a "gaze state" is only categorized when both marmosets have relatively fixed head orientations. As shown in the time traces in Author response image 2, the velocity profile is characterized by sharp peaks (head saccades) and clearly defined troughs (fixations). Further, we generated a probability histogram of stable head-gaze epoch durations (Author response image 3). The median duration of these stable epochs is 200 ms, which aligns with biological expectations for fixation durations in primates and confirms that these states are distinct from the high-velocity shifts.
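
      The epoch definition above can be sketched as a run-length search over frames where both animals' velocities are below threshold. The 100 ms minimum duration is taken from the caption of Author response image 3. The function names, the 33 ms frame duration used to convert frame counts to milliseconds, and the exclusive `stop` convention are assumptions for illustration.

```python
import numpy as np

FRAME_MS = 33.0  # approximate frame duration quoted in the response


def stable_epochs(vel_a, vel_b, threshold=0.05, min_ms=100.0, frame_ms=FRAME_MS):
    """(start, stop) frame indices of runs where both animals are below threshold.

    stop is exclusive; runs shorter than min_ms are discarded.
    """
    both = (np.asarray(vel_a) < threshold) & (np.asarray(vel_b) < threshold)
    padded = np.concatenate(([False], both, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))  # rising and falling edges alternate
    starts, stops = edges[0::2], edges[1::2]
    keep = (stops - starts) * frame_ms >= min_ms
    return list(zip(starts[keep].tolist(), stops[keep].tolist()))


def epoch_durations_ms(epochs, frame_ms=FRAME_MS):
    """Duration of each epoch in milliseconds."""
    return [(stop - start) * frame_ms for start, stop in epochs]
```

      The median of `epoch_durations_ms(...)` over a session gives the summary statistic reported above.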

      The reviewer’s interpretation is correct. Our Markov chain model treats each stable epoch as a single event. A transition occurs when at least one animal moves (exceeding the velocity threshold), resulting in a new stable epoch where the relative gaze state is re-evaluated. This approach allows us to model the sequence of social interactions as a series of discrete behavioral decisions.
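
      A first-order transition matrix of this kind can be estimated by counting consecutive pairs of epoch labels and normalizing each row. The sketch below keeps self-transitions, so a state that recurs after a movement contributes to the diagonal (recurrence). The state labels in the usage are hypothetical shorthand, not the manuscript's node names.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """First-order transition probabilities between consecutive stable epochs.

    states: gaze-state labels, one per stable epoch, in temporal order.
    Self-transitions are kept, so the diagonal corresponds to recurrence.
    Returns {from_state: {to_state: probability}}; each row sums to 1.
    """
    states = list(states)
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        current: {following: n / sum(row.values()) for following, n in row.items()}
        for current, row in counts.items()
    }


probs = transition_probabilities(["M->F", "joint", "M->F", "M->F", "none"])
print(probs["joint"])  # {'M->F': 1.0}
```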

      Author response image 3.

      Temporal characteristics of stable head-gaze epochs. The histogram illustrates the probability distribution of the duration (ms) of stable gaze behaviour epochs. A minimum duration threshold of 100 ms was applied to exclude transient, non-purposeful head gazes.

      f) Lines 316-326: Some general summarizing statements to lead this paragraph would be useful. It seems that familiar pairs are more likely to participate in joint gaze, especially when close to each other, and perhaps, that males tended to gaze at females more than the reverse. Is there any notion that males were following the gaze of females?

      We thank the reviewer for these suggestions. We have revised the topic sentences of this section to lead with a summary of the social takeaways, specifically highlighting the higher level of male interest and the shift toward reciprocal coordination in familiar pairs.

      The reviewer correctly identified an important dynamic. Our transition analysis (Fig. 4D) confirms that males in both familiar and unfamiliar dyads frequently follow the female's gaze. This is evidenced by a robust transition probability (~17%) from "Male-to-Female Partner Gaze" (blue node) to "Joint Gaze" (green node). We found that this gaze-following behavior was a general feature of the dyads and did not differ significantly by familiarity, which is why it was not previously emphasized. However, we have now added a statement to the Results (Lines 358-365) to explicitly describe this male-led gaze-following behavior.

      g) Lines 328-337: Can these findings in this paragraph be summarized more generally? It seems males view unfamiliar females longer, whereas for familiar females they are more likely to reciprocate viewing if being viewed by them and then to join in joint gaze with them. Would that event, viewing a female and then a transition to joint gaze, not be categorized as a gaze-following event?

      We have now summarized the paragraph to emphasize the transition from vigilant monitoring in unfamiliar pairs to reciprocal awareness in familiar pairs.

      Regarding "longer" viewing: We have clarified the text to specify that males' interest in unfamiliar females is persistent and robust rather than simply "longer" in a single duration. The high recurrence probability signifies that males consistently re-orient their gaze back to the unfamiliar female even if the interaction is briefly interrupted by movement.

      Regarding gaze following and joint gaze: The reviewer asks if the transition from viewing a female to joint gaze constitutes gaze following. We agree that a transition from "male-to-female gaze" to "joint gaze" is indeed a gaze-following event (as noted in our previous response regarding Fig. 4D). However, the specific transition discussed in this paragraph (female-to-male gaze to male-to-female gaze) is different: it describes a "reciprocal" event where the male responded to being looked at by looking back at the female, while the female simultaneously shifted her gaze away. Since the two gaze cones did not intersect on an external object or on each other's faces simultaneously at the end of this transition, it was not categorized as joint gaze or gaze following.

      h) Lines 339-351: It is not clear why gazing at the region surrounding a female's face (as opposed to the face itself) reflects "gaze monitoring tied to increased social attention" (Dal Monte et al., 2022). This hypothesis could be expanded to make the prediction clear in this paragraph.

      We thank the reviewer for identifying the need to clarify the hypothesis regarding the region surrounding the face. We have expanded this paragraph to explain why gazing at the peripheral facial region reflects social monitoring.

      In many primate species, direct and sustained eye contact can often be interpreted as a threat or a challenge, particularly between unfamiliar individuals. Peripheral monitoring (looking at the area immediately surrounding the face) can strategically allow an animal to stay highly attentive to the partner's head orientation, gaze direction, and facial expressions, all of which are critical for anticipating future actions, while minimizing the risk of social conflict. By demonstrating that unfamiliar marmosets utilize this peripheral strategy significantly more than familiar ones, we provide evidence that social attention in novel dyads is characterized by a social monitoring strategy that balances the need for information with social caution.

      i) Lines 354-373: This section seems to suggest again that in a familiar male/female pair, the male is more likely to follow the female gaze and establish a joint gaze, and this occurs less with the unfamiliar pair only when closer in distance. Some summary sentences to begin the paragraph could help frame what to expect from the results.

      We have added summarizing topic sentences to this section to clarify the relationship between familiarity and the spatial distribution of joint gaze.

      (3) Discussion:

      Lines 380-463: This section reads more clearly than most of the results, where it is often hard to connect the data plots to their significance for behavior. Overall, I believe the manuscript could be improved by setting up a hypothesis before presenting results in the paragraphs demonstrating the data. Some of the main findings appear in text from lines 413-419 (somewhat hidden even in discussion).

      We sincerely appreciate the reviewer’s positive feedback on the clarity of the latter sections of our Discussion. We have taken the suggestion to heart and have performed a comprehensive restructuring of the Results and Discussion sections.

      (1) We have moved the key takeaways, specifically the distinction between vigilant monitoring in unfamiliar pairs and reciprocal coordination in familiar pairs, from the end of the Discussion to the topic sentences of the relevant Results paragraphs.

      (2) We established a unified framework throughout the manuscript that connects pixel-level tracking stability to the biological "saccade-and-fixate" movement pattern, and ultimately to the social dimensions of sex and familiarity.

      (4) A couple of additional questions to address in the discussion:

      a) Can you speculate why in this behavioral context the marmosets do not engage in reciprocal gaze where both are simultaneously looking at each other (lines 297-301)? How low is the incidence of this event, numerically, in comparison to the other events (1 in 1000 events, etc)?

      We appreciate the reviewer’s interest in the lack of reciprocal gaze (mutual eye contact).

      Numerically, reciprocal gaze events occurred with a frequency of approximately 1 in 500 social gaze events (comprising less than 0.2% of our social dataset). Given this extreme scarcity, we felt that any statistical comparisons across sex or familiarity would be underpowered and potentially misleading, leading to our decision to focus on partner and joint gaze states.

      We speculate that the rarity of reciprocal gaze is primarily due to our task-free experimental setup. Unlike directed cooperation tasks where animals must look at each other to coordinate actions for a reward (e.g., Miss & Burkart, 2018), our study focused on task-free interactions. In a free-moving context without a common goal, marmosets may prioritize monitoring the environment or the partner’s actions (joint or partner gaze) over direct, sustained mutual eye contact, which can sometimes be perceived as a confrontational or high-arousal signal in primate social hierarchies.

      b) Does a transition from a marmoset viewing their partner, to a joint gaze, count as a gaze-following event? It appears the authors are reluctant to use that terminology. What are the potential concerns in that terminology? Is there a concern that both animals orient to the same object that is salient to them without it being due to their gaze?

      A transition from a partner-directed gaze to a joint gaze is indeed a gaze-following event. We distinguish these events from a transition between partner-directed gazes (e.g., male-to-female to female-to-male). In these "reciprocation" cases, once the second animal looked at the first, the first animal shifted their gaze away. Because the two gaze cones did not intersect on a common object at the end of the transition, we classified such events as a social exchange of attention rather than a coordinated gaze-following event.

      Reviewer #2 (Recommendations for the authors):

      I do have a few questions/points for clarification:

      (1) While your approach appears to be able to track head orientation when the face is occluded or turned away from the primary cameras, how was the accuracy of this validated? Since you have multiple cameras, it should be possible to make the estimate using the occluded cameras and then validate using the non-occluded ones.

      We appreciate the reviewer's comment regarding the validation of our tracking during partial occlusions.

      We wish to clarify that our system does not utilize "primary" vs "auxiliary" cameras. Rather, any two or more cameras that capture facial features with high confidence are used to triangulate the points into 3D space. Thus, the "primary" cameras are dynamically determined frame-by-frame based on the animal's orientation.

      To validate the accuracy of our 3D reconstruction during occlusions, we utilized a "projection-validation" approach. As demonstrated in Figure 2B (left panel), when the face is turned away from a specific camera, leaving only the back of the head visible, we used the facial features triangulated from the other non-occluded cameras and projected them onto the image plane of the occluded camera. The fact that these projected points aligned precisely with the expected (but hidden) anatomical landmarks confirms the global accuracy of our 3D model.

      We previously benchmarked this approach using a three-camera system where we triangulated coordinates via two cameras and successfully projected them onto the third camera's image plane with high accuracy. This ensures that even when a camera is "blind" to the face, the 3D position estimated by the rest of the array remains robust.
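
      The projection-validation idea can be sketched with standard linear (DLT) triangulation: triangulate a landmark from two cameras, reproject it into a held-out camera, and measure the pixel error against that camera's detection. The response does not specify the triangulation method (for example, whether detections are confidence-weighted). The pinhole intrinsics below are hypothetical, and lens distortion is ignored.

```python
import numpy as np


def camera_matrix(K, R, t):
    """3x4 pinhole projection matrix P = K [R | t] (no lens distortion)."""
    return np.asarray(K, float) @ np.hstack([np.asarray(R, float),
                                             np.asarray(t, float).reshape(3, 1)])


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]  # homogeneous solution: right singular vector of the smallest singular value
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 matrix and return pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def held_out_error(projections, points_2d, held_out_P, held_out_uv):
    """Triangulate from some cameras, reproject into another, return pixel error."""
    X = triangulate_dlt(projections, points_2d)
    return float(np.linalg.norm(project(held_out_P, X) - np.asarray(held_out_uv, float)))
```

      With noise-free synthetic detections the held-out error is essentially zero. With real detections, it would be compared against a pixel tolerance comparable to the tracking RMSE.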

      (2) Marmosets, like other non-human primates, also look at other body postures for their social communication, though admittedly marmosets are far more likely to look others in the face than larger primates. The tail-raised genital displays come to mind. While the paper primarily focuses on shared vs deviant gaze, and I believe tracks not only the angle of viewing towards the target but also the distance from the face (please clarify if I am wrong), it would also be useful to know how often marmosets are looking at each other beyond just the face. This is particularly interesting if the gaze towards the partner varies depending on whether that partner was generally oriented towards the gazer, or not. For the joint gaze, were there conditions in which the two were looking at the same target, but had body postures that were not oriented toward one another (i.e. looking at a distant target beyond one of the animals, like looking over someone else's shoulder)?

      We thank the reviewer for highlighting the importance of body postures and non-facial social signals (e.g., genital displays) in marmoset communication.

      At the inception of this project, we explored tracking multiple body parts. However, due to the marmoset's dense fur and the lack of distinct skeletal markers under naturalistic lighting, human annotators and early automated tools struggled to achieve the precision required for high-resolution 3D kinematics. While recent advances in whole-body tracking now make these questions approachable, we chose to focus on the face normal vector because it provided the most robust and high-confidence signal for social orientation in our current dataset.

      Regarding the "looking over the shoulder" scenario, we utilized a hierarchical classification system to prevent wrong categorization. Intersection with the partner’s face always took priority. If one animal’s gaze cone contained the other’s face, the state was classified as "Partner Gaze", even if the two gaze cones happened to intersect at a distant point in space. This ensures that "Joint Gaze" specifically captures instances where both animals ignore one another’s face regions to focus on a shared external target.

      We agree that the relationship between body posture and head gaze is a fascinating area for future research. In our current setup, while "Joint Gaze" requires the head-gaze cones to intersect, the animals' bodies could indeed be oriented in different directions (e.g., looking at a distant target behind the partner). We have added a note to the Discussion acknowledging that incorporating whole-body gestures would further deepen the understanding of marmoset social ethology.
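
      The hierarchical rule described above (partner-face intersection takes priority over joint gaze) can be sketched as below. This is a hypothetical illustration with stated simplifications, not the authors' classifier. The 10° cone half-angle is an illustrative value. Joint gaze is approximated by the two central gaze rays passing within `joint_tol` (arbitrary units) of each other in front of both animals, rather than by a full cone-cone intersection. The state labels are also illustrative.

```python
import numpy as np


def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def in_cone(origin, direction, target, half_angle_deg):
    """True if target lies inside the gaze cone (apex at origin, axis direction)."""
    to_target = np.asarray(target, float) - np.asarray(origin, float)
    dist = np.linalg.norm(to_target)
    if dist == 0.0:
        return False
    return np.dot(_unit(direction), to_target) / dist >= np.cos(np.radians(half_angle_deg))


def rays_meet(o1, d1, o2, d2, tol):
    """True if two forward gaze rays pass within tol of each other, in front of both."""
    o1, d1, o2, d2 = (np.asarray(x, float) for x in (o1, d1, o2, d2))
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:       # treat (near-)parallel rays as not converging
        return False
    s = (b * e - c * d) / denom  # parameter of closest point along ray 1
    t = (a * e - b * d) / denom  # parameter of closest point along ray 2
    if s <= 0.0 or t <= 0.0:     # closest approach lies behind one animal
        return False
    return np.linalg.norm((o1 + s * d1) - (o2 + t * d2)) <= tol


def classify_gaze(face_a, dir_a, face_b, dir_b, half_angle_deg=10.0, joint_tol=5.0):
    """Hierarchical gaze state: partner gaze is checked before joint gaze."""
    a_sees_b = in_cone(face_a, dir_a, face_b, half_angle_deg)
    b_sees_a = in_cone(face_b, dir_b, face_a, half_angle_deg)
    if a_sees_b and b_sees_a:
        return "reciprocal"
    if a_sees_b:
        return "A->B partner"
    if b_sees_a:
        return "B->A partner"
    if rays_meet(face_a, _unit(dir_a), face_b, _unit(dir_b), joint_tol):
        return "joint"
    return "other"
```

      In the "looking over the shoulder" case, where A looks past B's face at a target behind B and B also looks at that target, this rule returns a partner-gaze state rather than joint gaze.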

      (3) In the introduction, (line 70), you raise the question of ecological relevance, using rhesus in laboratory settings. This could use a little more expansion/explanation of the limitations of current/past approaches.

      We thank the reviewer for the suggestion to expand upon the ecological limitations of traditional laboratory paradigms.

      We have substantially revised the Introduction (Lines 70–82) to provide a more detailed critique of past approaches. Specifically, we now highlight how traditional head-fixed or screen-based paradigms decouple eye movements from natural head-body dynamics and lack the reciprocal, multi-agent complexity found in real-world social environments (e.g., Land, 2006; Shepherd, 2010). By contrasting these constraints with the spatially and socially embedded nature of marmoset interactions, we clarify why a more naturalistic, quantitative approach is necessary to understand the true dynamics of social gaze. These additions provide a stronger theoretical foundation for our move toward a free-moving experimental model.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

      Thanks to the referees' input and additional work, we believe our revised manuscript now meets the high standard of eLife.

      Recommendations for the authors:

      The importance of the circular swimming chirality for the observed phenomenon could be further emphasized by actually using the word "chiral" or "chirality" in the text. Also, indicating what would change if swimming were counterclockwise rather than clockwise would help the reader understand the key significance of chirality.

      We thank the reviewer for this insightful suggestion. We agree that the chirality of the surface interaction is central to the observed phenomenon and should be explicitly highlighted to improve the reader's understanding.

      In response, we have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming. We clarify that in such a case, the hydrodynamic interaction would cause cells to veer left, resulting in up-gradient accumulation along the left sidewall rather than the right. We believe these additions significantly improve the clarity of the underlying physical mechanism.

      Reviewer #1 (Recommendations for the authors):

      I still have several comments that the authors may want to consider for the last version.

      - The run and tumble behavior of the cells at the surface remains puzzling and would need some more explanation in the text. Tumbles with no significant reorientation angle amount largely to smooth swimmers. How can a model based on run-and-tumbles be used to explain the difference between LSW and RSW?

      We apologize for the lack of clarity regarding the surface run-and-tumble behavior. While it is true that surface tumbles often result in smaller reorientation angles compared to bulk swimming, they are not negligible and play a critical role in the observed asymmetry. As shown in the tumble angle distributions (Fig. 2E and 2F), the probability of a tumble angle exceeding π/2 is approximately 9% for sidewall trajectories and 30% for the middle area. This tumbling behavior leads to differences between the left sidewall (LSW) and right sidewall (RSW) in two key ways:

      First, as detailed in our geometric analysis (Fig. 6), running cells following stable clockwise circular paths are geometrically favored to reach the RSW. Because cells moving up-gradient (towards the RSW) experience suppressed tumbling, they maintain these stable circular trajectories and accumulate effectively. Conversely, cells moving down-gradient (towards the LSW) experience enhanced tumbling. These frequent interruptions distort the circular trajectories required to reach the LSW, resulting in fewer bacteria entering the LSW compared to the RSW.

      Second, once cells are at the wall, the difference in tumbling frequency dictates retention. The majority of LSW cells swim down-gradient (LSW-DG) and thus tumble more frequently, which increases their probability of escaping the wall. The majority of RSW cells swim up-gradient (RSW-UG), which suppresses tumbles and increases their residence time at the wall.

      The relevant clarifications have been included in the last paragraph of “Results” in the manuscript.
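
      The tumble-angle statistic cited above (the fraction of tumbles with reorientation exceeding π/2) can be computed from the swimming velocity just before and just after each tumble, as in the sketch below. Tumble detection itself is not shown. The input arrays and function names are hypothetical, and trajectories are assumed to be 2D.

```python
import numpy as np


def tumble_angles(v_before, v_after):
    """Unsigned reorientation angle (radians) across each tumble in 2D.

    v_before, v_after: arrays of shape (N, 2) with the swimming velocity just
    before and just after each detected tumble.
    """
    vb = np.asarray(v_before, dtype=float)
    va = np.asarray(v_after, dtype=float)
    cross = vb[:, 0] * va[:, 1] - vb[:, 1] * va[:, 0]
    dot = np.sum(vb * va, axis=1)
    return np.abs(np.arctan2(cross, dot))  # values in [0, pi]


def fraction_above(angles, cutoff=np.pi / 2):
    """Fraction of tumbles whose reorientation angle exceeds the cutoff."""
    return float(np.mean(np.asarray(angles) > cutoff))
```

      Applied separately to sidewall and middle-area tumbles, `fraction_above` yields the kind of comparison reported from Fig. 2E and 2F.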

      - Figure 5B would need more explanation. I still don't understand the different behaviors for the right and left side walls at small widths. Is it noise really or a more complex behavior? Since most of these calculations are based precisely on the shape of these curves it would be useful to discuss them in more detail.

      We apologize for the lack of clarity. The behavior observed at small widths in Figure 5B is not noise; rather, it reflects the idealized nature of our simulation model.

      In the simulation, bacteria were modeled as active particles without explicit steric exclusion for the flagella and cell body. Consequently, simulated cells retain the ability to reorient and turn freely even in very narrow lanes (w ≤ 6 μm), allowing the geometric sorting mechanism (which favors the RSW) to function efficiently even at small widths. This is why the simulation shows a distinct difference between LSW and RSW proportions in this regime.

      In the experimental reality, however, the finite size of the bacterial body and flagella creates steric hindrance. In narrow channels, this physical constraint restricts the cells' ability to turn, thereby disrupting the circular swimming mechanism required to sort cells into the RSW. As a result, the experimental data show that the proportions of LSW and RSW cells tend to equalize in narrow channels (e.g., w = 6 μm in Fig. 4B), leading to a lower chemotactic drift velocity than predicted by the simulation.

      We have added a discussion regarding these steric effects and the deviation at narrow widths to the Results section (the penultimate paragraph of subsection "Simulation of E. coli chemotaxis within lane confinement") in the revised manuscript.

      - The importance of the chirality of the circular trajectories, although essential, remains insufficiently mentioned in the text.

      We have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming.

      - It would be useful to color-code the trajectories of Figure 1B and alike with time.

      Thank you for the suggestion. The trajectories in Fig. 1B have now been redrawn: distinct colors denote individual trajectories, and color intensity darkens to indicate time progression.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Lenz and colleagues describes a detailed examination of the epigenetic changes and alterations in subnuclear arrangement associated with the activation of a unique var gene associated with placental malaria in the human malaria parasite Plasmodium falciparum. The var gene family has been heavily studied over the last couple of decades due to its importance in the pathogenesis of malaria, its role in immune avoidance, and the unique transcriptional regulation that it displays. Aspects of how mutually exclusive expression is regulated have been described by several groups and are now known to include histone modifications, subnuclear chromosomal arrangement, and in the case of var2csa, regulation at the level of translation. Here the authors apply several methods to confirm previous observations and to consider a possible role for DNA methylation. They demonstrate that the histone mark H3K9me3 is found at the promoters of silent genes, var2csa moves away from other var gene clusters when activated, and while DNA methylation is detectable at var genes, it does not seem to correlate with transcriptional activation/silencing. Overall, the data and approach appear sound.

      Strengths:

      The authors employ the latest methods for epigenetic analysis of histone marks, transcriptomic analysis, DNA methylation, and chromosome conformation. They also use strong selection pressure to be able to examine the gene var2csa in its active and silent state. This is likely the only paper that has used all these methods in parallel to examine var gene regulation. Thus, the paper provides readers with confidence in the interpretation of independent methods that address a similar subject.

      We thank the reviewer for this positive assessment. We appreciate the recognition that our study combines complementary approaches including histone mark profiling, transcriptomic analysis, DNA methylation mapping, and chromosome conformation capture in parallel to the use of strong population selection that enables a controlled comparison of var2csa in active versus silent states. We agree that the convergence of independent methods strengthens confidence in the interpretation.

      Weaknesses:

      The primary weakness of the paper is that none of the conclusions are novel and the overall conclusions do not shed much new light on the topic of var gene regulation or antigenic variation in malaria parasites. The paper is largely confirmatory. The roles of H3K9me3 and subnuclear localization in var gene regulation are well established by many groups (including for var2csa), albeit in some cases using alternative methods. The only truly unique aspect of the manuscript is the description of 5mC at var2csa when the gene is transcriptionally active or silent. Here the authors demonstrate that the mark has no clear role in transcriptional activation or silencing, however, this will not be surprising to many in the field who have previously cast doubt on a regulatory role for this modification.

      While we agree that some individual features of var gene regulation, including H3K9me3 enrichment, have been described previously, our study integrates, for the first time, several layers of gene regulation at the clinically important var2csa locus using phenotypically homogeneous placental-binding parasite populations. As expected, var2csa activation coincided with a loss of H3K9me3 at the locus. However, using high-resolution chromatin conformation capture (to our knowledge, this experiment has never been applied to phenotypically homogeneous parasite populations), we quantified the repositioning of var2csa relative to heterochromatic telomeric clusters. We further assessed DNA methylation in this framework and show that 5-methylcytosine is broadly present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in the transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      As stated in our response to Reviewer 1, our study combines, for the first time, complementary approaches, including transcriptomic analysis, histone mark profiling, DNA methylation mapping, and chromosome conformation capture, together with strong population selection to enable a controlled comparison of var2csa in active versus silent states.

      Weaknesses:

      No major new finding is reported. The strength of the evidence presented is mostly solid, although certain elements, e.g., the role of 5mC in transcriptional regulation of var2csa, appear preliminary and incomplete.

      While we agree that no major new finding is reported, we were able, for the first time, to use a high-resolution chromatin conformation capture method to quantify the repositioning of var2csa relative to heterochromatic telomeric clusters. We also showed that 5-methylcytosine is present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate for the first time transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) In the second paragraph of the introduction, the authors state "....such as the shielding of the parasite antigens expressed on pRBC surfaces by other cells and the evasion of splenic clearance (8)." What does "other cells" mean here?

      We thank the reviewer for this comment. We have clarified the cell type in the text.

      (2) In their interpretation of the Hi-C data, the authors conclude that the var2csa expressing parasites display "tighter heterochromatin control of var gene regions" and "interactions around other silent var genes were increased" and "an overall compaction of telomere ends and var gene-containing intrachromosomal regions". While the data appear to show that this is true when they compare the two parasite populations, I am concerned that the authors might be misinterpreting the data. It is important to note that the NF54CSAh line is heavily selected to be nearly entirely homogeneous for var gene expression while the NF54 line is exceptionally heterogeneous. This is shown in Figure 1G. Thus, any chromosomal arrangement specific for var gene expression in the unselected NF54 population will be similarly heterogeneous and therefore could appear less tight. In other words, interactions around silent var genes and overall compaction of telomere ends might be identical between individual parasites within these populations, but appear tighter or more compact in the var2csa expressing line simply because it is a homogeneous population. Perhaps this is what the authors meant to convey; however, as currently written, it seems that they conclude that the expression of var2csa results in a unique change in chromosome organization. A better comparison would be two populations homogeneously expressing different var genes, one expressing var2csa and one expressing an alternative var gene. Such lines can be generated through clonal isolation or selection for binding to a different host receptor.

      We thank the reviewer for this comment. The reviewer is correct, and we have revised the Discussion section of the manuscript to clarify this issue.

      (3) The title of the last section of the Results is "Distribution of DNA methylation influences gene expression overall but does not mediate transcriptional activation and switching in antigenic variation". This is an overstatement. The authors show that DNA methylation is absent at var gene promoter regions and enriched in coding regions, but they provide no evidence that it "influences gene expression overall". This is speculation. Lastly, when the authors examined 5mC occupancy across genes, did they normalize for GC content of the DNA sequences? GC content is known to increase dramatically in coding regions (particularly in var genes) and thus could explain the distribution of this mark. If the authors corrected for this, they should directly state this in the results section. If they did not, they should explain why they don't think this property of the P. falciparum genome explains the distribution of 5mC.

      There is often a misconception in the field that DNA methylation is primarily confined to CpG islands in promoter regions and functions mainly as a repressor of transcription. However, in contrast to promoter methylation, methylation within gene bodies is generally associated with higher levels of gene expression, suggesting a role in facilitating transcription elongation. Gene-body methylation can also repress internal promoters, thereby preventing spurious transcription initiation within the gene. In addition, it has been shown to influence alternative splicing by affecting RNA polymerase II elongation kinetics.

      We propose that, in Plasmodium, DNA methylation may be associated with priming genes for transcriptional activity rather than repressing transcription. Specifically, higher methylation levels may facilitate recruitment of the RNA polymerase II transcriptional machinery to enable transcription. In Figure 4B, we observe higher levels of DNA methylation in the first exon of highly expressed genes in both the NF54 and NF54CSAh lines. Interestingly, we also detect high levels of methylation across most introns of the var genes, introns that must be transcribed, cannot be degraded, and are essential for var gene regulation, suggesting a possible sequence-recognition function. We have edited the manuscript to improve clarity.

      (4) In the legend to Figure 3D, the authors state that the centromeres are shown in blue, however in the figure they appear to be grey while var2csa is blue.

      We have revised the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):

      I recommend using the term "transcription" rather than "expression" when discussing events at the gene level.

      We have revised the manuscript accordingly.

      I also recommend using the term "adhesion" to describe the physical interaction between infected erythrocytes and adhesion receptors rather than "adherence", which should be reserved to describe non-physical affinity (e.g., beliefs, faith).

      We have revised the manuscript accordingly.

      Important new evidence regarding transcriptional regulation of var genes in general and var2csa in particular should be discussed and cited.

      We have revised the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The "number sense" refers to an imprecise and noisy representation of number. Many researchers propose that the number sense confers a fixed (exogenous) subjective representation of number that adheres to scalar variability, whereby the variance of the representation of number is linear in the number.

      This manuscript investigates whether the representation of number is fixed, as usually assumed in the literature, or whether it is endogenous. The two dimensions on which the authors investigate this endogeneity are the subject's prior beliefs about stimuli values and the task objective. Using two experimental tasks, the authors collect data that are shown to violate scalar variability and are instead consistent with a model of optimal encoding and decoding, where the encoding phase depends endogenously on prior and task objectives. I believe the paper asks a critically important question. The literature in cognitive science, psychology, and increasingly in economics, has provided growing empirical evidence of decision-making consistent with efficient coding. However, the precise model mechanics can differ substantially across studies. This point was made forcefully in a paper by Ma and Woodford (2020, Behavioral & Brain Sciences), who argue that different researchers make different assumptions about the objective function and resource constraints across efficient coding models, leading to a proliferation of different models with ad-hoc assumptions. Thus, the possibility that optimal coding depends endogenously on the prior and the objective of the task opens the door to a more parsimonious framework in which assumptions of the model can be constrained by environmental features. Along these lines, one of the authors' conclusions is that the degree of variability in subjective responses increases sublinearly in the width of the prior. And importantly, the degree of this sublinearity differs across the two tasks, in a manner that is consistent with a unified efficient coding model.

      Comments on revisions:

      The authors have done an excellent job addressing my main concerns from the previous round. The new analyses that address the alternative model of "no cognitive noise and only motor noise" are compelling and provide quantitative evidence that bolsters the paper's overall contribution. The authors also went above and beyond by reanalyzing the Frydman and Jin (2022) dataset to provide new and very interesting analyses that provide an additional out-of-sample test of the model proposed in the current paper.

      Reviewer #2 (Public review):

      Summary:

      This paper provides an ingenious experimental test of an efficient coding objective based on optimizing task success. The key idea is that different tasks (estimation vs discrimination) will, under the proposed model, lead to a different scaling between the encoding precision and the width of the prior distribution. Empirical evidence in two tasks involving number perception supports this idea.

      Strengths:

      - The paper provides an elegant test of a prediction made by a certain class of efficient coding models previously investigated theoretically by the authors. The results in experiments and modeling suggest that competing efficient coding models, optimizing mutual information alone, may be incomplete by missing the role of the task.

      - The paper carefully considers how the novel predictions of the model interact with the Weber/Fechner law.

      Weaknesses:

      The claims would be even more strongly validated if data were present at more than two widths in the discrimination experiment (also noted in Discussion).

      Reviewer #3 (Public review):

      Summary:

      This work investigates whether human imprecision in numeric perception is a fixed structural constraint or an endogenous property that adapts to environmental statistics and task objectives. By measuring behavioral variability across different uniform prior distributions in both estimation and discrimination tasks, the authors show that perceptual imprecision increases sublinearly with prior width. They demonstrate that the specific exponents of this scaling (1/2 for estimation and 3/4 for discrimination) can be derived from an efficient-coding model, wherein decision-makers optimally balance task-specific expected rewards against the metabolic costs of neural coding. The revised manuscript expands this framework to accommodate logarithmic representations and validates the core model against an independent dataset of risky choices.

      Strengths:

      The authors have effectively addressed my previous concerns with rigorous additions:

      (1) The mathematical formulation has been revised into a discrete signal accumulation framework, making the objective function and resource trade-offs much more transparent and mathematically tractable.

      (2) The incorporation of the logarithmic representation resolves prior ambiguities regarding structural constraints.

      (3) The new split-half analysis effectively addresses the temporal dynamics of adaptation. The stability of the sublinear scaling across the experiment provides solid evidence that human subjects utilize rapid, top-down modulation to adjust their encoding strategy when explicitly informed about the environment.

      (4) Validating the derived scaling exponents on an independent risky-choice dataset robustly supports the generalizability of the theoretical framework beyond a single cognitive domain.

      Weaknesses:

      The methodological and theoretical issues raised in the first round have been thoroughly resolved, and the evidence supporting the claims regarding response variance is convincing.

      There is one remaining theoretical point that warrants discussion to provide a complete picture of the proposed generative model. The manuscript exquisitely models and predicts response variance (imprecision), but it remains largely silent on the closed-form predictions for the mean estimation (i.e., bias). Under the assumption of optimal Bayesian decoding combined with specific encoding schemes (e.g., linear vs. logarithmic), the model implicitly generates mathematical predictions for the subjects' mean estimates. Specifically, varying the scaling exponent (α) and the prior width (w) should systematically alter the predicted bias in different conditions.

      While fitting or explicitly explaining this mean bias is not strictly necessary for the core claims regarding variance scaling, acknowledging what the optimal decoder analytically predicts for the mean estimate, and how it aligns or contrasts with typical empirical observations, would strengthen the theoretical transparency of the paper.

      We thank the reviewers for their attention to our revised manuscript. We are very glad that the reviewers seem satisfied with how we have addressed their concerns. The paper is now stronger than in its first iteration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no further requests for the authors, I congratulate the authors on a great paper.

      Reviewer #2 (Recommendations for the authors):

      No further suggestions.

      Reviewer #3 (Recommendations for the authors):

      In the Figure 2b caption, the phrase "from which the numbers of dots are sampled" appears to be a typo carried over from the estimation task. It should likely read "from which the numbers are sampled", as the discrimination task uses Arabic numerals rather than dot arrays.

      Reviewer #3 points out that we have focused on the subjects’ response variability, and we did not report the mean estimates. We agree that the reader could reasonably expect to see this. We now include this in Figure 6.

      The subjects exhibit the typical patterns observed in numerosity-estimation tasks (most notably, the ‘central tendency of judgment’). The dotted line shows the predictions of the best-fitting model (with α = 1/2) with the logarithmic encoding, which reproduces the subjects’ main behavioral patterns.

      We have slightly revised the manuscript. The revised version includes this Figure, in Methods (p. 28). We have modified the text of the Methods accordingly (bottom of p. 27), and we now refer to this analysis in the main text (line 6 of p. 5). We have also corrected the typo noted by Reviewer #3 (caption of Fig. 2b).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The integration of single-cell datasets across species is a powerful approach to understanding how cell types and patterns of gene expression have evolved. Current methods to perform such integrations require multiple steps: clustering, the integration itself, and downstream differential expression analysis. In this study, the authors describe a new approach, called ANTIPODE, that combines these steps by integrating deep learning with interpretable decoding and linear modeling. This method builds on previous deep learning approaches to dataset integration, namely SCVI and scANVI, that employ a variational autoencoder to model single-cell RNA-sequencing datasets. However, gene expression estimates from these previous methods are challenging to interpret due to non-linear decoding from the modeled latent space. ANTIPODE seeks to address this issue by using a single-layer decoder coupled to a linear model to estimate patterns of differential expression, e.g. differential expression by coexpression module, across cell types, etc.

      The authors apply their framework to a large single-cell RNA-seq dataset (~1.8M cells) containing cells from the central nervous systems of humans, macaques, and mice spanning in utero developmental time points. They identify a consensus set of cell clusters across each species. They find that ANTIPODE performs at least as well as SCVI in terms of species integration and batch correction. The authors demonstrate several use cases of this integrated approach by analyzing differential expression that correlates with gene structure, the evolution of expression differences in neuropeptide systems, and the anatomical and phylogenetic variation in neurodevelopmental timing.

      Strengths:

      ANTIPODE is a welcome addition to techniques that integrate large single-cell RNA-seq datasets across multiple species. The approach's simultaneous inference of cell clusters, integration manifolds, and differential expression should streamline analysis pipelines whose elements are often disjointed and sometimes work at cross purposes.

      Weaknesses:

      The authors note several limitations to their method that will be targets for future development. First, clustering "resolution" is inferred from the data and cannot be tuned as with other approaches. Second, because of the linear decoding, ANTIPODE does not accommodate combining datasets obtained from different modalities (e.g. single-cell with single-nucleus RNA-seq). Third, as currently implemented, ANTIPODE does not explicitly model phylogenetic relationships. However, the authors describe an extension that could enable this, enhancing the power of multiple species integrations. A weakness with the current manuscript is the organization and readability of the figures. The supplemental figures in particular need to be restructured and reformatted to increase their interpretability.

      We thank this reviewer for their positive feedback regarding the utility of the model and how it may simplify challenging evolutionary analysis.

      We acknowledge that the figures are a bit difficult to read, and we will improve annotation and tidiness to make them more accessible to the reader.

      We have implemented changes for ANTIPODE version 0.2, which includes regression of gene expression differences on a phylogeny. We have updated the GitHub repository with this “antipode.phylo” module. For this study, the three-species case is equivalent under flat or phylogenetic regression (for example, upregulation in mouse is equivalent to downregulation in primates), so we do not plan to redo the analyses in the text using this new version.

      We have already provided examples for running ANTIPODE on our own and public datasets (https://github.com/mtvector/scANTIPODE/tree/main/real_examples), as well as in-line documentation of classes and functions; however, these may provide insufficient information for new users. We will provide explanatory tutorials for both to address the reviewer’s concerns. ANTIPODE version 0.1 is currently installable from either GitHub or PyPI.

      Reviewer #2 (Public review):

      Summary:

      This work presents ANTIPODE, a bilinear generative model developed for the simultaneous integration and identification of cell types across species and developmental stages using single-cell RNA-seq data. ANTIPODE is inspired by scANVI, a well-established semi-supervised framework for single-cell transcriptomics. After describing its implementation, the authors use ANTIPODE to integrate data from 15 species comprising 1,854,767 cells. Then, the authors benchmark ANTIPODE against commonly used methods (scVI, Harmony, and Scanorama) using two snRNAseq datasets and report comparable or superior performance. They then return to the initial integrated dataset and analyse patterns of gene expression evolution. Finally, they leverage the model to study the "later-is-larger" concept, evaluating the relationship between gene expression, developmental timing and structure size and finding gene expression signatures of this concept.

      Strengths:

      A major strength of the paper is that ANTIPODE employs a bilinear decoding architecture, which produces more interpretable model parameters while performing at least as well as existing, more opaque nonlinear integration approaches.

      The authors demonstrate the utility of ANTIPODE by integrating single-cell mRNA sequencing data from mouse, macaque, and human brains and confirming general principles regarding developmental timing and cell-type-specific gene expression divergence.

      They also propose a conceptually interesting framework for studying gene expression evolution: instead of focusing solely on differentially expressed genes between homologous cell types, they jointly model gene expression across developmental states and species-specific divergence, allowing them to define and analyse four categories of differential expression.

      Finally, the authors' conclusions are well supported by the analyses presented, although these conclusions remain relatively conservative and reinforce already established principles.

      Weaknesses:

      A central weakness of the paper is its limited accessibility to a broad audience. Despite attempting to keep computational details in the supplement, the main text still uses substantial jargon, undermining the goal of providing an intuitive explanation of the model. The figures are also insufficiently annotated (e.g., colour schemes in Figure 2 heatmap, bubble plot details in Figure 3, entropy definition in Figure 3), and the figure legends are overly brief and lack essential information. I strongly recommend that the authors revise both text and figures to improve clarity and readability.

      Similarly, the materials and methods lack a lot of information about the implementation of the model, the statistical tests used, the calculations of entropy, etc.

      The study sits between tool development and biological discovery but does not fully commit to either. As a result, it cannot be evaluated as a full benchmarking study, yet it also does not provide new biological insights that are validated experimentally.

      Finally, the GitHub repository for ANTIPODE is not yet functional and lacks documentation or tutorials, making it impossible to assess usability or reproducibility.

    1. Author response:

      We would like to thank the Editor and the three Reviewers for their detailed assessment of our manuscript and their constructive feedback. We found the suggestions valuable for refining our work. Before presenting the fully updated manuscript, we would like to clarify a few points in this initial response.

      This manuscript identifies a heat-induced, alternatively spliced short isoform of PIF4 (PIF4-S) that contributes to the physiological responses observed in heat-stressed etiolated seedlings. First, we agree with all Reviewers that including PIF4 protein data will strengthen our findings and more definitively demonstrate the generation of a protein-coding alternative isoform under heat stress. Therefore, this will be one of our main priorities in the revision. Evidence for the functionality of this alternative isoform is provided by the distinct phenotypes exhibited by transgenic lines expressing either the long or the short version of PIF4. Nevertheless, we agree that a more comprehensive characterization of these lines, as well as of the pif4 mutant lines, will further strengthen the demonstration of the functional relevance of this alternative splicing event. In addition, we will extend the phenotypic analysis of the PIF4-S lines to heat stress conditions.

      Importantly, the phenotypes observed in these lines suggest that additional molecular mechanisms may act in parallel with this alternative splicing event to regulate development in heat-stressed etiolated seedlings. As proposed by Reviewer #1, other PIFs may be involved in this response, and we will address this possibility. We will also provide new experimental data to show that alternative splicing in this gene is specific to heat stress and does not occur in other PIFs.

      Finally, we would like to clarify that the main scope of this manuscript is to demonstrate the functional relevance of the alternative isoform generated by splicing in PIF4 under heat stress. A detailed investigation of its molecular mode of action is beyond the scope of the present study. We sincerely appreciate the thoughtful feedback provided by all Reviewers. We will carefully consider their suggestions and use them to guide the inclusion of additional experiments and analyses in our revised manuscript to reinforce and clarify our conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

      We are grateful for the time and effort from the reviewers and editors in providing fair and constructive comments that have helped to improve the manuscript. Our point-by-point response is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places.

      For example, i) the Figure 1 AlphaFold3 models do not include motor domains, whereas nearly all of the rest of the data involve constructs with the motor domains;

      We appreciate the reviewer’s comment regarding the absence of the motor domains in the AlphaFold3 models shown in Figure 1. These domains were intentionally excluded to improve visual clarity and to better highlight the interaction between the TPR domains and CC1 in the inhibited kinesin-1 conformation. We felt that this simplified presentation in the main figure helps readers focus on the key mechanistic advance introduced in this work at the outset of the paper. For completeness, we have provided full-length kinesin-1 AlphaFold3 models that include the motor domains in the Supplementary Information (Fig. S1), and they are described in detail in the main text. In addition, we have added a note to the Figure 1 legend to explicitly direct readers to these full-length models.

      ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation - this is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used.

      Thank you. Chemical crosslinking is typically important for obtaining high-quality negative-stain TEM grids of kinesin-1 complexes and has been employed in all prior EM studies by our group and others. While this was described in the Methods, we agree that it should also be stated explicitly in the Results. Accordingly, we have added a sentence to the Results section noting that the proteins were stabilized using the amine-to-amine crosslinker BS3 (“Proteins were also stabilised using the amine-to-amine crosslinker BS3 that was important for achieving reproducibly high-quality samples for imaging.”).

      Please see point below for acknowledgement of risks of using crosslinker.

      Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec?

      We had considered this; however, cross-linking mass spectrometry (XL-MS) has been applied extensively to essentially identical kinesin-1 complexes by Tan et al. (eLife 2023). That work provided important insights into the overall architecture of the complex, including the new head–CC1 interactions. However, as fully acknowledged by the authors, significant ambiguity remained with respect to the positioning of the TPR domains, with many cross-links that could not be straightforwardly rationalized in a single model. These unresolved aspects provided part of the motivation for the present study, as highlighted in the Introduction.

      We believe that this ambiguity likely reflects an underlying conformational equilibrium of the kinesin-1 complex (e.g. opening/closing transitions) and/or dynamic docking and undocking of the TPR domains. In addition, lysine-rich features of the TPR domains (most notably the loops that connect the TPR alpha helices) may make them prone to being locked into non-native states by cross-linking. Together, these factors limit the interpretability of static cross-linking data in this system. In this context, therefore, we feel that XL-MS has already been thoroughly explored for kinesin-1 and that its practical limits in resolving these TPR interactions have been reached.

      This consideration was a primary motivation for pursuing cross-linker-free, solution-based approaches, particularly HDX-MS, which we argue provide the most relevant new insights into the assembly and conformational dynamics of the complex. To make this rationale clearer, we have added an explicit note in the HDX-MS section emphasizing that this is a cross-linker-free method. The added text reads:

      “To determine how the local structural changes from adaptor binding and shoulder dislocation affected the dynamics of kinesin-1 complexes in solution, as directly and least invasively as possible, and without the risk of cross-linker artefacts.”

      In general, the information content of some of the figure panels can also be improved with more annotations (e.g. the angular relationship between views in Figure 1B, approximate interpretations of the various blobs in Fig 3F, and more thought given to what the reader should extract from the representative micrographs in several figures). Inclusion of the raw data is welcome, but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere.

      We appreciate these suggestions. We have modified the figures throughout the manuscript in line with the reviewer’s points. Raw data is now provided at higher magnification throughout so the reader can better distinguish individual particles, angular relationships have been added and further annotations provided on 2D class averages. We do not want the reader to draw too many conclusions from images of single closed particles (with the exception of open vs closed in Fig S7) as these require averaging and 2D classification to obtain meaningful insights, and so we have not added zoom panels in these cases. Figure 3F has been annotated as requested.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use a clever construct design - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

      We are grateful for the reviewer’s comments. We agree that the weaknesses the reviewer has outlined define the limitations of the study and establish important priorities for future work, including molecular dynamics simulations. An important prerequisite for the latter is a starting model that one has confidence in. We think that our study and earlier work now provide a good experimentally supported foundation for using AF3-generated assemblies for this purpose, by ourselves and others.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      We agree with the reviewer’s point. Conformational heterogeneity is a significant challenge, and the model has been developed from multiple complementary approaches. A higher-resolution cryoEM study remains a priority but is challenging because of the size, shape and flexibility of the particle; we hope that some of the approaches used here (e.g. nanobody TPR stabilisation, ElbowLock) will provide a path to achieve this.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      We agree that this is a limitation. We strongly suspect that the TPR domains are dynamic and are working to overcome the experimental challenges to resolve this important outstanding question. We have expanded the Discussion section to better highlight this important priority.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

      We agree that this is a limitation but will be an important priority for future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of places where the text could be more precise or clear, or the figures could be designed to be more informative:

      (1) The word "unitarily" is used in several places, and I don't know what it means in this context.

      We have rephrased the text throughout the manuscript to remove this term. We were attempting to contrast with presumed cooperative multivalent interactions in the context of the kinesin-1 tetramer but agree that this choice of word doesn’t quite achieve that.

      (2) On page 5 the phrase "We focused on the ElbowLock background" is introduced and needs to be explained more clearly.

      Thank you. We have amended the text to read “This KIF5C construct contains a short 5 amino acid deletion that restricts flexibility around the elbow and helps maintain particles in their lambda conformation, providing homogeneous samples, and facilitating subsequent analysis (34).”

      (3) On page 6, the phrase "To improve the resolution of our images, we turned to single-particle cryoEM analysis" is imprecise - what do the authors mean by the resolution of the images? Cryo-EM data does not always guarantee a higher resolution structure, but it offers the possibility of visualising finer structural features. This is probably what is meant here, but needs to be stated more precisely.

      We have amended the text to ‘visualise finer structural details’ as suggested.

      (4) Page 7 - "suggesting that TPR domains had loosely dissociated from the core" - I don't think the evidence points to dissociation of KLCs from the complex, but the phrase "loosely dissociated" implies this - would benefit from rephrasing.

      We have changed this to ‘undocked’ for consistency with other descriptions in the manuscript.

      (5) Was the effect of the CC-Di insertion (ΔTDS) detectable by AlphaFold prediction? It would be interesting to include this, partly for completeness and partly because a slightly imperfect and maybe a more dynamic coiled-coil in this region of the molecule may be important in supporting the conformational changes required for activation.

      Thank you for this suggestion. Modelling of the ΔTDS complex indeed shows displacement of the TPR domains. In the standard 5 output models, the TPR domains now occupy a variety of different positions, all with essentially zero confidence (high position error). Consistent with biochemical data, the CCDi insertion is modelled with no overall disruption to the architecture or length of CC1, as expected. We think that this is a valuable addition to the study and have included it as a new supplementary figure (Fig S5), with the main text reading:

      …. “Supporting this, models of ΔTDS complexes using AF3 showed the expected seamless insertion of CCDi into CC1, with displacement of the TPR domains to a variety of different positions, in 5 models, all with high position error with respect to KHC (Fig S5).”

      (6) Figure S1 has two sections designated (C) in the legend.

      Corrected.

      (7) Figure S3 - given the resolution and level of interpretation of the 3D reconstructions, it is not relevant to include an FSC curve, but other standard information, such as angular distribution and any evidence of variability from 3D classifications (and how many particles per 3D class) should be included for all structures.

      Thank you, a complete workflow for all complexes has now been provided in Figure S8 with the information requested. In each case there were typically two ‘good’ classes. For ElbowLock, this included one without a prominent shoulder, consistent with 2D classification and quantification. We assume this may reflect a docking/undocking equilibrium. For the deltaTDS and KinTag particles, neither class showed the shoulder feature. The main text has been modified to reflect this and reads “For ElbowLock complexes, this resulted in classes with and without a prominent shoulder, in agreement with 2D classification. For ElbowLock-ΔTDS and ElbowLock-KinTag complexes, no prominent shoulder containing classes were observed.”

      Reviewer #2 (Recommendations for the authors):

      Overall, the figures would benefit from more labels for clarity, some examples and suggestions below:

      (1) Figure 1A - Connect motors to the rest of the structure e.g., wiggly lines.

      Corrected.

      (2) Figure 1B - Add arrows and angles to indicate different views of the model.

      Corrected.

      (3) Figure 1B - Label TPR1-6 (e.g., inset zoom in).

      Corrected.

      (4) Figure 2D and 3D - Label the lack of a shoulder in all averages (perhaps with an arrow instead of a circle to not obscure density), include an example average which shows prominent shoulder density.

      Corrected. Full sets of classes showing shoulder-like features for ΔTDS and KinTag complexes are now shown in Figure S4.

      (5) Figure 3D: Label motor domains and elbow as in other figures.

      Corrected.

      (6) Methods: Include more information on how EM classes were compared to AF projections (e.g., Figure 1D). Was this done visually or computationally? Likewise, more information is needed on how classes were judged to have prominent/weak shoulder density (Figure 2D). In the figure legend, there is a statement that "Full sets of classes are provided in Fig. S4" but this is absent in the supplement.

      Thank you. This information has been added to the methods.

      “For comparison to the AF3 model, simulated density was generated using the molmap command in ChimeraX (73) filtering to 15 Å, and projections were generated/selected automatically using the Reference Based Auto Selected 2D function in CryoSPARC”.

      Full sets of classes are now provided in Figure S4.

      (7) Figure 1-3 - Raw micrographs are a very useful inclusion but would benefit from being a more zoomed-in view (e.g., Figure S5 scale). Particularly useful for 3C, where the mixture of open and closed would be good to see.

      Higher zoom micrographs have been provided throughout.

      (8) Figure 5D: Panels too small to see the result, suggest making full width and moving E below.

      Thank you. We have expanded the panel and moved the model to a new Figure 6.

      (9) Figure S1: PAE plot convincing, but pLDDT colour models needed.

      A representative model coloured for pLDDT has been added to Figure S1. Most of the structure sits within the light blue confident range (90 > pLDDT > 70) with the exception of the disordered regions and neck coil.

      (10) Figure 5B: Reason for the variable inputs?

      The reviewer raises an interesting point. The slightly reduced expression of deltaElbow and slightly increased expression of ElbowLock is a consistent feature of these experiments. We note that this effect is in the ‘opposite direction’ to the impact on binding to MAP7 and so does not affect our conclusions from the experiment. However, we wonder whether opening and closing of the complex may impact on turnover of kinesin proteins, which could have implications for their normal homeostasis and possible degradation after transport in polarised cells. We are considering how to explore this going forwards. We have added a note to the results section to highlight this interesting observation to the reader.

      “We also noted slightly elevated expression of ElbowLock complexes and slightly lower expression of DeltaElbow complexes, suggesting that opening/closing of the complex could impact on kinesin-1 turnover”

      (11) Figure legend 5B: Insufficient detail, the end result is stated, but the three separate gels are not described.

      Legend has been expanded.

      (12) Figure 3F: Currently somewhat problematic. It is unclear if the models are in the same view, and so comparison is difficult. Figure 1C (bottom right) shows class averages with a clear, separate CC density, so the relatively featureless model in this region is puzzling. A statement on how the three model views are related to each other, if aligned with each other, would be useful.

      We appreciate the reviewer’s point. Models were aligned in Chimera using the fit in map command. Because of the limited features of the models, presumably due to flexibility, achieving a good alignment for all three models was challenging, but we think that showing the 180-degree rotations is probably the best we can achieve here.

      (13) The following statement is too strong: "Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length 'side' views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features which enabled us to identify CC1 confidently (Fig. 1D)". Given that the negative-stain EM data were collected primarily to validate the AlphaFold model, the assignment of CC1 should be described as consistent with rather than confidently identified from the class averages. The resolution of the EM data does not independently support such an assignment, and the wording needs to be softened.

      We appreciate the reviewer’s point, we have softened the wording as suggested. The paragraph now reads.

      “To visualise finer structural details, we turned to single-particle cryoEM analysis of frozen-hydrated samples. We were unable to obtain optimal samples suitable for determining the complete structure. Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length ‘side’ views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features (Fig. 1D). The motor domains were poorly resolved in these classes, suggesting that the head assembly is somewhat flexible relative to the coiled coil/TPR body. A comparison to low-pass filtered back-projections from the AF3 model (without motor domains) revealed density at a position concurrent with the docked TPR domains (Fig. 1D).”

      (14) There is a typo in the figure legend of Figure 3 - (E) and (F) should be (F) and (G).

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      I recommend the following additions:

      (1) Figure 1 labeling - In panel A, please label the "linker domain" and the "KLC subunits" explicitly to help orient the reader. In panel B, please mark the "TPR shoulder" corresponding to the docked TPR domains on CC1; this will help the reader connect parts B and C.

      Thank you, we have modified Figure 1A with this additional information.

      (2) The TPR docking site (TDS) is a central structural element, and its sequence boundaries are provided in the Methods. It would help to visualize this directly in Figure 2A or in an inset.

      We hope that the reviewer agrees that the zoomed-in model in Figure 5A (alongside MAP7) provides a sufficiently detailed view of the structural interface to highlight the orientation of TPR1 with respect to CC1. The side chain contacts in the model are very plausible and confidently predicted (and can be straightforwardly reproduced in AF3 using the sequence information provided in the Methods), but as our study has not explored this interaction at the single-residue level, we would prefer not to imply this to the reader at this stage.

      (3) The authors' model of cargo-induced TPR dislocation is convincing. However, the Discussion could benefit from a clarification on whether both KLC-TPR domains are expected to be bound simultaneously or if a dynamic exchange occurs, as the EM data suggest potential asymmetry.

      Thank you, please see point 5 below where we have modified the discussion to reflect the reviewer’s thoughtful comments.

      (4) The HDX-MS analysis is comprehensive, but the authors may want to briefly comment on the coverage of low-signal regions (especially within CC2-CC3) to enhance clarity.

      We have added an additional supplementary figure (S10) showing sequence coverage. Overall, this is 88% but with some lower coverage around KHC-CC0 (neck) and the acidic linker that connects the KLC coiled-coil to the TPR. We have added a note to the main text to reflect this.

      “Sequence coverage was high (overall 88%) with the exception of KHC-CC0 (neck coil) and the acidic-linker region that connects the KLC coiled-coil to the TPR domains where coverage was lower”

      (5) In the Discussion, the proposed interplay between MAP7 and cargo adaptors is intriguing, especially considering the results from Anna Akhmanova's lab showing that MAP7 activates kinesin-1 processivity. Do the authors suggest that competition for CC1 is mutually exclusive or sequential? The answer has mechanistic implications.

      We have been considering these questions for some time, and the short answer is that we don’t fully understand the dynamics yet. However, we appreciate the reviewer’s prompt to clarify our thinking on this. We have attempted to do this in a revised Discussion section where we more explicitly outline these outstanding questions.

    1. Author response:

      eLife Assessment

      This manuscript provides an important contribution to the field of platelet biogenesis, and the convincing evidence will advance our understanding of the signal transduction driving late megakaryopoiesis and platelet reactivity that results in bleeding diathesis. The paper is noteworthy for analyzing two related tyrosine phosphatases, either singly or in combination, in this conditional, developmental stage-specific gene knockout. Because SHP1 is a negative regulator and SHP2 is an activator, the synergistic effects found in the double knockout were surprising.

      We thank the reviewer for acknowledging the importance and novelty of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Barré et al. investigated the role of Shp1 and Shp2 in megakaryocytes (MKs) and platelets by conditional knock-out of Shp1, Shp2, or both under the control of the Gp1ba promoter. Deletion of Shp1 and Shp2 in MKs and platelets was almost complete. The Shp1/Shp2 double knock-out mice displayed macrothrombocytopenia and increased bleeding, whereas the single knock-outs did not show significant defects. Platelet function was aberrant in DKOs, but not in single knock-outs, and so was ligand-induced signaling, particularly Syk phosphorylation.

      Megakaryocyte maturation was impaired in Shp1/Shp2 DKO mice. Ligand-induced signaling was impaired in Shp2 knock-out and DKO. Ex vivo formation of platelets and in vivo maturation of MKs were impaired in DKO mice. Pharmacological inhibitors of Shp1 and Shp2 had largely similar effects as observed in the single knock-outs. The authors conclude that Shp1 and Shp2 have synergistic functions in the MK/platelet lineage, and that Shp2 may be a potential therapeutic target in myeloproliferative neoplasms.

      Strengths:

      The data clearly show effects of the Shp1/Shp2 double knock-out on MKs and platelets.

      Weaknesses:

      There appears to be a discrepancy between the results with the Shp2 single knock-out and the Shp2 inhibitor: the Shp2 knock-out does not affect MKs and platelets, except Erk1/2 signaling, whereas the Shp2 inhibitors appear to affect MK function.

      This work is interesting and may have potential from a therapeutic point of view.

      Pharmacological effects do not always correlate with congenital anomalies arising from genetic defects. The Shp2 allosteric inhibitors used in our study only target the catalytic activity of Shp2, whereas targeted deletion of Ptpn11 results in a loss of total Shp2 expression, including both catalytic and non-catalytic functions, with developmental consequences. Further, Gp1ba-Cre+;Shp2fl/fl megakaryocytes express approximately 22% of normal Shp2 levels, which likely also contributes to the differences observed between pharmacological inhibition and genetic ablation of Shp2.

      We thank the reviewer for recognizing the therapeutic potential of our findings.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Barré et al. investigate the roles of the phosphatases Shp1 and Shp2 in the megakaryocyte and platelet lineage using genetic depletion in mice. By employing Gp1ba-Cre-based models, the study builds on the authors' previous work and addresses some limitations associated with earlier Pf4-Cre approaches. The authors report relatively mild alterations in megakaryocyte and platelet parameters in mice lacking either Shp1 or Shp2 alone, whereas combined deletion of both phosphatases results in macrothrombocytopenia, mild bleeding, and impaired GPVI-dependent platelet aggregation accompanied by reduced Syk phosphorylation. The functional platelet defects are linked to reduced expression of GPVI and integrin α2, while thrombocytopenia is associated with impaired megakaryocyte maturation, reduced ploidy, defective proplatelet formation, and altered TPO-dependent Ras/MAPK signaling. Similar effects on megakaryopoiesis are also observed in vitro following treatment with newly developed Shp2 inhibitors.

      Strengths and Weaknesses:

      The study addresses an important biological question and presents a substantial dataset that could contribute to a better understanding of Shp1 and Shp2 function in platelet biology. However, several aspects of data presentation and interpretation would benefit from additional clarification. In particular, while the authors conclude that single genetic deletion or pharmacological inhibition of Shp1 has a limited impact and that the major phenotypes are specific to combined Shp1/2 deletion or Shp2 inhibition, some of the data suggest more nuanced effects that may warrant further discussion.

      We thank the reviewer for raising this point. The manuscript is being revised accordingly, including highlighting the potential role of Shp1 in megakaryopoiesis and thrombopoiesis under steady-state and stressed conditions, which requires more detailed investigation.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Barré et al utilize the Gp1ba-Cre transgenic mouse model to build upon previous findings in a Pf4-Cre system to investigate the effects of individual and combined Shp1 and Shp2 deletion in megakaryocytes and platelets. They report decreased megakaryocyte maturation, macrothrombocytopenia, and increased bleeding primarily in association with the Shp1/Shp2 double-knockout condition. The authors further show that this phenotype appears to be driven primarily by Shp2 and implicate dysregulation of Mpl signaling and downstream Ras/MAPK pathways, including ERK1/2. Given the key role of these pathways in human diseases such as myeloproliferative neoplasms and the challenges associated with modulating such a central pathway, identification of a specific regulator of Mpl signaling poses intriguing questions for future studies on clinical applicability.

      We thank the reviewer for acknowledging the importance and novelty of our findings.

      Strengths:

      Overall, the experiments combine in vitro, in vivo, and ex vivo approaches and appear to have been carefully designed and carried out, with multiple technical and biological replicates where relevant. The authors make a compelling argument for using the Gp1ba-Cre as opposed to the Pf4-Cre system and demonstrate both the dose- and stage-dependent effects of Shp1 and Shp2 on megakaryopoiesis and thrombopoiesis. They find that Shp1 and Shp2 are required in late-stage megakaryocyte maturation and that even low levels of expression compared to baseline are likely sufficient to yield generally normal megakaryocytes. Their findings also lead to specific future directions, such as the mechanism, distinct from TPO-mediated signaling, by which Shp1 regulates megakaryopoiesis and thrombopoiesis.

      Weaknesses:

      While the experiments have been thoughtfully designed and carried out, there is limited background explanation on relatively complex or niche pathways/mechanisms, such as the relationship between P-selectin, CRP, and PAR4p; the interactions between SFK, Syk, GPVI, and CLEC-2; and TPO, MPL, ERK1/2, AKT, and STAT3, which, while likely intuitive to experts in their respective fields, may be less obvious to a reader approaching this manuscript with a global interest in megakaryopoiesis/thrombopoiesis and thus detract from the impact of the findings.

      We thank the reviewer for raising this point. The manuscript is being revised to better explain the rationale and molecular mechanisms linking these pathways and functions.

      With regard to the science itself, some of the conclusions feel premature based on the available data.

      (1) The section "Aberrant ITAM signaling in Shp1- and Shp2-deficient platelets" is challenging to follow for those not well-versed in ITAM signaling and associated pathways, and may take additional outside reading to follow the conclusion that Syk-dependent signaling is modulated downstream of GPVI and CLEC-2 based on lack of change in Src p-Tyr418, especially considering that Src p-Tyr418 was previously introduced as a measure of SFK rather than Syk. In the introduction, Shp1 is specifically mentioned as a negative regulator of the ITAM/Syk/phospholipase pathway. However, in Figure 4Ai and Bi, Syk phosphorylation/activation in Shp1 knockout cells did not appear to be different from Shp2 knockout cells, and is lower than the control, which is surprising for a negative regulator. It is also not clear why, in the section (Figure 4A-B), there is reduced Syk activation in Shp1 and Shp2 single knockout cells upon CLEC2 stimulation (but apparently not with CRP) when there was no difference in response to CLEC2 (but a difference in response to CRP) in the previous section (Figure 3A, C).

      We thank the reviewer for raising these important points. The manuscript is being revised accordingly, including clarifying the roles of SFKs, Shp1 and Shp2 in the ITAM-Syk-PLCγ2 signaling pathway.

      Briefly, SFKs are essential for phosphorylating ITAMs, allowing SH2-dependent docking of Syk. The reduced reactivity of Shp1/2 DKO platelets to CRP and collagen is likely due to downregulation of the ITAM-containing GPVI-FcRγ-chain complex and the integrin α2 subunit, and a concomitant reduction in Syk phosphorylation.

      However, the cause of the marginal, albeit significant, reduction in Syk phosphorylation downstream of CLEC-2 in Shp1 and Shp2 KO platelets was not determined, and this reduction was insufficient to impact CLEC-2-mediated platelet aggregation under the conditions tested.

      Differences in the stoichiometry and docking of Syk to the phosphorylated GPVI-FcRγ-chain and CLEC-2 likely contribute to the differences in platelet reactivity and Syk phosphorylation downstream of the two receptors in the absence of Shp1 and Shp2.

      (2) In the section "Reduced Tpo signaling in Shp1/2-deficient MKs," only Western blot data for (p)ERK1/2, AKT, and STAT3 are presented before concluding that decreased ERK1/2 activity is a mechanistic explanation for the thrombocytopenia seen in the Shp1/2 double-knockout condition. Such a statement would benefit from additional experiments, for example measuring protein or transcript levels of ERK1/2 targets specifically relevant to megakaryopoiesis (e.g., ETS, FOS, and JUN), to assess the consequences of decreased ERK1/2 phosphorylation.

      We thank the reviewers for these constructive comments. Further experiments are being planned to determine the biological and transcriptional consequences of reduced ERK1/2 phosphorylation during megakaryopoiesis and thrombopoiesis.

      (3) Suggesting that "inhibiting Shp2 will not have any bleeding consequence in patients" and that Shp2 may be a therapeutic target in myeloproliferative neoplasms when none of these studies have been carried out in a human model is a bold conclusion. There are no data presented on, for example, whether Shp2 inhibition can help reverse the MPL/JAK/STAT pathway in the setting of gain-of-function mutations specifically associated with myeloproliferative neoplasms.

      This conclusion is being tempered in the revised manuscript. Genetic- and pharmacological-based approaches will be used to establish the therapeutic potential of inhibiting Shp1 and Shp2 in mouse models of MPN, including Jak2 gain-of-function mice. Bleeding and thrombotic complications of inhibiting Shp1 and Shp2 will be explored as part of these studies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Structure and Presentation of Results

      • I recommend reordering the visual-cue experiments to progress from simpler conditions (no cues) to more complex ones (cue-conflict). This would improve narrative logic and accessibility for non-specialist readers. The authors have chosen not to implement this suggestion, which I respect, but my recommendation stands.

      Thank you for this suggestion. We understand your point that presenting the experiments from simpler to more complex conditions may seem more intuitive. However, we have kept the original order because it better reflects the logic of the study itself. Our work first asked whether fall armyworms, like the Bogong moth, use a magnetic compass that is integrated with visual cues. Only after establishing this behavioral feature did we go on to test whether visual cues are required to maintain magnetic orientation. To make this reasoning clearer to readers, we have explicitly stated in the Introduction that magnetic orientation in the Bogong moth depends on the integration of visual cues, which provides clearer context for the experimental design.

      (2) Ecological Interpretation

      • The authors should expand their discussion on how the highly simplified, static cue setup translates to natural migratory conditions, where landmarks are dynamic, transient, or absent. Specifically, further consideration is needed on how the compass might function when landmarks shift position, become obscured, or are replaced by celestial cues. Additionally, the discussion would benefit from a more consolidated section with concrete suggestions for future experiments involving transient, multiple, or more naturalistic visual cues. This point was addressed partially in one paragraph of the Discussion, which reads as follows:

      "In nature, they are likely to encounter a range of luminance-gradient visual cues, including relatively stable celestial cues as well as transient or shifting local features encountered en route. Although such natural cues differ from our simplified laboratory stimulus, they may represent intermittently sampled visual inputs that can be optimally integrated with magnetic information, with the congruency between visual and magnetic cues likely playing a key role in maintaining a stable compass response. Whether the cues are static or changing, brief periods without them may still allow the subsequent recovery of a stable long-distance orientation strategy. Determining which types of natural visual cues support the magnetic-visual compass, and how they interact with magnetic information, including how their momentary alignment or angular relationship is integrated and how such visual cue-magnetic field interactions may require time to influence orientation, together with elucidating the genetic and ecological bases of multimodal orientation, will be important objectives for future research." While this paragraph is informative, the wording remains lengthy, somewhat unclear, and vague. Shorter, clearer statements would improve readability and impact. For example:

      • How could moths maintain direction during periods when only the magnetic field is present and visual landmarks are absent?

      • Could celestial cues (e.g., stars) compensate, and what happens if these are also obscured?

      • What role does saliency play when multiple visual landmarks are present simultaneously?

      • How might a complex skyline without salient landmarks affect orientation?

      Including simple, concise sentences that pose concrete open questions and suggest experimental designs would strengthen the discussion without creating space issues. In my view, a comprehensive discussion of how the simplified, static cue setup relates to natural migratory conditions, where landmarks are dynamic, transient, or absent, would add significant value to the paper.

      Thank you for this constructive and insightful comment. You correctly point out that our articulation of the ecological relevance of the simplified, static cue setup was not sufficiently clear. We also agree that the original wording in the Discussion remained overly general. In the revised Discussion, we updated the manuscript to incorporate recently published findings on the use of light–dark gradients for orientation in fall armyworms. However, we explicitly note that it remains unclear whether fall armyworms can exploit naturally occurring luminance gradients, such as those generated by the moon, for orientation under natural conditions. We further emphasize that during natural migration the visual environment is dynamic, with celestial cues available intermittently and local visual features changing continuously during flight. In this context, we outline several key unresolved questions, including whether celestial cues can compensate when local landmarks are absent; how multiple visual cues are weighted and integrated with geomagnetic information; how transient visual cues (like moving clouds or changing illumination) influence orientation; and how luminance gradients that are common in natural nocturnal environments interact with the geomagnetic field to support orientation. For each of these issues, we briefly suggest experimental approaches to guide future research.

      (3) Methodological Details and Reproducibility

      • The lack of luminance level measurements should be explicitly highlighted.

      Thank you for your helpful suggestion. You are right that luminance level is an important experimental parameter. We have stated this information in the Methods section under Behavioral apparatus: “The ambient light level in the experimental environment was measured to be below 1 lux using a Testo 540 lux meter (Testo SE & Co. KGaA, Titisee-Neustadt, Germany). Further work is still required to compare the illuminance used in this study with that under natural conditions, which are inherently variable.” This point is also clarified in the legend of Figure S3 in the supplementary material.

      • The authors chose not to adjust figure legends by replacing "magnetic South" with "magnetic North." While I believe this would be more conventional and preferable, this is ultimately a minor stylistic issue.

      Thank you very much for your suggestion. We understand your point and agree that using “magnetic North” would be more conventional. However, because our experiments focus on the orientation behavior of the autumn population, magnetic South is aligned with the landmark direction representing the potential migratory direction, which we believe makes the figures more intuitive for readers. We therefore consider this a minor stylistic issue.

      (4) Conceptual Framing and Discussion

      • Although the authors made a good attempt to explain the limitations of using an artificial visual cue, I believe there is room for a more explicit argument. For example, it could be stated clearly that this species is unlikely to encounter a situation in nature where a single, highly salient landmark coincides with its migratory direction. Therefore, how these findings translate to real migratory contexts remains an open question. A sentence or two making this point directly would strengthen the discussion.

      Thank you for your helpful suggestion. We now address this point explicitly in the Discussion, noting that fall armyworms are unlikely to experience a natural visual environment dominated by a single, static, and highly salient landmark coinciding with their migratory direction. Consequently, how these findings translate to real migratory contexts remains an open question.

      (5) Technical and Open-Science Points

      • Sharing the R code openly (e.g., via GitHub) should be seriously considered. The code does not need to be perfectly formatted, but making it available would be highly beneficial from an open-science perspective.

      Thank you for the suggestion. We agree that making code openly available is valuable from an open-science perspective. The MMRT script used in this study implements Moore’s Modified Rayleigh Test and is available from the original publication by Massy et al. (2021; https://doi.org/10.1098/rspb.2021.1805). In the previous version, we only cited this reference in the Materials and Methods section; we have now added a direct link to the script to improve clarity and accessibility. We have also provided a public link to the data-recording scripts used in the Flash Flight Simulator (https://doi.org/10.17632/6jkvpybswd.1). This repository additionally includes a map-based optical flow script that was not used in the present study but is shared for completeness.

      Reviewer #1 (Recommendations for the authors):

      • LL. 133-137 (end of paragraph starting with "The fall armyworm is a migratory crop pest native to the Americas"): Suggest splitting into shorter, clearer sentences. The limitations of this method could be better articulated here and elaborated in the Discussion.

      Thank you for this suggestion. We have revised this paragraph by splitting it into shorter, clearer sentences and by articulating the limitations of this method more explicitly. These limitations are further elaborated in the Discussion.

      • LL. 181-185 (end of paragraph starting with "To examine if fall armyworms integrate geomagnetic and visual cues for seasonal migratory orientation"): It would be helpful to state explicitly that season-specific headings have been confirmed in the lab using a flight simulator, but destination regions remain unknown without further tracking experiments.

      Thank you for this helpful suggestion. We have now clarified in the revised manuscript that season-specific orientation headings have been confirmed in the laboratory using a flight simulator, while the actual migratory destination regions remain unclear in the absence of tracking experiments.

      • LL. 230-234 (start of paragraph "Our previous research showed that fall armyworms reared under artificially simulated fall conditions…"): Clarify which migratory season is being referenced.

      Thank you for this helpful suggestion. We have clarified in the text that the migratory season referenced here is the autumn migratory season. In addition, we have added information in the Methods to specify the actual calendar season during which the insects were reared under the simulated conditions.

      • LL. 270-272 (middle of Fig. 2 caption): Suggest explicitly mentioning that for this population, the seasonally appropriate direction is southbound in autumn and northbound in spring, as this may not be clear to non-specialists.

      Thank you for this helpful suggestion. We have now explicitly stated the seasonally appropriate migratory directions for this population, indicating southbound migration in autumn and northbound migration in spring, to improve clarity for non-specialist readers.

      • LL. 421 (middle of paragraph starting with "We also considered the limitations of the Rayleigh test…"): Add that the groups lacking visual cues exhibited "lower directedness as per lower vector length (r)" in addition to lower flight stability.

      Thank you for this helpful suggestion. We further note that the conclusions drawn from the flight stability analysis are consistent with those based on individual r-value analyses.

      • LL. 499-501 ("unlike some vertebrates that can rely solely on magnetic information (Mouritsen, 2018)"): This point is slightly downplayed. It should be emphasized that nearly all tested vertebrates and invertebrates (e.g., birds, mole rats, fish, frogs, and other insects) demonstrate a magnetic compass without requiring visual landmarks. Moths are the only tested invertebrates so far that show landmark-magnetic field dependency for their magnetic compass to be manifested in a behavioural orientation response in Flight Simulator.

      Thank you for this important comment. We agree that this point represents a key synthesis in the Discussion, as it concerns how our findings relate to, and differ from, magnetic orientation demonstrated in other animal groups. We have therefore expanded the Discussion to note that studies have shown that some animals can exhibit directional preferences in simplified visual environments solely in response to changes in the magnetic field, and we now cite representative examples from birds and mole rats. At the same time, we also acknowledge important methodological and phenotypic differences among taxa. In particular, moths’ magnetic orientation has been assessed using a flight simulator, a setup in which stable directional behavior must be actively maintained during continuous movement. This is an important difference from orientation assays in birds during take-off or in terrestrial mammals such as mole rats. Moreover, whether birds and other animals rely on visual input to detect or calibrate magnetic information under certain conditions remains an open question. We therefore emphasize here both the phenotypic differences observed across experimental systems and the methodological considerations.

      • LL. 560-565 (paragraph starting with "Our flight simulator system (Dreyer et al., 2021) …"): Suggest clarifying what the Flash flight simulator system is and how it differs from the Mouritsen-Frost flight simulator.

      Thank you for this suggestion. We have added a brief clarification of the Flash flight simulator and how it differs from the Mouritsen–Frost system.

      • LL. 605-608 ("Spectral measurements …"): Explicitly mention that total illuminance was not measured and that further work is required to compare the illuminance used with natural conditions which of course vary.

      Thank you for this helpful suggestion. We agree that total illuminance is an important factor. We have now added a statement noting that the ambient light level in the experimental environment was measured to be below 1 lux using a Testo 540 lux meter, and we further acknowledge that additional work is required to compare the illuminance used in this study with that under naturally variable conditions.

      • LL. 628-641 (end of paragraph starting with "Electromagnetic noise at the experimental site ... "): Explain why this matters for interpreting behavioural responses. Highlight that although conditions were somewhat magnetically noisy, which past work has shown may disrupt the magnetic compass in birds (e.g., Engels et al. 2014 Nature), the observed magnetic response under certain conditions indicates that the magnetic sense remained functional when landmark and magnetic field were aligned. This way you can pre-empt the criticism that your magnetic conditions were not ideal, given the noise on the left-hand side of the measured spectrum (which is not uncommon).

      Thank you for this helpful suggestion. We have now cited Engels et al. (2014, Nature) in this section and expanded the text to explain why electromagnetic noise at the experimental site is relevant for interpreting the behavioural responses. We also clarify the rationale for measuring electromagnetic noise and discuss the observed low-frequency (“left-hand side”) noise in the spectrum.

      • Fig. 51: Suggest adapting Y-axes and using violin or box plots (e.g., panels A/B starting from 30 up to 50, etc.).

      Thank you for this helpful suggestion. We have revised Fig. 5 accordingly by adapting the Y-axis scaling and replacing the original plots with box plots, as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The researchers aimed to identify which neurotransmitter pathways are required for animals to withstand chronic oxidative stress. This work thus has important implications for disease processes that are caused/linked to oxidative stress. This work identified specific neurotransmitters and receptors that coordinate stress resilience, both prior to and during stress exposure. Further, the authors identified specific transcriptional programs coordinated by neurotransmission that may provide stress resistance.

      Strengths:

      The manuscript is very clearly written with a well-formulated rationale. Standard C. elegans genetic analysis and rescue experiments were performed to identify key regulators of the chronic oxidative stress response. These findings were enhanced by transcriptional profiling that identified differentially expressed genes that likely affect survival when animals are exposed to stress.

      We thank the reviewer for their positive assessment.

      Weaknesses:

      Where the gar-3 promoter drives expression was not discussed in the context of the rescue experiments in Figure 7.

      We now provide information about expression driven by the 7.5 kb gar-3 promoter fragment and compare it directly with our analysis of endogenous gar-3 expression using the genome-modified gar-3::SL2::GFP strain (Page 16, new Figures 8 and S3).

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 3B is not mentioned in the text.

      Fixed. Figure 3B is now called out on page 10 of the revised manuscript.

      (2) The rationale for using the specific PQ concentration was not provided.

      We selected this concentration based on its use for chronic assays by other studies in the field to allow for direct comparison with our results. We now clarify this point in the Methods section (Page 26 of the revised text).

      (3) Transgenic animals injected with the unc-17βp::gar-3 transgene (25 ng/μL) displayed strikingly increased survival in the presence of 4 mM PQ compared to either gar-3 mutants or wild type (should have a Figure cited here).

      Fixed. Figure 9E is now referenced on Page 19 of the revised text.

      (4) The text describing Figure 7C details a comparison with the gar-3 single mutant but the graph shows the unc-17 single mutant.

      Figure 7C is a comparison of the survival of gar-3 single mutants with either wild type or gar-3;ric-3 double mutants as described in the text.

      Reviewer #2 (public comments)

      In this paper, Biswas et al. describe the role of acetylcholine (ACh) signaling in protection against chronic oxidative stress in C. elegans. They showed that disruption of ACh signaling in either unc-17 mutants or gar-3 mutants led to sensitivity to toxicity caused by chronic paraquat (PQ) treatment. Using RNA-seq, they found that approximately 70% of the genes induced by chronic PQ exposure in wild type failed to upregulate in these mutants. The overexpression of gar-3 selectively in cholinergic neurons was sufficient to promote protection against chronic PQ exposure in an ACh-dependent manner. The study points to a previously undescribed role for ACh signaling in providing organism-wide protection from chronic oxidative stress, likely through the transcriptional regulation of numerous oxidative stress-response genes. The paper is well-written, and the data are robust, though some conclusions seem preliminary and are not fully supported by the current data. While the study identifies the muscarinic ACh receptor gar-3 as an important regulator of the response to PQ, the specific neurons in which gar-3 functions were not unambiguously identified, and the sources of ACh that regulate GAR-3 signaling and the identities of the tissues targeted by gar-3 were not addressed, limiting the scope of the study.

      We thank the reviewer for their positive assessment. We provide additional data and discussion of the points raised by the reviewer in the revised manuscript. In particular, as suggested by the reviewer, we conducted additional tissue-specific rescue experiments to try to better define GAR-3 site of action. We found that specific rescue of gar-3 expression in either cholinergic motor neurons or muscles each provide partial rescue. In addition, we quantified the expression of the nhr-185 and fbxa-73 genes, identified as upregulated by PQ in our RNA-seq studies, following oxidative stress (new Fig. S4). We observed increased expression of both genes following PQ exposure, providing independent confirmation for transcriptional upregulation of these genes as part of the stress response. See the responses to points #1 and #3 below for additional details.

      Major Comments:

      (1) The site of action of cholinergic signaling for protection from PQ was not adequately explored. The authors' conclusion that cholinergic motor neurons are protective is based on studies using overexpression of gar-3 and an unc-17 allele that may selectively disrupt ACh in cholinergic motor neurons (Figure 9F), but these approaches are indirect. To more directly address the site of action, the authors should conduct rescue experiments using well-defined heterologous promoters. Figure 7G shows that gar-3 expressed under a 7.5 kb promoter fragment fully rescues the defect of gar-3 mutants, but the authors did not report where this promoter fragment is expressed, nor did they conduct rescue experiments of the specific tissues where gar-3 is known to be expressed (cholinergic neurons, GABAergic neurons, pharynx, or muscles). UNC-17 rescue experiments could also be useful to address the site of action. Does expression of unc-17 selectively in cholinergic motor neurons rescue the stress sensitivity of unc-17 mutants (or restore resistance to gar-3(OE); unc-17 mutants)? These experiments may also address whether ACh acts in an autocrine or paracrine manner to activate gar-3, which would be an important mechanistic insight to this study that is currently lacking.

      We performed additional rescue experiments using heterologous promoters to drive gar-3 expression in cholinergic neurons or muscle and found that each provided a small, but significant degree of rescue as assessed from Kaplan-Meier survival curves. These results are presented in Figure 8 of the revised manuscript. We have not conducted similar unc-17 rescue experiments; however, we point out that cell-specific unc-17 knockdown by RNAi using the unc-17b promoter (expression largely restricted to ventral cord ACh motor neurons) increases sensitivity to PQ in our long-term survival assays (Figure 3A). Combined with our analysis of unc-17(e113) mutants, we believe these results support a requirement for unc-17 expression in cholinergic motor neurons.

      (2) The genetic pan-neuronal silencing experiments presented in Figure 1 motivated the subsequent experiments, but the authors did not relate these observations to ACh/gar-3 signaling. For example, the authors did not address whether silencing just the cholinergic motor neurons at the different times tested has the same effects on survival as pan-neuronal silencing.

      We used the pan-neuronal silencing to motivate further analysis of various neurotransmitter systems. Our genetic studies implicate both glutamatergic and cholinergic systems in protective responses to oxidative stress. The effects of pan-neuronal silencing on survival during long-term PQ exposure may therefore be derived solely from cholinergic neurons, glutamatergic neurons, or a combination of both neuronal populations. Distinguishing between these possibilities may be quite complicated and is not central to the main message of our paper. We therefore suggest this additional analysis lies outside the scope of this revision. Nonetheless, to address the reviewer’s point, in the revised text we expand our discussion relating the pan-neuronal silencing results to our analysis of ACh signaling (pages 21-22).

      (3) It is assumed that protection occurs through inter-tissue signaling of ACh to target tissues, where it impacts gene expression. While this is a reasonable assumption, it has not been directly shown here. It is recommended that the authors examine GFP reporter expression of a sampling of the genes identified in this study (including proteasomal genes that the authors highlight) that are regulated by unc-17 and gar-3. This would serve to independently confirm the RNAseq data and to identify target tissues that are subject to gene expression regulation by ACh, which would significantly strengthen the study.

      Agreed. To address this question, we investigated expression of the nhr-185 and fbxa-73 genes implicated as upregulated by oxidative stress in our RNA-seq studies. Consistent with our RNA-seq findings, we observed significantly increased expression of a nhr-185pr::GFP transcriptional reporter, primarily in the pharynx and anterior intestine, following 48 hrs of PQ exposure. These results support transcriptional upregulation of expression in these tissues as part of the stress response. fbxa-73 was among the proteasomal genes implicated as oxidative stress-responsive by RNA-seq. Consistent with this finding, by quantitative RT-PCR we observed a significant increase in fbxa-73 expression in wild type animals following 48 hrs of PQ treatment. These new results provide independent confirmation of the gene expression changes we observed by RNA-seq and are now included in new Figure S4 and discussed on Pages 17-18 of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) As an independent way of addressing whether enhanced ACh signaling is sufficient for protection, the authors could examine stress resistance in ace mutants, as was reported in PMID: 39097618, or in mutants with increased ACh secretion.

      We thank the reviewer for this suggestion. We are investigating the impacts of increased cholinergic activation in a separate study, which includes experiments along the lines the reviewer suggests. Our findings here provide evidence that increasing GAR-3 signaling in ACh motor neurons by cell-specific overexpression enhances protection.

      (2) To address the specificity of ACh signaling by gar-3 for this response, the authors could report survival data for mutants lacking each of the other two mACh receptors, gar-1 and gar-2.

      We thank the reviewer for this suggestion. We now include new data showing that gar-3;gar-2 double mutants have similar survival to gar-3 single mutants in the presence of PQ (new Figure 7F). We agree that further studies of additional GPCRs (e.g. gar-1 and metabotropic glutamate receptors) will be required to definitively establish specificity for GAR-3 and we now acknowledge this point on page 15 of the revised text.

      (3) Do carbonylation levels correlate with toxicity? For example, do gar-3 mutants have more carbonylation and gar-3 OE have less?

      This is an interesting question. To try to address this, we performed additional protein carbonylation experiments for unc-17 and gar-3 mutants. We found a similar increase in protein carbonylation following PQ exposure for gar-3 mutants as observed for wild type; however, we also noted a higher level of batch-to-batch variability for gar-3 compared with wild type and are therefore hesitant to draw firm conclusions. We have not included these data in the revised manuscript but provide them for the reviewer’s information here (Author response image 1 shows our prior N2 data for comparison). We were not able to conduct similar experiments for unc-17 mutants because we noted local starvation when the animals were grown at the high density required to obtain the protein quantities needed for these experiments.

      Author response image 1.

      (4) Citations in text for Figures 4A and 8A are missing.

      Fixed. Figures 4A and 8A (now 9A) are cited on pages 10 and 17 of the revised text, respectively.

      (5) Figures 4-6 and 8 have limited information content. Condense or move to supplementary.

      While we acknowledge the reviewer’s viewpoint here, we believe that the analyses of the transcriptional responses described in Figures 4-6 and 8 are central to the study. To address reviewers’ comments, we have included a new Figure 8 and merged previous Figures 8 and 9 (new Figure 9) in the revised manuscript.

      (6) "expression of" is repeated in "Finally, transgenic expression of expression of a wild-type GAR-3::YFP"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study shows that orientation tuning of V1 neurons is suppressed during a continuous flash suppression paradigm, especially when the neurons have a binocular receptive field. However, the evidence presented is incomplete and, in particular, does not distinguish whether this suppression is due to reduced contrast or due to masking.

      This assessment is primarily based on the critique of Reviewer 2 that our results do not distinguish whether the impact of CFS is due to reduced contrast or due to masking. Reviewer 2 referred to Yuval-Greenberg and Heeger (2013), noting that: “V1 activity is, in fact, reduced during CFS … the mask reduces the gain of neural responses to the grating stimulus … making it invisible in the same way that reducing contrast makes a stimulus invisible.” To be precise, Yuval-Greenberg and Heeger (2013) used “akin to”, instead of “the same way”, in their abstract.

      We agree that CFS masking and contrast reduction can both lower the signal-to-noise ratio and thereby reduce visibility. However, these two factors operate in fundamentally different ways. According to gain control models by Heeger and others, reducing the physical contrast of a stimulus decreases the excitatory drive, while dichoptic masking increases the normalization pool. Our findings therefore reflect genuine masking-induced suppression and are not attributable to stimulus contrast reduction.

      Public Reviews:

      Reviewer #1 (Public review):

      Disclaimer: While I am familiar with the CFS method and the CFS literature, I am not familiar with primate research or two-photon calcium imaging. Additionally, I may be biased regarding unconscious processing under CFS, as I have extensively investigated this area but have found no compelling evidence in favor of unconscious processing under CFS.

      This manuscript reports the results of a nonhuman-primate study (N=2 behaving macaque monkeys) investigating V1 responses under continuous flash suppression (CFS). The results show that CFS substantially suppressed V1 orientation responses, albeit slightly differently in the two monkeys. The authors conclude that CFS-suppressed orientation information "may not suffice for high-level visual and cognitive processing" (abstract).

      The manuscript is clearly written and well-organized. The conclusions are supported by the data and analyses presented (but see disclaimer). However, I believe that the manuscript would benefit from a more detailed discussion of the different results observed for monkeys A and B (i.e., inter-individual differences), and how exactly the observed results are related to findings of higher-order cognitive processing under CFS, on the one hand, and the "dorsal-ventral CFS hypothesis", on the other hand.

      We thank the reviewer for the helpful comments and suggestions. In the revision, we added new content discussing the inter-individual differences and the "dorsal-ventral CFS hypothesis", and made other changes, as detailed below.

      Major Comments:

      (1) Some references are imprecise. For example, l.53: "Nevertheless, two fMRI studies reported that V1 activity is either unaffected or only weakly affected (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013)." To the best of my understanding, the second study reaches a conclusion that is entirely opposite to that of the first, specifically that for low-contrast, invisible stimuli, stimulus-evoked fMRI BOLD activity in the early visual cortex (V1-V3) is statistically indistinguishable from activity observed during stimulus-absent (mask-only) trials. Therefore, high-level unconscious processing under CFS should not be possible if Yuval-Greenberg & Heeger are correct. The two studies contradict each other; they do not imply the same thing.

      We apologize for not making our point clear. Our original concern was that the effects of CFS on V1 activity were underestimated, even in Yuval-Greenberg & Heeger (2013), because both studies compared monocular and dichoptic masking to estimate the influence of visibility. In contrast, the original psychophysical studies measured the CFS effect by comparing target visibility with and without dichoptic masking, a comparison that is expected to yield a stronger effect. We have rewritten the paragraph to clarify this point.

      “Two prominent fMRI studies have examined the impact of CFS on V1 activity (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013). Watanabe et al. (2011) compared monocular CFS masking (stimulus visible) and dichoptic CFS masking (stimulus invisible), and reported that V1 BOLD responses were largely insensitive to stimulus visibility when attention was carefully controlled. However, using a similar experimental design, Yuval-Greenberg and Heeger (2013) observed reduced BOLD responses in V1 under dichoptic masking, suggesting that V1 activity changed with stimulus visibility. They attributed the difference in results between the two studies mainly to differences in statistical power (~250 vs. ~90 trials per condition). Nevertheless, neither study was designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, as both contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility. In contrast, the original psychophysical studies (Tsuchiya & Koch, 2005; Tsuchiya, Koch, Gilroy, & Blake, 2006) demonstrated CFS masking by comparing the visibility of the target stimulus with and without a dichoptic mask. By the same logic, the pure CFS impact in the above fMRI studies would be the difference in BOLD signals between the binocular masking and stimulus-alone conditions. In other words, the impact of CFS on V1 activity should be larger than what has been reported by Yuval-Greenberg and Heeger (2013).” (lines 55-71)

      (2) Line 354: "The flashing masker was a circular white noise pattern with a diameter of 1.89°, a contrast of 0.5, and a flickering rate of 10 Hz. The white noise consisted of randomly generated black and white blocks (0.07 × 0.07 each)." Why did the authors choose a white noise stimulus as the CFS mask? It has previously been shown that the depth of suppression engendered by CFS depends jointly on the spatiotemporal composition of the CFS and the stimulus it is competing with (Yang & Blake, 2012). For example, Hesselmann et al. (2016) compared Mondrian versus random dot masks using the probe detection technique (see Supplementary Figure S4 in the reference below) and found only a poor masking performance of the random dot masks.

      Yang, E., & Blake, R. (2012). Deconstructing continuous flash suppression. Journal of Vision, 12(3), 8. https://doi.org/10.1167/12.3.8

      Hesselmann, G., Darcy, N., Ludwig, K., & Sterzer, P. (2016). Priming in a shape task but not in a category task under continuous flash suppression. Journal of Vision, 16, 1-17.

      In a previous human psychophysical study, we used the same noise pattern, and the CFS effect appeared to be robust (Xiong et al., 2016, https://doi.org/10.7554/eLife.14614). Nevertheless, the reviewer makes a good point: our noise pattern may mask less effectively than other patterns, and this may have contributed to the weaker suppression in Monkey B. We now discuss this issue in the revision in the context of individual variability in our results.

      “In addition, the random-noise masker we used might not be as effective as Mondrian patterns (G. Hesselmann, Darcy, Ludwig, & Sterzer, 2016). If reduced stimulus contrast and a Mondrian masker were used, we predict that CFS suppression in Monkey B would strengthen, potentially approaching the level observed in Monkey A. Nevertheless, it is worth emphasizing that our main conclusions are primarily based on data from Monkey A, who exhibited much stronger CFS suppression.” (lines 321-327)

      (3) Related to my previous point: I guess we do not know whether the monkeys saw the CF-suppressed grating stimuli or not? Therefore, could it be that the differences between monkey A and B are due to a different individual visibility of the suppressed stimuli? Interocular suppression has been shown to be extremely variable between participants (see reference below). This inter-individual variability may, in fact, be one of the reasons why the CFS literature is so heterogeneous in terms of unconscious cognitive processing: due to the variability in interocular suppression, a significant amount of data is often excluded prior to analysis, leading to statistical inconsistencies.

      Yamashiro, H., Yamamoto, H., Mano, H., Umeda, M., Higuchi, T., & Saiki, J. (2014). Activity in early visual areas predicts interindividual differences in binocular rivalry dynamics. Journal of Neurophysiology, 111(6), 1190-1202. https://doi.org/10.1152/jn.00509.2013

      The individual difference issue is now explicitly addressed in the Discussion:

      “Interocular suppression under CFS is known to vary substantially across individuals (Blake, Goodman, Tomarken, & Kim, 2019; Gayet & Stein, 2017; Yamashiro et al., 2013). This inter-individual variability may contribute to the heterogeneity observed in the CFS literature. We also found that the strength of V1 response suppression during CFS differed between the two monkeys, as reflected by population orientation tuning functions (Fig. 2C), Fisher information (Fig. 2F), and reconstruction performance by the transformer (Fig. 3E). Several experimental factors may have contributed to the relatively weaker suppression observed in Monkey B. Because the monkeys viewed the stimuli passively, we could not determine the dominant eye for each monkey (instead we switched the eyes and averaged the results), and the target was presented at relatively high contrast. Both factors are known to reduce the effectiveness of CFS suppression (Yang, Blake, & McDonald, 2010; Yuval-Greenberg & Heeger, 2013). In addition, the random-noise masker we used might not be as effective as Mondrian patterns (G. Hesselmann, Darcy, Ludwig, & Sterzer, 2016). If reduced stimulus contrast and a Mondrian masker were used, we predict that CFS suppression in Monkey B would strengthen, potentially approaching the level observed in Monkey A. Nevertheless, it is worth emphasizing that our main conclusions are primarily based on data from Monkey A, who exhibited much stronger CFS suppression.” (lines 311-327)

      Moreover, the authors' main conclusion (lines 305-307) builds on the assumption that the stimuli were rendered invisible, but isn't this speculation without a measure of awareness?

      We agree. We have removed the original lines 305-307 discussing conscious perception and reframed the manuscript throughout to focus on the impact of CFS on neural coding rather than on perceptual awareness. For example, the title has been changed to:

      “Continuous flashing suppression of neural responses and population orientation coding in macaque V1”,

      and the ending line of Introduction was changed to:

      “This approach enabled us to investigate the potentially differential impacts of CFS on the responses of V1 neurons with varying ocular preferences, as well as apply machine learning tools to understand the impacts of CFS on V1 stimulus coding at the population level.” (lines 81-83)

      (4) The authors refer to the "tool priming" CFS studies by Almeida et al. (l.33, l.280, and elsewhere) and Sakuraba et al. (l.284). A thorough critique of this line of research can be found here:

      Hesselmann, G., Darcy, N., Rothkirch, M., & Sterzer, P. (2018). Investigating Masked Priming Along the "Vision-for-Perception" and "Vision-for-Action" Dimensions of Unconscious Processing. Journal of Experimental Psychology. General. https://doi.org/10.1037/xge0000420

      This line of research ("dorsal-ventral CFS hypothesis") has inspired a significant body of behavioral and fMRI/EEG studies (see reference for a review below). The manuscript would benefit from a brief paragraph in the discussion section that addresses how the observed results contribute to this area of research.

      Ludwig, K., & Hesselmann, G. (2015). Weighing the evidence for a dorsal processing bias under continuous flash suppression. Consciousness and Cognition, 35, 251-259. https://doi.org/10.1016/j.concog.2014.12.010

      In the revision, we added a new paragraph discussing issues related to the dorsal-ventral CFS hypothesis.

      “A related issue is the dorsal-ventral CFS hypothesis, which proposes that CFS suppression may disproportionately affect ventral visual processing while relatively preserving dorsal pathways involved in visuomotor functions, potentially allowing category- or action-related information to remain accessible under suppression (Fang & He, 2005). However, subsequent fMRI studies have failed to provide consistent support for this dissociation, reporting either stream-invariant awareness effects (Guido Hesselmann & Malach, 2011; Ludwig et al., 2015; Tettamanti et al., 2017), residual signal in ventral rather than dorsal regions (Fogelson et al., 2014; Guido Hesselmann et al., 2011), or residual low-level feature information/partial visibility rather than preserved dorsal processing (Ludwig et al., 2015). Although our study does not directly test dorsal-ventral dissociations, our V1 results constrain what information downstream visual pathways could access under suppression. When CFS-induced interocular suppression was strong enough that stimulus reconstruction was markedly reduced, as in the case of Monkey A, the information available may not be sufficient to support the high-level cortical representations required for category-level or action-related processing.” (lines 297-310)

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to investigate the degree to which low-level stimulus features (i.e., grating orientation) are processed in V1 when stimuli are not consciously perceived under conditions of continuous flash suppression (CFS). The authors measured the activity of a population of V1 neurons at single neuron resolution in awake fixating monkeys while they viewed dichoptic stimuli that consisted of an oriented grating presented to one eye and a noise stimulus to the other eye. Under such conditions, the mask stimulus can prevent conscious perception of the grating stimulus. By measuring the activity of neurons (with Ca2+ imaging) that preferred one or the other eye, the authors tested the degree of orientation processing that occurs during CFS.

      Strengths:

      The greatest strength of this study is the spatial resolution of the measurement and the ability to quantify stimulus representations during CFS in populations of neurons preferring the eye stimulated by either the grating or the mask. There have been a number of prominent fMRI studies of CFS, but all of them have had the limitation of pooling responses across neurons preferring either eye, effectively measuring the summed response across ocular dominance columns. The ability to isolate separate populations offers an exciting opportunity to study the precise neural mechanisms that give rise to CFS, and potentially provide insights into nonconscious stimulus processing.

      Weaknesses:

      While this is an impressive experimental setup, the major weakness of this study is that the experiments don't advance any theoretical account of why CFS occurs or what CFS implies for conscious visual perception. There are two broad camps of thinking with regard to CFS. On the one hand, Watanabe et al. (2011) reported that V1 activity remained intact during CFS, implying that CFS interrupts stimulus processing downstream of V1. On the other hand, Yuval-Greenberg and Heeger (2013) showed that V1 activity is, in fact, reduced during CFS. By using a parametric experimental design, they measured the impact of the mask on the stimulus response as a function of contrast and concluded that the mask reduces the gain of neural responses to the grating stimulus. They presented a theoretical model in which the mask effectively reduced the SNR of the grating, making it invisible in the same way that reducing contrast makes a stimulus invisible.

      We used a multi-class SVM (as suggested by Reviewer 3) to examine the impact of CFS on the classification of 12 orientations spaced 15° apart, which resembles coarse orientation discrimination. We also used a transformer-based model to examine its impact on stimulus reconstruction, which resembles the stimulus perception necessary for high-level cognitive tasks. The results suggest that under CFS, an observer may still be able to perform coarse orientation discrimination but not high-level cognitive tasks. These findings provide new insights into the implications of CFS for conscious visual perception from a population decoding perspective.

      In the revision, we also added a new paragraph discussing the implications of our findings for the dorsal-ventral CFS hypothesis, as suggested by Reviewer 1. We previously presented a gain control model of our neuronal data in a VSS talk. However, because good models by Heeger and others already exist, we later decided that it would be better to present something more unique and novel (i.e., the machine learning results), which has now become a major component of the manuscript. We welcome the reviewer’s comments on this part.

      An important discussion point of Yuval-Greenberg and Heeger is that null results (such as those presented by Watanabe et al.) are difficult to interpret, as the lack of an effect may be simply due to insufficient data. I am afraid that this critique also applies to the present study.

      We are very much puzzled by the reviewer’s critique. First, our main result is not a null effect. A null effect would mean that CFS masking had no impact on population orientation responses. Instead, we observed significant suppression or abolished tuning, which clearly indicates a strong effect of dichoptic masking. Second, our findings are based on large neural populations recorded using two-photon imaging, providing extensive sampling and statistical power. We therefore believe that the reviewer’s critique about “insufficient data” is not applicable to our study.

      Here, the authors report that CFS effectively 'abolishes' tuning for stimuli in neurons preferring the eye with the grating stimulus. The authors would have been in a much stronger position to make this claim if they had varied the contrast of the stimulus to show that the loss of tuning was not simply due to masking.

      We are sorry that we cannot follow the logic here either. Even if “the mask effectively reduced the SNR of the grating, making it invisible in the same way that (“akin to”, to be more precise according to the abstract of Yuval-Greenberg and Heeger (2013)) reducing contrast makes a stimulus invisible”, it does not necessarily mean that dichoptic masking and contrast reduction are the same process or are based on the same neuronal mechanisms. According to gain control models by Heeger and others, reducing the stimulus contrast decreases the excitatory drive, while dichoptic masking increases the normalization pool via interocular suppression, both of which lower SNR, but are two fundamentally distinct processes.

      Therefore, varying the stimulus contrast might reveal a main effect of contrast, and possibly an interaction between contrast and dichoptic masking, but it would neither prove nor disprove the main effect of dichoptic masking.
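      The distinction drawn above can be made concrete with a generic divisive-normalization model of the form R = R_max · c^n / (sigma^n + c^n + w_mask · m^n). In this form, c is the target contrast and m is the contrast of a mask that contributes no excitatory drive of its own. The sketch below is illustrative only. It is not the gain control model mentioned earlier or any model used in the study, and the exponent n, semi-saturation constant sigma, and mask weight w_mask are hypothetical values. Lowering c reduces the numerator (excitatory drive), whereas adding the mask only enlarges the denominator (normalization pool).

```python
"""Toy divisive-normalization model (illustrative; all parameters hypothetical).

Compares two ways of lowering the response to a target grating:
  (a) lowering target contrast, which reduces excitatory drive (numerator);
  (b) adding a dichoptic mask, which in this simplified form only enlarges
      the normalization pool (denominator).
"""


def response(c_target, c_mask=0.0, r_max=1.0, sigma=0.1, n=2.0, w_mask=1.0):
    """Normalized response to a target of contrast c_target (0..1)."""
    drive = c_target ** n                                     # excitatory drive
    pool = sigma ** n + c_target ** n + w_mask * c_mask ** n  # normalization pool
    return r_max * drive / pool


def semi_saturation(c_mask=0.0, sigma=0.1, n=2.0, w_mask=1.0):
    """Target contrast that yields r_max / 2; the mask term raises it."""
    return (sigma ** n + w_mask * c_mask ** n) ** (1.0 / n)


if __name__ == "__main__":
    print("unmasked, c = 0.5:", round(response(0.5), 3))
    print("unmasked, c = 0.1:", round(response(0.1), 3))                # (a) less drive
    print("masked,   c = 0.5:", round(response(0.5, c_mask=0.5), 3))    # (b) larger pool
    print("c50 without mask:", round(semi_saturation(), 3))
    print("c50 with mask:   ", round(semi_saturation(c_mask=0.5), 3))
```

      With these made-up parameters, an unmasked target at contrast 0.1 and a masked target at contrast 0.5 both produce responses close to 0.5. A single response level therefore does not reveal which manipulation caused it. In this toy form, the mask raises the semi-saturation contrast (a contrast-gain shift), whereas contrast reduction moves the response along an unchanged curve.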

      So, while this is an incredibly impressive set of measurements that in many ways raises the bar for in vivo Ca2+ imaging in behaving macaques, there isn't anything in the results that constitutes a real theoretical advance.

      We sincerely hope that our responses above will lead the reviewer to reconsider this assessment.

      Reviewer #3 (Public review):

      Summary:

      In this study, Tang, Yu & colleagues investigate the impact of continuous flash suppression (CFS) on the responses of V1 neurons using 2-photon calcium imaging. They report that CFS substantially suppressed V1 orientation responses. This suppression happens in a graded fashion depending on the binocular preference of the neuron: neurons preferring the eye that was presented with the masker stimuli were most suppressed, while the neurons preferring the eye to which the grating stimuli were presented were least suppressed. Binocular neurons exhibited an intermediate level of suppression.

      Strengths:

      The imaging techniques are cutting-edge, and the imaging results are convincing and consistent across animals.

      Weaknesses:

      I am not totally convinced by the conclusions that the authors draw based on their machine learning models.

      We thank the reviewer for pointing out this issue. We have reanalyzed the data using the multi-class SVM suggested by the reviewer and found similar results, as detailed below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 56-63: "As a result, the dichoptic CFS masking, which is cortical, could be substantially stronger than monocular masking when accounting for the pre-cortical effects of monocular masking." I don't quite understand this argument. Could you please elaborate?

      We have revised our text in response to the reviewer’s first major comment, to which the current issue is related. The elaboration appears in the revised paragraph below.

      “Two prominent fMRI studies have examined the impact of CFS on V1 activity (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013). Watanabe et al. (2011) compared monocular CFS masking (stimulus visible) and dichoptic CFS masking (stimulus invisible), and reported that V1 BOLD responses were largely insensitive to stimulus visibility when attention was carefully controlled. However, using a similar experimental design, Yuval-Greenberg and Heeger (2013) observed reduced BOLD responses in V1 under dichoptic masking, suggesting that V1 activity changed with stimulus visibility. They attributed the difference in results between the two studies mainly to differences in statistical power (~250 vs. ~90 trials per condition). Nevertheless, neither study was designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, as both contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility. In contrast, the original psychophysical studies (Tsuchiya & Koch, 2005; Tsuchiya, Koch, Gilroy, & Blake, 2006) demonstrated CFS masking by comparing the visibility of the target stimulus with and without a dichoptic mask. By the same logic, the pure CFS impact in the above fMRI studies would be the difference in BOLD signals between the binocular masking and stimulus-alone conditions. In other words, the impact of CFS on V1 activity should be larger than what has been reported by Yuval-Greenberg and Heeger (2013).” (lines 55-71)

      (2) Line 13 low-level stimulus (properties).

      Fixed, thanks.

      Reviewer #3 (Recommendations for the authors):

      Major comments:

      (1) My main comment is regarding the SVM classifiers. The pair-wise (adjacent orientation pairs) decoding approach is unrealistic in my opinion and likely explains the very high accuracies that are reported. I believe that a multi-way classification approach - Linear Discriminant Analysis, Decision Trees, etc. - is needed to draw reasonable conclusions. Even SVMs can be adapted for multi-way classification (e.g., Allwein et al., 2000, J. Machine Learning Research).

      Following the reviewer’s advice, we reanalyzed the data using a multi-class SVM with a one-vs-one (OvO) scheme to classify 12 orientations (Allwein et al., 2000), which yielded similar results.

      “For orientation classification, we trained an all-pair multiclass support vector machine (SVM) classifier to discriminate 12 orientations based on trial-by-trial population neural responses from all trials (Allwein, Schapire, & Singer, 2000). Decoders for different FOVs, ipsilateral/contralateral target presentations, and baseline vs. CFS conditions were trained separately. Under the baseline condition, the decoders achieved mean classification accuracies of 89.5 ± 2.0% and 91.5 ± 2.1% across ipsilateral and contralateral eye conditions in Monkeys A and B, respectively, in contrast to a chance level of 8.3% (1 out of 12). Under CFS, decoding accuracy slightly decreased in Monkey A (81.7 ± 1.9%) but remained stable in Monkey B (90.4 ± 2.1%, Fig. 3A). These results suggest that under CFS, there is still sufficient information for coarse orientation discrimination, even for Monkey A whose V1 neuronal responses were substantially suppressed.” (lines 171-181)
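      The all-pair (one-vs-one) scheme described above trains one binary classifier for each pair of orientations (12 × 11 / 2 = 66 classifiers) and assigns each trial the orientation that wins the most pairwise votes; chance is 1/12, about 8.3%. The sketch below illustrates the scheme on synthetic data only. The tuning model, neuron count, noise level, and training settings are hypothetical assumptions, not the recorded data or the study's decoding pipeline. To stay dependency-free, each pairwise linear SVM is fitted with the Pegasos sub-gradient method. In practice, scikit-learn's SVC, whose multiclass mode is one-vs-one, would be the more usual choice.

```python
import math
import random
from itertools import combinations

# Hypothetical stimulus set: 12 orientations spaced 15 degrees apart.
ORIENTATIONS = [15 * k for k in range(12)]
CHANCE = 1 / len(ORIENTATIONS)  # 1/12, about 8.3%


def simulate(n_neurons, n_trials, amp, noise_sd, rng, kappa=2.0, base=1.0):
    """Synthetic trial-by-neuron responses (von Mises-like tuning + Gaussian noise)."""
    prefs = [180.0 * i / n_neurons for i in range(n_neurons)]
    X, y = [], []
    for label, theta in enumerate(ORIENTATIONS):
        mean = [base + amp * math.exp(kappa * (math.cos(math.radians(2 * (theta - p))) - 1))
                for p in prefs]
        for _ in range(n_trials):
            X.append([m + rng.gauss(0.0, noise_sd) for m in mean])
            y.append(label)
    return X, y


def zscore_fit(X):
    """Per-neuron mean and SD from training trials (an SD of 0 is replaced by 1)."""
    cols = list(zip(*X))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, mu)]
    return mu, sd


def zscore_apply(X, mu, sd):
    return [[(v - m) / s for v, m, s in zip(x, mu, sd)] for x in X]


def pegasos(X, y_pm, lam, n_iter, rng):
    """Linear SVM (hinge loss + L2) by Pegasos; labels are +1/-1, bias via a constant 1."""
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)
        margin = y_pm[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 step)
        if margin < 1.0:  # hinge loss active: step toward the sample
            w = [wj + eta * y_pm[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def fit_ovo(X, y, lam=0.01, n_iter=1000, seed=0):
    """One binary classifier per pair of classes (all-pairs / one-vs-one)."""
    rng = random.Random(seed)
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, lab in enumerate(y) if lab in (a, b)]
        models[(a, b)] = pegasos([X[i] for i in idx],
                                 [1 if y[i] == a else -1 for i in idx], lam, n_iter, rng)
    return models


def predict_ovo(models, x):
    """Majority vote over pairwise classifiers; ties go to the lower label."""
    votes = {}
    xb = x + [1.0]
    for (a, b), w in models.items():
        winner = a if sum(wj * xj for wj, xj in zip(w, xb)) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=lambda k: (votes[k], -k))


def decode_accuracy(amp, n_neurons=40, n_trials=20, noise_sd=2.0, seed=1):
    """Train on one synthetic set and test on an independent one."""
    rng = random.Random(seed)
    X_tr, y_tr = simulate(n_neurons, n_trials, amp, noise_sd, rng)
    X_te, y_te = simulate(n_neurons, n_trials, amp, noise_sd, rng)
    mu, sd = zscore_fit(X_tr)
    models = fit_ovo(zscore_apply(X_tr, mu, sd), y_tr)
    hits = sum(predict_ovo(models, x) == lab
               for x, lab in zip(zscore_apply(X_te, mu, sd), y_te))
    return hits / len(y_te)


if __name__ == "__main__":
    print(f"chance: {CHANCE:.3f}")
    print(f"accuracy, strong tuning:   {decode_accuracy(amp=10.0):.3f}")
    print(f"accuracy, weakened tuning: {decode_accuracy(amp=1.0):.3f}")
```

      Lowering the hypothetical tuning amplitude is a crude stand-in for response suppression and pulls accuracy toward chance. The exact numbers depend entirely on the made-up parameters.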

      (2) The inconsistent modeling results (Figure 3E,F) are puzzling and need to be adequately addressed.

      SSIM and orientation error in the original Fig. 3E, F measured the same aspect of reconstruction quality, but the two indices pointed in opposite directions for the same modeling results. To avoid confusion, we have removed the orientation error metric and now report only SSIM.

      “We used a structural similarity index (SSIM) (Brunet, Vrscay, & Wang, 2012) to quantify the reconstruction performances. Across the grating-presenting ipsilateral and contralateral eyes, the baseline models reconstructed the grating with median SSIMs of 0.52 and 0.61 for the two FOVs of Monkey A, and 0.57 and 0.63 for the two FOVs of Monkey B, respectively, while the corresponding SSIMs for the CFS models were 0.16 and 0.19 for Monkey A, and 0.55 and 0.53 for Monkey B (Fig. 3E).” (lines 200-206)
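      SSIM compares a reconstruction with its target in terms of luminance, contrast, and structure: SSIM = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)). Here C1 = (K1·L)², C2 = (K2·L)², usually with K1 = 0.01 and K2 = 0.03, and L is the dynamic range of the pixel values. The sketch below evaluates this formula over the whole image as a single window. This is a simplification of the usual index, which averages the statistic over local windows, and it is not the implementation used for Fig. 3E. The grating images are hypothetical test inputs.

```python
import math


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length lists of pixel values.

    The usual SSIM averages this statistic over local (typically Gaussian-
    weighted 11 x 11) windows; computing it once over the whole image is a
    simplification for illustration.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def grating(size, theta_deg, cycles=3.0):
    """Hypothetical test image: a size x size sinusoidal grating in [0, 1], flattened."""
    t = math.radians(theta_deg)
    return [0.5 + 0.5 * math.cos(2 * math.pi * cycles * (c * math.cos(t) + r * math.sin(t)) / size)
            for r in range(size) for c in range(size)]


if __name__ == "__main__":
    target = grating(32, 45)
    print("identical:  ", global_ssim(target, target))
    print("orthogonal: ", global_ssim(target, grating(32, 135)))
    print("blank:      ", global_ssim(target, [0.5] * len(target)))
```

      An identical reconstruction scores 1. A featureless (blank) reconstruction scores near 0 because it shares no contrast or structure with the target.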

      Minor points:

      (1) The phrase "perceptual consequences" in the title is somewhat strong and possibly misleading, since there are no behavioral measures in this study.

      To address this concern from this reviewer and Reviewer 1, we now focus on the impact of CFS on population orientation coding rather than on perceptual consequences, which is more appropriate for describing our modeling results. For example, we changed the title to: “Continuous flashing suppression of neural responses and population orientation coding in macaque V1“. Other changes have been made throughout the manuscript accordingly.

      (2) Figure 4: Panel "F" is not marked in the figure.

      Fixed, thanks.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Li and colleagues examines how defensive responses to visual threats during foraging are modulated by both reward level and social hierarchy. Using a naturalistic paradigm, the authors test how the availability of water or sucrose, with sucrose being more rewarding than water, shapes escape behavior in mice exposed to looming stimuli of different intensities, which are used to probe perceived threat level and defensive responses. In parallel, the study compares dominant and subordinate animals to assess how social rank biases the trade off between reward seeking and threat avoidance. By combining detailed behavioral analyses with computational modeling, the work addresses how reward level and social context jointly influence escape decisions in an ethologically relevant setting.

      Across the different experimental conditions, perceived threat level is the main determinant of behavior. The authors show that looming stimuli associated with higher threat (contrast) consistently elicit faster and more robust escape responses than lower-threat stimuli. This effect is particularly evident during early exposures, when animals are highly vigilant and have not yet habituated to the looming stimulus (i.e., learned that it is not dangerous). The authors further show that as animals gain experience and habituate, behavior becomes more flexible, and reward level begins to exert a graded modulation of the escape response. Importantly, under high-threat conditions, increasing reward value leads to more frequent and faster escape rather than greater reward pursuit. This finding is particularly relevant, as it suggests that highly valued rewards can heighten vigilance and thereby enhance responsiveness to threat. Under high threat, reward therefore does not simply compete with defensive behavior but can reshape it, whereas under low-threat conditions, threat can be more easily outweighed by reward. Thus, an important conceptual contribution of the study is the introduction of vigilance as a useful framework to interpret these effects. Vigilance is treated as a behavioral state reflecting heightened attention to potential danger. In line with what is known from natural foraging, mice initially maintain high vigilance when confronted with an innate threat. This perspective helps clarify a finding that might otherwise appear counterintuitive. One might expect higher rewards to motivate animals to tolerate risk, explore more, and habituate faster in any scenario. Instead, the data suggest that highly rewarding outcomes can elevate vigilance, making animals more responsive to threat and leading to faster or more frequent escape under high-threat conditions. In this sense, reward does not simply compete with threat but can also amplify sensitivity to it, depending on the internal state of the animal.

      The social results are particularly interesting in this context as well. Dominant mice consistently prioritize avoidance over reward, showing stronger escape responses and slower habituation than subordinates. This behavior is well captured by the vigilance framework proposed by the authors: dominant animals appear to maintain higher vigilance, which biases decisions toward threat avoidance. The authors further suggest that stable social relationships sustain high vigilance and slow habituation, framing this as an evolutionarily conserved strategy that may enhance survival. This interpretation provides a valuable perspective on how social structure shapes defensive behavior beyond immediate physical interactions. At the same time, there are important limitations to this interpretation. All experiments were conducted in male mice, and it is possible that the relationship between social hierarchy, vigilance, and defensive behavior would differ substantially in females. In addition, the idea that stable social relationships maintain elevated vigilance does not straightforwardly align with broader views of social stability as protective for mental health and as a buffer against anxiety and stress. These points do not undermine the findings but suggest that the social effects described here should be interpreted with caution and within the specific context of the task and sex studied.

      We thank the reviewer for raising this important point. In the context of repeated looming exposure, slower habituation reflects more sustained vigilance over time. Compared to individually housed mice, group-housed mice exhibit slower habituation (Lenzi et al., 2022), and pair-housed mice showed even slower habituation in our current work. Importantly, this pattern does not indicate that pair-housed mice have higher overall vigilance than individually housed animals. Although individually housed mice habituate more quickly, they display higher initial vigilance, as reflected by their increased probability of escaping in response to looming stimuli (Lenzi et al., 2022). Thus, pair-housed mice exhibited reduced defensive responses compared to individually housed animals, consistent with a social buffering effect.

      Furthermore, in a separate study (Rank- and Threat-Dependent Social Modulation of Innate Defensive Behaviors; Li, Gao, Li, 2026, eLife 15:RP109571), we directly compared responses to looming stimuli when mice were tested alone versus in the presence of a social partner and observed clear evidence of social buffering.

      Another important limitation is that the neural mechanisms underlying these effects remain speculative. The manuscript includes an extensive discussion of candidate circuits, particularly involving the superior colliculus and downstream structures, but this section is necessarily based on prior literature rather than on data presented in the study. Given the complexity of the circuits involved in integrating internal state, reward, social context, and vigilance, the current work should be viewed as providing a strong behavioral and conceptual framework rather than direct insight into underlying neural mechanisms.

      We fully agree that the proposed neural mechanisms remain speculative and that the circuits involved in integrating internal state, reward, and social context are likely far more complex. We have revised the manuscript to acknowledge this limitation.

      Methodologically, the behavioral paradigm is well suited for studying escape decisions in socially housed animals, and the machine-learning-based classification of defensive responses is a clear strength. The computational model provides a useful formalization of how threat level, reward level, and vigilance interact and may be valuable for other laboratories studying escape, approach-avoidance, or conflict situations, particularly as a way to classify behavioral outcomes after pose estimation. More generally, the work will be of interest to the neuroethology community for its detailed characterization of escape behavior under naturalistic conditions.

      Given the ethological nature of the study and the high inter-individual variability reported by the authors, clarity and precision in the methods are especially important for reproducibility. While the revised manuscript addresses many earlier concerns, some aspects remain slightly difficult to follow. For example, the main text states that animals were not water deprived to avoid differences in internal state, whereas parts of the methods describe conditions in which animals were water deprived, suggesting that internal state manipulation may differ across experiments. Clearer separation and explanation of these conditions would further strengthen confidence in the work.

      To improve clarity, we have revised the Methods section to clearly distinguish between experimental conditions that involved water deprivation and those that did not.

      Overall, this study provides a rich and thoughtful analysis of how reward level and social hierarchy modulate defensive behavior through changes in vigilance. It offers a useful conceptual advance for thinking about escape behavior in naturalistic settings and lays a solid foundation for future work aimed at linking these behavioral states to underlying neural circuits.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process is not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions (Figure 3E). However, even under the same high-contrast condition, reaction times were significantly shorter in the water condition compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Upward-directed attention includes rearing, up-stretching, and upward head orientation, which will be clarified in the Methods section. To address concerns about statistical validity, we will quantify these behaviors across the first 10 trials rather than limiting the analysis to the first two.

      As for the dominance-related results, we interpret them as reflecting both enhanced vigilance and reduced reward-seeking behavior. Time spent in the reward zone is not a measure of vigilance but an indicator of reward-seeking motivation. We will clarify this in the revised manuscript.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (eg: Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      In Figure 3B, the difference between water and sucrose conditions did not reach statistical significance (p = 0.08). We plan to collect additional data to determine whether this is due to limited statistical power. It is also possible that some behavioral readouts are more sensitive to the differences between water and sucrose conditions. For example, Figure 3F shows that escape speed was significantly higher in the sucrose than in the water condition under high-contrast stimulation.

      Thank you for pointing this out. To control for the potential confounds related to internal state, mice were not water-deprived under any of the three conditions in Figures 3A-3H. We will clarify this in the main text and Methods. For Figures 3I-3M, which compare decision-making under no-reward and water conditions, we will conduct additional experiments using non-deprived mice in the water condition.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 15 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.

      All definitions and additional details on behavioral quantification will be included in the revised Methods section.
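      To make the operational definitions in this response concrete, the following Python sketch computes latency to flee, hiding latency, distance under threat, and time in the reward zone from a tracked trajectory. It is an illustration under stated assumptions, not the authors' analysis code: positions are assumed to be an (n, 2) NumPy array in cm sampled at a fixed frame rate, stimulus onset and offset are given as frame indices, the safe zone is supplied as a per-frame boolean mask, and a flight bout is interpreted as a contiguous run of frames above the speed threshold. All function names are hypothetical. The thresholds (10 cm, 10 cm/s), the 15 cm reward-zone radius, and the 20 s window are the values stated in the response.

```python
import numpy as np


def step_lengths(xy):
    """Distance (cm) moved between consecutive frames of an (n, 2) trajectory."""
    return np.linalg.norm(np.diff(xy, axis=0), axis=1)


def latency_to_flee(xy, fps, onset, min_dist=10.0, min_speed=10.0):
    """Seconds from stimulus onset to the start of the first flight bout.

    Assumption: a flight bout is a run of consecutive frames whose speed exceeds
    `min_speed` (cm/s) and whose summed path length exceeds `min_dist` (cm).
    Returns None if no such bout occurs after onset.
    """
    steps = step_lengths(xy[onset:])
    fast = steps * fps > min_speed
    i, n = 0, len(fast)
    while i < n:
        if not fast[i]:
            i += 1
            continue
        j = i
        while j < n and fast[j]:
            j += 1
        if steps[i:j].sum() > min_dist:
            return i / fps  # bout starts at frame onset + i
        i = j  # run too short to count as flight; keep searching
    return None


def hiding_latency(in_safe_zone, fps, onset):
    """Seconds from stimulus onset to the first frame inside the safe zone (or None)."""
    hits = np.flatnonzero(in_safe_zone[onset:])
    return hits[0] / fps if hits.size else None


def distance_under_threat(xy, onset, offset):
    """Path length (cm) between stimulus onset and offset (inclusive frame indices)."""
    return step_lengths(xy[onset:offset + 1]).sum()


def time_in_reward_zone(xy, fps, onset, port, radius=15.0, window_s=20.0):
    """Seconds spent within `radius` cm of the reward port in the window after onset."""
    segment = xy[onset:onset + int(window_s * fps)]
    dist = np.linalg.norm(segment - np.asarray(port), axis=1)
    return np.count_nonzero(dist <= radius) / fps


# Synthetic 10-fps trial: stationary for 1 s, then a 50 cm run to a nest at x = 0.
fps = 10
xy = np.array([[50.0, 0.0]] * 11
              + [[50.0 - 5 * k, 0.0] for k in range(1, 11)]
              + [[0.0, 0.0]] * 10)
print(latency_to_flee(xy, fps, onset=0))                         # 1.0
print(hiding_latency(xy[:, 0] <= 0.5, fps, onset=0))             # 2.0
print(distance_under_threat(xy, onset=0, offset=30))             # 50.0
print(time_in_reward_zone(xy, fps, onset=0, port=(62.0, 0.0)))   # 1.1
```

In this sketch, latency to flee and hiding latency differ by the duration of the run itself, which is the distinction the reviewer asked about; a trial with no qualifying bout returns None rather than a latency.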

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no-reward condition the r parameter is exactly 0; was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any statistically significant differences in parameters.

      We appreciate the comment and agree that further clarification is needed. We will provide a more detailed description of the model fitting procedure in the revised Methods section. Specifically, the drift rate parameter (r), which reflects the perceived reward value, was constrained to zero in the no-reward condition. To enable statistical comparison across conditions, we will report uncertainty measures for all fit parameters.

      Comments on the revised manuscript:

      The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.

      In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree: like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this: there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.

      We agree that vigilance is not directly observable as a single variable. Our intent was not to claim that foraging speed and foraging interval provide a direct measure of vigilance, but rather to suggest that they may serve as indirect behavioral correlates.

      We also considered an alternative interpretation: these two measures could reflect perceived reward value under high-threat conditions across distinct reward types. If that were the case, animals would be expected to show progressively shorter foraging intervals and faster foraging speeds from the no-reward to the water to the sucrose condition. However, our data do not support this interpretation (Figures 3L and 3M), suggesting that these measures are more likely correlated with vigilance.

      Furthermore, it is unlikely that changes in foraging interval and speed are driven by altered threat sensitivity, as animals could not see the threat during most of the foraging bout and only encountered it at the end.

      Regarding the conclusion that the presence of reward increases direct escape behaviors, our interpretation is that increased reward value reduces habituation, thereby maintaining higher vigilance during the late phase. This was discussed in the second-to-last paragraph of the "Economic and social modulations of innate decision-making under threat" subsection in the Discussion.

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually, using an elegant automated tunnel (see videos for clarity).

      The additional changes made to the paper clarify the work done. While there are some limitations (male mice, weird stimulus), the general results are interesting and a valuable addition to the experimental literature. The main claim of the paper is that the different rewards (none, water, sucrose) did not change the escape properties early in learning, but did late, particularly that in the late (already experienced) conditions, reward value (assuming sucrose > water > no reward) interacted with the salience of the looming stimulus (light gray, dark gray). (Panels 3D, 3G, 3K, 3N).

      For readers, I want to note that one of the most interesting results is actually in Figure S2, where they find that a looming stimulus behind the mouse still makes a mouse run to the nest. In these conditions, the mouse runs past the looming stimulus to get to safety! (I also do love the video of the mouse running around the barriers like a snake to get home.)

      I have a few minor clarification questions and a few notes that I think would be useful additions for authors and readers to think about.

      Dominance: What does the mouse social science literature say about the "tube test"? What can we conclude from this test? This would be useful when trying to understand what is causing the dominance/submissive difference in responses. Figure 4 shows that the dominant mice are more risk-averse than the submissive mice. Is "dominance" in the tube test actually a measure of risk-seeking? Is the issue that the submissive mice don't think they can get back to the food site easily, so they are less willing to sacrifice the current (if dangerous) foraging opportunity? Is the issue that the submissive mice can't get back to the nest? As I understand it, the nest was always available to all the mice, so I suspect inability to get to the nest is an unlikely hypothesis. Is the issue that the submissive mice also don't feel safe in the nest?

      The tube test is a widely used assay in the rodent social behavior literature to assess dominance hierarchies, operationally defined by the ability of one animal to force its opponent to retreat from a narrow tube. Importantly, this assay does not directly measure risk-seeking or anxiety-related traits, but rather competitive outcomes during social conflict. Furthermore, our data indicate that the behavioral responses of subordinate mice to looming stimuli are primarily driven by the visual threat itself rather than by social avoidance. This point was elaborated in the second paragraph of the “Social modulation of innate decision-making” subsection in the Results section.

      Limitations of the study: There is an acknowledged limitation to male mice, and the limitations of the small data sets that are typical of such experiments. In addition, however, it is also worth noting the strangeness of the looming stimulus, which is revealed clearly in the videos. The stimulus is a repeating growing circle, growing in a single location within the environment. The stimulus repeats 10 times, once per second. This is not what an attacking hawk or owl would look like. (I now have this image of an owl diving down, and then teleporting up and diving down again.) Note - I am fine with this stimulus. It produces an interesting experiment and interesting results. I do not think the authors need to change anything in their paper, but readers need to recognize that this is not a "looming predator".

      These "limitations" are better seen as "caveats" when folding these results in with the rest of the literature that has gone before and the literature to come. (Generally, I do not believe that science works by studies making discoveries that change how we think about problems - instead, science works by studies adding to the literature that we integrate in with the rest of the literature.) Thus, these caveats should not be taken as problems with the study or as fixes that need to be done. Instead, they are notes for future researchers to notice if differences are found in any future studies.

      Thus, my only suggestion is that I think authors could write a more careful paper by using the past and subjunctive tense appropriately. Experimental observations should be in past tense, as in "the influence of reward was context-dependent and emerged in the late phase" instead of "the influence of reward is context-dependent and emerges in the late phase" - it emerged in the late phase this once - it might not in future experiments, not due to any fault in this experiment nor due to replicability problems, but rather due to unexpected differences between this and those future experiments. At which point, it will be up to those future experiments to determine the difference. Similarly, large conclusions should be in the subjunctive tense, as in "these data suggest that threat intensity is likely to be the primary determinant of decision making" rather than "threat intensity is the primary determinant of decision making", because those are hypotheses not facts.

      We thank the reviewer for the helpful suggestions and have revised the Abstract accordingly.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how mice make defensive decisions when exposed to visual threats and how those decisions are influenced by reward value and social hierarchy. Using a naturalistic foraging setup and looming stimuli, the authors show that higher threat leads to faster escape, while lower threat allows mice to weigh reward value. Dominant mice behave more cautiously, showing higher vigilance. The behavioral findings are further supported by a computational model aimed at capturing how different factors shape decisions.

      Strengths:

      (1) The behavioral paradigm is well-designed and ethologically relevant, capturing instinctive responses in a controlled setting.

      (2) The paper addresses an important question: how defensive behaviors are influenced by social and value-based factors.

      (3) The classification of behavioral responses using machine learning is a solid methodological choice that improves reproducibility.

      Weaknesses:

      (1) Key parts of the methods are hard to follow, especially how trials are selected and whether learning across trials is fully controlled for. For example, it is unclear whether animals are in the nest during the looming stimulus presentations. The main text and methods should clarify whether multiple mice are in the nest simultaneously and whether only one mouse is in the arena during looming exposure. From the description, it seems that all mice may be freely exploring during some phases, but only one is allowed in the arena at a time during stimulus presentation. This point is important for understanding the social context and potential interactions, and should be clearly explained in both the main text and methods.

      We agree that these details are essential and have clarified them in the Methods. When the door system operated normally, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.

      Habituation was conducted over two days. On day 1, five mice were placed together in the nest for 30 minutes with all doors closed. Each mouse was then placed individually in the nest and allowed to freely explore the arena for 10 minutes under normal door operation. Finally, all mice were returned to the nest with all doors open and allowed to explore freely for 2 hours. On day 2, each mouse was placed individually in the nest and given an additional 1 hour of exploration under normal door operation.
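      The single-mouse gating described in this response amounts to a small state machine. The Python sketch below is purely illustrative and is not the authors' control code: the `Doors` type and function names are hypothetical, the OpenMV camera's detection is reduced to an assumed per-frame boolean, and the return to the idle state is a placeholder because the response does not describe how the doors reset after the mouse leaves the arena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doors:
    """Open/closed state of the two doors on either side of the tunnel."""
    nest_tunnel_open: bool = True     # idle state: all mice in the nest
    tunnel_arena_open: bool = False


def on_camera_frame(doors: Doors, mouse_in_tunnel: bool) -> Doors:
    """Advance the gate when the camera reports a mouse in the tunnel.

    `mouse_in_tunnel` stands in for the camera's detection output
    (assumed here to be a simple boolean per frame).
    """
    if mouse_in_tunnel and doors.nest_tunnel_open and not doors.tunnel_arena_open:
        # Close the nest-side door and open the arena-side door,
        # so only the mouse already in the tunnel can reach the arena.
        return Doors(nest_tunnel_open=False, tunnel_arena_open=True)
    return doors  # otherwise leave both doors unchanged


def reset() -> Doors:
    # Hypothetical placeholder: the return to the idle state is not described.
    return Doors()


gated = on_camera_frame(Doors(), mouse_in_tunnel=True)
print(gated)
```

Because the nest-side door stays closed while the arena-side door is open, a second mouse cannot enter the tunnel until the hypothetical reset step runs.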

      (2) It is often unclear whether the data shown (especially in the main summary figures) come from the first trial or are averages across several exposures. When is the cut-off for trials of each animal? How do we know how many trial presentations were considered, and how learning at different rates between individuals is taken into account when plotting all animals together? This is important because the looming stimulus is learned to be harmless very quickly, so the trial number strongly affects interpretation.

      We observed substantial inter-individual variability in habituation to looming stimuli, with a sharp decline in defensive responses over the first few trials followed by more gradual changes. To account for this, we segmented trials for each animal into two phases: an early rapid-habituation phase and a later stable phase. Analyzing these phases separately revealed that threat intensity dominates behavior in the early phase, whereas both threat and reward significantly influence behavior in the late phase. These results are now presented in revised Figures 2 and 3. Analyses restricted to first trials are included in Figure S5.

      (3) The reward-related effects are difficult to interpret without a clearer separation of learning vs first responses.

      As noted above, we have re-analyzed our data to account for learning effects.

      (4) The model reproduces observed patterns but adds limited explanatory or predictive power. It does not integrate major findings like social hierarchy. Its impact would be greatly improved if the authors used it to predict outcomes under novel or intermediate conditions.

      We have substantially revised the modeling analysis. The model is now fitted to behavioral data from the late phase and used to predict outcomes across additional conditions, including the early phase behavior and rank-dependent behavioral differences. The model successfully captures behavioral patterns across these conditions, supporting its predictive value beyond descriptive fitting.

      (5) Some conclusions (e.g., about vigilance increasing with reward) are counterintuitive and need stronger support or alternative explanations. Regarding the interpretation of social differences in area coverage, it's also possible that the observed behavioral differences reflect access to the nesting space. Dominant mice may control the nest, forcing subordinates to remain in the open arena even during or after looming stimuli. In this case, subordinates may be choosing between the threat of the dominant mouse and the external visual threat. The current data do not distinguish between these possibilities, and the authors do not provide evidence to support one interpretation over the other. Including this alternative explanation or providing data that addresses it would strengthen the conclusions.

      To support the interpretation of increased vigilance with reward under high-threat conditions, we analyzed additional behavioral measures beyond latency to flee. Rewarded mice showed longer foraging intervals and slower foraging speeds, both consistent with elevated vigilance (Figures 3L and 3M).

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure. Although subordinates spent more time in the arena before looming, this difference disappeared during and after looming exposure (Figure 4C). Moreover, dominant and subordinate mice were equally likely to flee to the nest during escape trials. These findings rule out nest access restrictions as an explanation for the observed rank-dependent differences in defensive behaviors.

      (6) While potential neural circuits are mentioned in the discussion, an earlier introduction of candidate brain regions and their relevance to threat and value processing would help ground the study in existing systems neuroscience.

      We have revised the Introduction to incorporate relevant brain regions and neural circuits.

      (7) Some figures are difficult to interpret without clearer trial/mouse labeling, and a few claims in the text are stronger than what the data fully support. Figure 3H is done for low contrast, but it would be more interesting to do this experiment with high contrast. Figure 4H: I don't understand this part. If the amount of time in the center after the loom changes for subordinate mice, how does this lead to the conclusion that they spend most of their time in the reward zone? Figure 3A: the example shown does not seem representative of the claim that high contrast stimuli are more likely to trigger escape. In particular, the 10% sucrose condition appears to show more arena visits under low contrast than high contrast, which seems to contradict that interpretation. Also, the plot currently uses trials on the Y-axis, but it would be more informative to show one line per animal, using only the first trial for each. This would help separate initial threat responses from learning effects and clarify individual variability.

      We have substantially revised the figures. Results from trial segmentation based on individual habituation are now explicitly presented in Figures 2 and 3, and analyses using only the first trials are provided in Figure S5 to separate initial responses from learning effects.

      Regarding the original Figure 4H, we are not entirely certain about the concern. In this panel, we measured time spent in the reward zone, which is defined as the region within 10 cm of the reward port at the end of the arena, not the center of the arena, during looming exposure. Subordinate mice spent significantly more time in the reward zone than dominant mice. We have further clarified this in the revised manuscript.

      (8) The analysis does not explore individual variability in behavior, which could be an important source of structure in the data. Without this, it is difficult to know whether social hierarchy alone explains behavioral differences or if other stable traits (e.g., anxiety level, prior experiences) also contribute.

      We observed substantial individual variability in both dominant and subordinate mice, even on the first trial (Figure S7). Paired dominant–subordinate comparisons were used to isolate rank-dependent effects.

      (9) The study shows robust looming responses in group-housed animals, which contrasts with other studies that often require single housing to elicit reliable defensive responses. It would be valuable for the authors to discuss why their results differ in this regard and whether housing conditions might interact with social rank or habituation.

      Robust looming-evoked defensive responses have been reported in both group- and single-housed mice (Yilmaz and Meister, 2013; Lenzi et al., 2022), although single-housed mice habituate more rapidly. We have now discussed the potential interactions between housing conditions, social rank, and habituation in defensive behaviors in the revised manuscript.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is to quantify how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process is not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions. However, even under the same high-contrast condition, reaction times were significantly shorter in the reward conditions compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Regarding the measurement of vigilance, in addition to the latency to flee, we analyzed two additional behavioral measures related to vigilance. First, we examined the foraging interval. Our hypothesis was that more vigilant animals would wait longer before re-entering the reward zone following threat exposure. Consistent with this prediction, mice under sucrose and water reward conditions showed significantly longer foraging intervals than those under no-reward conditions (Figure 3L). Second, we analyzed the foraging speed as mice approached the reward. Increased vigilance should lead to more cautious and therefore slower movements. Our results support this, as mice moved more slowly towards the reward under sucrose conditions (Figure 3M). Taken together, these three measures consistently indicate that mice exhibit increased vigilance under sucrose reward in high-threat conditions.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (e.g., Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      Our new analysis, which segments behavior into an early adaptive phase and a late stable phase, reveals a statistically significant difference between water and sucrose rewards in the late phase (Figure 3H), supporting a graded effect of reward value.

      To avoid potential confounds related to internal state, mice were not water-deprived in any of the reward conditions. We have clarified this in the revised manuscript.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100cm. Presumably, the 20cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it “distance under threat” to better capture its meaning. The reward zone was defined as the region within 10 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.

      All definitions and additional details on behavioral quantification have been included in the revised Methods section.
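      The latency-to-flee and reward-zone definitions above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes a 1-D position trace along the linear arena in cm, a constant frame rate, unsmoothed frame-to-frame speed, and that a "flight state" is a contiguous run of frames above 10 cm/s that covers more than 10 cm. The function names and the `port_cm` coordinate are hypothetical.

```python
import numpy as np


def latency_to_flee(position_cm, fps, onset_idx, speed_thresh=10.0, min_dist_cm=10.0):
    """Seconds from stimulus onset to the start of the first flight state.

    Assumption: a flight state is a contiguous run of frames whose
    frame-to-frame speed exceeds speed_thresh (cm/s) and whose summed
    displacement exceeds min_dist_cm. Returns None if no flight occurs.
    """
    pos = np.asarray(position_cm, dtype=float)
    step = np.abs(np.diff(pos))          # distance moved between consecutive frames
    fast = step * fps > speed_thresh     # speed above threshold for each frame pair
    i, n = onset_idx, len(fast)
    while i < n:
        if not fast[i]:
            i += 1
            continue
        j = i
        while j < n and fast[j]:         # extend the supra-threshold run
            j += 1
        if step[i:j].sum() > min_dist_cm:  # long enough to count as a flight
            return (i - onset_idx) / fps
        i = j                            # too short: keep searching after this run
    return None


def reward_zone_time(position_cm, fps, onset_idx, port_cm, radius_cm=10.0, window_s=20.0):
    """Seconds spent within radius_cm of the (hypothetical) port position after onset."""
    pos = np.asarray(position_cm, dtype=float)
    window = pos[onset_idx:onset_idx + int(round(window_s * fps))]
    return np.count_nonzero(np.abs(window - port_cm) <= radius_cm) / fps
```

      A short burst below 10 cm is skipped rather than counted, so latency reflects the first sustained flight; hiding latency could be computed analogously as the first frame inside the safe zone.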

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no-reward condition the r parameter is exactly 0; was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      We have provided a detailed description of the model fitting procedure in the revised Methods section. Specifically, the reward-value parameter (r) was constrained to zero in the no-reward condition. We have plotted how the overall loss varies with different parameters (Figure S9).
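      The response does not give the model equations, so the following is only a generic leaky drift-diffusion sketch to illustrate what a "leakage rate" does; it is not the authors' model. Here `drift` stands in for whatever combination of threat gain and reward value the authors use (in a no-reward condition, the reward-value term would simply be fixed at zero before computing the drift). The symmetric bounds, zero starting point, Euler–Maruyama discretisation, and all names are assumptions.

```python
import numpy as np


def simulate_leaky_ddm(drift, leak, noise=1.0, threshold=1.0, dt=1e-3, t_max=5.0, rng=None):
    """Simulate one trial of a generic leaky drift-diffusion process.

    Euler-Maruyama update: x += (drift - leak * x) * dt + noise * sqrt(dt) * N(0, 1).
    Returns (choice, rt): +1 at the upper bound, -1 at the lower bound,
    or 0 if neither bound is reached by t_max.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = 0.0
    sqrt_dt = np.sqrt(dt)
    for k in range(1, int(round(t_max / dt)) + 1):
        x += (drift - leak * x) * dt + noise * sqrt_dt * rng.standard_normal()
        if x >= threshold:
            return 1, k * dt
        if x <= -threshold:
            return -1, k * dt
    return 0, t_max


def choice_and_latency(drift, leak, n_trials=500, seed=0, **kwargs):
    """Upper-bound probability and mean upper-bound latency over simulated trials."""
    rng = np.random.default_rng(seed)
    trials = [simulate_leaky_ddm(drift, leak, rng=rng, **kwargs) for _ in range(n_trials)]
    upper_rts = [rt for choice, rt in trials if choice == 1]
    p_upper = len(upper_rts) / n_trials
    mean_rt = float(np.mean(upper_rts)) if upper_rts else float("nan")
    return p_upper, mean_rt
```

      With leak > 0, evidence relaxes toward drift / leak, so a weak drift whose fixed point lies below the bound produces no decision in the noiseless case; this is the qualitative effect a leakage term adds to a standard DDM.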

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually. Drift-diffusion modeling found that reward-level interacted with threat level such that at low-threat levels, reward contrasted with threat as classically expected (high reward overwhelms low threat, low threat overwhelms low reward), but that reward aligned with threat at higher threat levels.

      Note that they define threat level by the darkness of the looming stimulus. I am not sure that darker stimuli are more threatening to mice. But maybe. Figure 3 shows that mice react more quickly to high contrast looming stimuli, but can the authors distinguish between the ability to detect the visual signal from considering it a more dangerous threat? (The fact that vigilance makes a difference in the high contrast condition, not the low contrast condition, actually supports the author's hypotheses here.)

      Regarding the interpretation of stimulus contrast as a proxy for threat level, we agree it is crucial to distinguish improved detection from heightened threat perception. To address this, we examined not only latency to flee but also escape distance and peak escape speed, two measures that reflect the intensity of the defensive response. If contrast only influenced detection, we would expect differences in latency but not in escape distance or speed. All three measures differed significantly across contrast conditions, supporting the interpretation that high-contrast stimuli are perceived as more threatening rather than simply more detectable. Furthermore, manual review of "no response" trials confirmed reliable detection in both conditions, with only three potential "missed" trials out of 117 under low contrast (Figure S3B). We have included this discussion in the revised manuscript.

      The drift-diffusion model (DDM) is fine. I note that the authors included a "leakage rate", which is not a standard DDM parameter (although I like including it). I would have liked to see more about the parameters. What were the distributions? What did the parameters correlate with behaviorally? I would have liked to see distributions of the parameters under the different conditions and different animals. Figure 2C shows the progression of learning. How do the fit parameters change over time as mice shift from choice to choice? How do the parameters change over mice? How do the parameters change over distance to the threat/distance to safety (as per Fanselow and Lester 1988)? They did a supplemental experiment where the threat arrived halfway along the corridor - we could get a lot more detail about that experiment - how did it change the modeling?

      Because our model is fit to the variance of latency distributions, it cannot be applied to single-trial data. Instead, we analyzed how decisions and latencies vary as functions of the fitted threat gain and reward value parameters (Figures 5G and 5H). We have also introduced a simplified deterministic model to further elucidate the decision-making process.

      Regarding the influence of distance to the threat, we conducted additional experiments, presenting the looming stimulus at the end of the arena when the mouse was at different distances from it (Figures S2C–G). We found that as the prey-threat distance increased, mice showed less direct escape behavior, with longer latencies to flee and slower escape speeds. This is consistent with the predatory imminence continuum theory (Fanselow and Lester, 1988), which describes graded defensive behaviors tuned to perceived threat level.

      Regarding the influence of distance to safety, our data indicate that it did not significantly affect defensive responses (Figures S2H and S2I). To test this further, we introduced barriers that lengthened the return path to the safe zone. We found that defensive decisions were not correlated with the distance to the safe zone (Figures S2J and S2K), suggesting that once a threat is detected, animals prioritize escape initiation over evaluating the exact path to safety.

      Overall, this is a reasonable study showing mostly unsurprising results. I think the authors could do more to connect the vigilance question to their results (which seems somewhat new to me).

      We have expanded our analysis of vigilance. In addition to escape latency, we examined the foraging interval and foraging speed. We hypothesized that more vigilant animals would wait longer before re-entering the reward zone following a threat and would approach the reward more slowly. Consistent with this prediction, mice in the sucrose- and water-reward conditions exhibited significantly longer foraging intervals and slower foraging speeds compared to those in the no-reward condition (Figures 3M and 3N). Together, these three measures consistently demonstrate that mice display heightened vigilance under high-threat, high-reward conditions.

      Although the data appear generally fine and the modeling reasonable, the authors do not do the necessary work to set themselves within the extensive literature on decision-making in mice retreating from threats.

      First of all, this is not a new paradigm; variants of this paradigm have been used since at least the 1980s. There is an *extensive* literature on this, including extensive theoretical work on the relation of fear and other motivational factors. I recommend starting with the classic Fanselow and Lester 1988 paper (which they cite, but only in passing), and the reviews by Dean Mobbs and Jeansok Kim, and by Denis Paré and Greg Quirk, which have explicit theoretical proposals that the authors can compare their results to. I would also recommend that the authors look into the "active avoidance" literature. Moreover, to talk about a mouse running from a looming stimulus without addressing the other "flee the predator" tasks is to miss a huge space for understanding their results. Again, I would start with the reviews above, but also strongly urge the authors to look at the Robogator task (work by June-Seek Choi and Jeansok Kim, work by Denis Paré, and others).

      Similarly, in their anatomical review, they do not mention the amygdala. Given the extensive literature on the role of the amygdala in retreating from danger, both in terms of active avoidance and in terms of encoding the danger itself, it would surprise me greatly if this behavior does not involve amygdala processing. (If there is evidence that the amygdala does not play a role here, but that the superior colliculus does, then that would be a *very* important result that needs to be folded into our understanding of decision-making systems and neural computational processing.)

      Second, there is an extensive economic literature on non-human animals in general and on rodents in particular. Again, the authors seem unaware of this work, which would provide them with important data and theories to broaden the impact of their results (by placing them within the literature). First, there are explicit economic literatures in terms of positively-valenced conflicts (e.g., neuroeconomics within the primate literature, sequential foraging and delay-discounting tasks within the rodent literature), but also there is a long history within the rodent conditioning world, such as the classic work by Len Green and Peter Shizgal. I would strongly urge the authors to explore the motivational conflict literature by people like Gavin McNally, Greg Quirk, and Mark Andermann. Again, putting their results into this literature will increase the impact of their experiment and modeling.

      We have substantially revised the manuscript to contextualize our findings within the extensive literature on defensive behavior and decision-making. The revised Introduction and Discussion now integrate key theoretical frameworks, such as the predatory imminence continuum, and cite relevant work on active avoidance and other "flee the predator" paradigms (e.g., the Robogator task).

      We have also incorporated perspectives from neuroeconomics and motivational conflict, including literature on sequential foraging, delay-discounting tasks, and relevant rodent studies. Furthermore, we now discuss the potential contributions of specific brain regions, including the superior colliculus and the amygdala, to the economic and social modulation of innate defensive decisions in response to visual threats.

      Recommendations for the authors:

      Reviewing Editor Comments:

      These additional recommendations are generally consistent and overlapping across reviewers, particularly Reviewer #1 and 2, so it is advisable to undertake these changes/additions.

      Reviewer #1 (Recommendations for the authors):

      (1) Experimental methods and trial structure need clarification: It is often unclear how many trials were included per condition, per mouse, and whether the key behavioral effects (especially reward-related changes) were observed early in the session or after repeated stimulus exposure. For example, in several reward-related plots (e.g., Figure 3), it is not specified whether results are driven by early or later trials. Since the authors themselves report rapid learning of the looming stimulus (habituation), it is critical to state how many trials were included in each comparison, and to analyze whether effects hold on the first exposure and not the rest. Otherwise, conclusions about value-based behavior are hard to separate from learning effects, which may also differ between individuals. Specifically, the methods section is vague and hard to follow.

      We have substantially expanded the Methods section with additional details to improve clarity.

      To account for individual variability in habituation to the looming stimulus, we segmented trials for each animal into early and late phases. We demonstrate that threat level is the dominant factor driving behavioral responses in the early phase, while both threat level and reward condition shape behavior in the late phase. We have substantially revised Figures 2 and 3 to reflect these changes.

      (2) Add a summary of experimental design: A table or schematic summarizing the trial structure, experimental groups, reward/threat conditions, and the timeline of exposures would greatly improve clarity.

      We have added a schematic to Figure 2 summarizing the trial structure, experimental groups, reward and threat conditions, and the overall timeline.

      (3) Replot key results using only the first trial per mouse: This would allow readers to assess the first (not learned) responses and help control for habituation/suppression.

      We have replotted behavioral results using only the first trial from each mouse and included these analyses in Figure S5. These results confirm that threat level is the dominant factor driving the initial response to looming stimuli.

      (4) The model needs stronger justification and predictive value: As it stands, the model primarily fits the existing data and does not offer new insights beyond what is already evident from the behavioral results.

      Important findings, such as social hierarchy effects and habituation dynamics, are not captured in the model, reducing its relevance to the full dataset.

      The drift-diffusion framework is widely used, and in this implementation appears to have been adjusted post hoc to fit the observed data rather than generating new conceptual advances. No comparison with simpler models is included. Without testing simpler or alternative models, it is not clear whether the added complexity is necessary or justified.

      Use the model to generate and test predictions: to increase the model's contribution, the authors could simulate new conditions. Suggested experiments include:

      a) Predicting escape probability and latency at intermediate threat intensities to test whether behavior shifts gradually or abruptly.

      b) Using the model's habituation parameters to predict changes in escape behavior over repeated exposures.

      c) Adjusting vigilance or threat gain parameters to simulate dominant versus subordinate animals, and comparing model predictions to actual behavioral differences based on social rank.

      We have substantially revised the modeling section to address these concerns. The updated model is now fitted to behavioral data from the late phase of the reward–threat experiments and used to generate predictions for the early phase and for rank-dependent behavioral differences.

      The model accurately captures behavioral patterns across these conditions, demonstrating predictive power beyond descriptive fitting. Accordingly, we have removed the habituation component. Furthermore, we have introduced a simplified deterministic model in the revised manuscript to further understand the decision-making process.

      (5) Clarify housing and arena access conditions: It is unclear from the text whether all mice are in the nest during looming presentations and whether only one mouse is in the arena during the stimulus. This is important for understanding the social context of each trial and should be explained in the main text and methods.

      We have clarified this point in the Methods section. Under normal door operation, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.
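      The admission rule described above can be written as a single state transition. This is a minimal sketch with hypothetical names; it treats the number of mice detected in the tunnel (by the RFID/OpenMV system) as an input and models only the admission step. How the doors reset after the mouse returns is not described in the response and is deliberately not modeled.

```python
from typing import NamedTuple


class Doors(NamedTuple):
    nest_tunnel_open: bool
    tunnel_arena_open: bool


# Door configuration while all mice are in the nest.
ALL_IN_NEST = Doors(nest_tunnel_open=True, tunnel_arena_open=False)
# Door configuration after exactly one mouse is detected in the tunnel.
ADMIT_ONE = Doors(nest_tunnel_open=False, tunnel_arena_open=True)


def on_tunnel_detection(doors: Doors, n_mice_in_tunnel: int) -> Doors:
    """Apply the admission rule; any other situation leaves the doors unchanged."""
    if doors == ALL_IN_NEST and n_mice_in_tunnel == 1:
        return ADMIT_ONE
    return doors
```

      Requiring exactly one detected mouse before opening the tunnel-arena door is what keeps a second animal from following the first into the arena.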

      (6) Alternative interpretation of subordinate behavior: differences in area coverage and time in the reward zone may not reflect reduced vigilance, but rather avoidance of dominant mice. Subordinates may remain in the open arena to avoid conflict. The authors do not provide evidence distinguishing between these interpretations, and this should be addressed.

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure (Figure 4C). Before looming exposure, subordinate mice spent significantly more time in the arena, consistent with the idea that they may perceive a social threat from the dominant mouse in the absence of any external threat. However, this difference disappeared during and after looming exposure. This shift suggests that the presence of an external threat alters the social dynamic, reducing the influence of dominance on nest access.

      To further assess whether dominant mice blocked subordinate access to the nest during threat-driven escapes, we analyzed the fraction of escape trials in which mice returned to the nest (Figure 4D). We found no significant difference between dominant and subordinate mice, indicating that dominant mice did not restrict nest access during these trials. Importantly, rank differences in reward-zone occupancy cannot be explained by nest exclusion, as mice do not need to return to the nest when escaping the threat; they can flee directly to the safe zone. Thus, nest access limitations do not account for the observed rank-dependent patterns.

      We agree with the reviewer that reward-zone occupancy should not be interpreted as reduced vigilance in subordinate mice; instead, it likely reflects higher perceived reward value. The manuscript has been revised accordingly.

      (7) Address why robust looming responses were observed in group-housed mice: previous studies often require single housing to elicit strong defensive responses. The authors should explain why their setup yields robust results in group-housed animals and whether housing conditions may interact with dominance or habituation.

      Looming exposure elicits robust defensive behaviors in both group- and single-housed mice (Yilmaz and Meister, 2013, Lenzi et al., 2022), with single-housed animals habituating more quickly to the stimulus (Lenzi et al., 2022). We have now discussed how housing conditions may interact with social rank and habituation to shape defensive behaviors in the revised manuscript.

      For the social-rank experiments, we intentionally co-housed dominant and subordinate mice to maintain a stable hierarchy. This choice was motivated by two considerations. First, our goal was to investigate how social rank modulates defensive responses under ethologically relevant conditions, where mice naturally live in groups. Single housing would remove this social context. Second, singly housing mice can destabilize or eliminate rank relationships, making it difficult to interpret rank-dependent behavioral differences.

      (8) Add analysis of individual variability: trial-by-trial variability or stable behavioral tendencies in individual animals are not explored. This could explain part of the variation currently attributed to social rank.

      We have analyzed individual variability in both dominant and subordinate mice. We observed substantial variability across all behavioral measurements for each group (Figure S7). To attribute the observed behavioral differences to social hierarchy rather than to other individual traits, we conducted paired comparisons between dominant and subordinate mice (Figure 4).

      (9) Improve figure labeling and readability: some plots are ambiguous in terms of whether rows represent trials or animals. Overlapping points obscure the data in several figures (for example, in Figure 3H, is sucrose n=4?); consider using jittered scatter plots, boxplots, or individual traces to improve clarity. Also, the y-axis label in the same figure is missing an 'e'.

      We have revised figures to improve clarity and corrected the typos.

      (10) Avoid overinterpretation of causal explanations: Statements such as "reward increases vigilance due to evolutionary pressure" or that "subordinates are less vigilant" go beyond what the current data can demonstrate and should be rephrased more cautiously.

      We have revised the manuscript to tone down these statements.

      Reviewer #2 (Recommendations for the authors):

      (1) Provide much more extensive methodological details on analyses and model fitting

      We have thoroughly revised the Methods section to provide extensive detail on both behavioral analyses and computational modeling, as outlined in our responses to points (3) and (4) of the Public Review.

      (2) Perform experiments or analyses that directly measure vigilance, if vigilance is to remain as a key explanation for the data.

      As detailed in our response to point (1) of the Public Review, we have supplemented the escape latency measure with two direct behavioral analyses of vigilance: foraging interval and foraging speed. This multi-metric approach robustly supports the interpretation of heightened vigilance.

      (3) Provide extra evidence for an effect of reward value, as opposed to the presence or absence of reward. Control for differences arising from the water deprivation state by performing the no reward condition experiments in water-deprived mice.

      All behavioral data in the reward–threat experiment were collected from normal (non-deprived) mice (Figures 2 and 3); this has been clarified in the revised manuscript. We have reanalyzed the data by segmenting trials into early and late phases for each animal. In the late phase, under low-threat conditions, the effect of reward value is reflected in significant differences between water and sucrose in terms of escape distance and time spent in the reward zone (Figures 3I and 3J). Under high-threat conditions, the reward value effect is reflected in significant differences in latency to flee and peak escape speed (Figures 3K and 3N).

      (4) Using drift rate to describe the "r" variable is confusing because the drift rate of the drift diffusion process is also determined by terms alpha, beta, and h-terms.

      We now refer to “r” as the reward value in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I would tone down some of the extreme statements about the problems of previous experiments (such as that most decision-making is on 2AFC). Lots of people do decision-making in serial foraging, fleeing, and other behavioral tasks. The classic Morris water maze or Barnes maze are decision-making tasks that aren't 2AFC. Serial foraging tasks, such as the Restaurant Row task, aren't 2AFC. And, actually, lots of mouse behavior tasks are deciding when to stop on a treadmill for a reward. And, for that matter, your task isn't all that "realistic" - mice aren't evolved to flee looming disks, they are evolved to flee hawks and owls. This doesn't invalidate your task at all. I just recommend making it about your work in a positive way rather than others in a negative way.

      We have revised the manuscript to adopt a more positive framing of our work.

      (2) I also don't think there's much use in bringing in crayfish in a mouse task. Spend your time connecting to the other rodent data (mice and rats) instead.

      We agree and have revised the manuscript accordingly, focusing our discussion on relevant rodent literature to provide a more appropriate context for our findings.

      Minor concerns:

      (1) The authors use the term "cognitive control" without making clear what they mean. In general, the authors seem to have a view on decision-making as either being "reflexes" or "cognitive control". This is a very outdated perspective. Modern perspectives include multiple decision-making systems competing, separating these based on their computational properties, such as planning, procedural, instinctual, and, yes, reflexive. Current views on the kinds of behaviors they are discussing generally see fleeing as a transition from reflexive (tonic immobility, freezing) and instinctual responses (freezing, fleeing) to deliberative (anxiety) and procedural (habit). The authors might take a look at the recent Calvin and Redish (2025) paper for some ideas on this.

      We appreciate the reviewer’s insight regarding the term “cognitive control.” In our study, we used this term to emphasize that defensive responses to looming threats are not purely reflexive. Mice exhibit four distinct types of defensive decisions within a short time window, and these decisions are systematically modulated by reward value and social rank. Notably, reward modulation is bidirectional: high reward suppresses defensive responses under low-threat conditions but enhances them under high-threat conditions, indicating that animals integrate multiple sources of information rather than relying solely on instinctive mechanisms.

      We did not observe in mice the mid-trajectory aborts reported in rats by Calvin & Redish (2025). This difference may reflect species-specific behavior or the nature of the threat: our looming stimulus is purely visual and non-harmful, whereas the robotic predator in their study presents a physical threat. We have revised the Discussion to clarify our use of “cognitive control” and to incorporate these perspectives.

      (2) Only male mice were used. This limits the conclusions that can be drawn.

      We acknowledge the limitation of using only male mice and have discussed this limitation in the revised manuscript.

      (3) Did the authors observe darting behavior? (Gruene...Shansky 2015).

      We did not observe darting behavior, characterized by rapid movement, as reported during inescapable fear conditioning. In our experiment, the mice consistently escaped towards the nest and, in most trials, ran directly to it without stopping. Occasionally, under low-contrast conditions, mice paused once or twice but never moved towards the reward.

      (4) How was only one mouse allowed into the linear arena at a time?

      When all mice were in the nest, the nest-tunnel door was open while the tunnel-arena door remained closed. When a single mouse entered the tunnel, as detected by the RFID and OpenMV camera system, the nest-tunnel door closed and the tunnel-arena door opened, allowing only that mouse to enter the arena. We have clarified this protocol in the Methods section.

      (5) I would like to see more extensive analyses of the animal's responses as a function of distance to the threat (as per Fanselow and Lester 1988).

      As detailed in our response to the public review, we conducted new experiments analyzing behavior as a function of prey–threat distance. The finding that defensive responsiveness decreases with increasing prey–threat distance is now presented in Figures S2C–G and discussed in the context of the predatory imminence continuum.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors report that GPR55 activation in presynaptic terminals of Purkinje cells decreases GABA release at the PC-DCN synapse. The authors use an impressive array of techniques (including highly challenging presynaptic recordings) to show that GPR55 activation reduces the readily releasable pool of vesicles without affecting presynaptic AP waveform and presynaptic Ca<sup>2+</sup> influx. This is an interesting study, which is seemingly well-executed and proposes a novel mechanism for the control of neurotransmitter release. However, the authors' main conclusions are heavily, if not solely, based on pharmacological agents that more often than not demonstrate affinity at multiple targets. Below are points that the authors should consider in a revised version.

      We are happy to hear the encouraging comments from this reviewer and thank them for pointing out important issues, including the reliance of our previous study design solely on pharmacological agents. To address these, we have performed additional experiments, as detailed below.

      Major points:

      (1) There is no clear evidence that GPR55 is specifically expressed in presynaptic terminals at the PC-DCN synapse. The authors cited Ryberg 2007 and Wu 2013 in the introduction, mentioning that GPR55 is potentially expressed in PCs. Ryberg (2007) offers no such evidence, and the expression in PC suggested by Wu (2013) does not necessarily correlate with presynaptic expression. The authors should perform additional experiments to demonstrate the presynaptic expression of GPR55 at PC-DCN synapse.

      We completely agree with the reviewer that our previous manuscript lacked reliable information regarding the presynaptic expression of GPR55 at PC boutons.

      To clarify the localization, we first tried immunostaining for GPR55 using commercially available antibodies, but unfortunately they did not provide clear labeling of neurons, or even of GPR55-transfected HEK cells used as a positive control. We therefore abandoned direct immunostaining. Instead, we labeled PC axonal boutons with a GPR55-targeting dye, combined with a complementary gene knock-down strategy. Specifically, we used T1117, a fluorescent derivative of AM251 (a GPR55 ligand used in the manuscript), and observed clear fluorescent signals at GFP-labeled PC terminals. On its own, however, this did not establish that the labeling was mediated by binding to GPR55. We therefore also specifically suppressed GPR55 expression using CRISPR/Cas9-mediated genome editing in PCs, acutely micro-injecting plasmids into PC nuclei to express gRNAs targeting GPR55 together with Cas9. Five days after knock-down, T1117 labeling at axon terminals was reduced by ~50% compared with Cas9-alone controls. All these data are now shown in new Figure 2 and described in the text (p5-6, lines 141-159). Furthermore, reducing GPR55 expression abolished the AM251-mediated reduction of vesicular exocytosis, as shown in new Figure 3D, E.

      Taken together, these results support our main conclusions by strongly suggesting that GPR55 is present at PC axon terminals, where it negatively regulates exocytosis upon activation by AM251.

      (2) The authors' conclusions rest heavily on pharmacological experiments, with compounds that are sometimes not selective for single targets. Genetic deletion of GPR55 would be a more appropriate control. The authors should also expand their experiments with occlusion experiments, showing if the effects of LPI are absent after AM251 or O-1602 treatment. In addition, the authors may want to consider AM281 as a CB1R antagonist without reported effects at GPR55.

      We thank the reviewer for pointing out these important issues. First, as noted above, to confirm the presence of GPR55 at PC axon terminals, we performed genetic deletion of GPR55 using the CRISPR/Cas9 system. In PCs co-expressing Cas9 and two gRNAs targeting the ligand-binding domain of GPR55, AM251 failed to suppress exocytosis at PC boutons, and T1117 labeling was decreased. The idea that GPR55 negatively regulates transmitter release at PC boutons has therefore been strengthened. The new data are shown in Figure 3D and E and explained in the text (p6, lines 173-178).

      As suggested, we also carried out occlusion experiments with LPI and AM251. LPI alone reduced the readily releasable pool (RRP) size similarly to AM251, and LPI and AM251 applied together did not reduce the RRP size further than either compound alone. Thus, LPI and AM251 appear to act through the same pathway, consistent with a role for GPR55 activation. These data are shown in new Figure 5—figure supplement 1 and explained in the text (p7-8, lines 215-221).

      Regarding another point suggested by the reviewer, we applied AM281 and observed no effect on transmission at the PC–target neuron synapses (shown in new Figure 1F and I; explained in the text p5, lines 117-123), indicating that the effect of AM251 is likely to be mediated by GPR55, but not by CB1R.

      Taken together, our additional genetic and pharmacological experiments have consolidated our conclusion that GPR55 suppresses presynaptic neurotransmitter release in PC boutons.

      (3) It is not clear how long the different drugs were applied, and at what time the recordings were performed during or following drug application. It appears that GPR55 agonists can have transient effects (Sylantyev, 2013; Rosenberg, 2023), possibly due to receptor internalization. The timeline of drug application should be reported, where IPSC amplitude is shown as a function of time and drug application windows are illustrated.

      Thank you for suggesting a better presentation of the data. Accordingly, we have reorganized the figures to show the time course of IPSC changes before and after drug application (new Figures 1 and 4; p4, lines 94-97; p5, lines 110-115; p7, lines 193-197). The current presentation clearly shows that the effect of AM251 becomes evident within a few minutes of application and then reaches a plateau.

      (4) A previous investigation on the role of GPR55 in the control of neurotransmitter release is neither cited nor discussed (Sylantyev et al., 2013, PNAS, “Cannabinoid- and lysophosphatidylinositol-sensitive receptor GPR55 boosts neurotransmitter release at central synapses”). Similarities and differences should be discussed.

      We apologize for not adequately discussing this important work in our previous manuscript and are grateful to the reviewer for pointing it out. We have now cited and discussed the work by Sylantyev et al. (2013) in the text (p12, lines 380-389), as follows:

      ‘Pioneering studies clarified an important role of GPR55 in synaptic transmission at hippocampal excitatory synapses, demonstrating presynaptic enhancement of glutamate release, presumably by elevating cytoplasmic residual Ca<sup>2+</sup> via release from intracellular stores (Sylantyev et al., 2013; Rosenberg et al., 2023), in contrast to the suppression of release we observed. The lack of positive modulation of AP-triggered release through residual Ca<sup>2+</sup> in PC terminals might be due to the abundance of the potent Ca<sup>2+</sup> buffer calbindin (Fierro and Llano, 1996). Indeed, IP<sub>3</sub>-mediated Ca<sup>2+</sup> release from internal stores increased only the AP-insensitive spontaneous vesicular release (mIPSCs) (Gomez et al., 2020). Thus, the minimal sensitivity of AP-triggered release to residual Ca<sup>2+</sup> in PC boutons may underlie the distinct effects of GPR55 activation on the presynaptic side.’

      Minor point:

      (1) What is the source of LPI? What isoform was used? The multiple isoforms of LPI have different affinities for GPR55.

      Thank you for pointing out this missing information. In our experiments, we used a soybean-derived LPI mixture containing approximately 58% C16:0 and 42% C18:0 or C18:2 species. According to Brenneman et al. (2025), these isoforms show moderate or strong effects in cultured DRG neurons, whereas the C20:4 isoform, reported to promote neuroinflammatory signaling, was present only at very low levels. We have added this information to the revised manuscript and briefly discussed the influence of different LPI isoforms on the physiological outcomes of GPR55 activation (p5, lines 127-131; p15, lines 493-496).

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the mode of action of GPR55, a relatively understudied type of cannabinoid receptor, in presynaptic terminals of Purkinje cells. The authors use demanding techniques of patch clamp recording of the terminals, sometimes coupled with another recording of the postsynaptic cell. They find a lower release probability of synaptic vesicles after activation of GPR55 receptors, while presynaptic voltage-dependent calcium currents are unaffected. They propose that the size of a specific pool of synaptic vesicles supplying release sites is decreased upon activation of GPR55 receptors.

      Strengths:

      The paper uses cutting-edge techniques to shed light on a little-studied, potentially important type of cannabinoid receptor. The results are clearly presented, and the conclusions are for the most part sound.

      We are very pleased to see the reviewer's positive comments.

      Weaknesses:

      The nature of the vesicular pool that is modified following activation of GPR55 is not definitively characterized.

      We agree with the reviewer that our data cannot fully resolve the GPR55-induced changes in vesicle pools. As detailed in our responses to the reviewer's ‘Recommendations for the authors’, we have added explanation and discussion in the main text of the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Inoshita and Kawaguchi investigated the effects of GPR55 activation on synaptic transmission in vitro. To address this question, they performed direct patch-clamp recordings from axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis utilizing synaptopHluorin. They found that exogenous activation of GPR55 suppresses GABA release at Purkinje cell to deep cerebellar nuclei (PC-DCN) synapses by reducing the readily releasable pool (RRP) of vesicles. This mechanism may also operate at other synapses.

      Strengths:

      The main strength of this study lies in combining patch-clamp recordings from axon terminals with imaging of presynaptic vesicular exocytosis to reveal a novel mechanism by which activation of GPR55 suppresses inhibitory synaptic strength. The results strongly suggest that GPR55 activation reduces the RRP size without altering presynaptic calcium influx.

      We thank the reviewer for the encouraging comments on our study.

      Weaknesses:

      The study relies on the exogenous application of GPR55 agonists. It remains unclear whether endogenous ligands released due to physiological or pathological activities would have similar effects. There is no information regarding the time course of the agonist-induced suppression. There is also little evidence that GPR55 is expressed in Purkinje cells. This study would benefit from using GPR55 knockout (KO) mice. The downstream mechanism by which GPR55 mediates the suppression of GABA release remains unknown.

      We thank the reviewer for pointing out these important issues. As detailed in our responses to the reviewers' ‘Recommendations for the authors’, we have addressed most of these weaknesses and added a careful discussion of the open questions to be resolved in future studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is a high-quality paper that reports novel and interesting results. The authors should consider one main critique, related to Figure 6, as well as a number of minor points.

      We thank the reviewer for the very positive assessment of our study. We have carefully considered the main critique regarding presynaptic vesicle pools (related to previous Figure 6), as well as the other points, and revised the manuscript accordingly.

      Main critique:

      In Figure 6, it is said that GPR55 locks SVs in a state that is insensitive to VGCCs, based on a series of experiments with synapto-pHluorin. This conclusion is open to several critiques:

      The authors' model is shown in the diagram of Figure 6A. In this scheme, it appears as if recycled SVs eventually re-acidify in spite of the presence of bafilomycin, and that they are directed to a location close to the plasma membrane, but away from VGCCs. In fact, there is no evidence that the effects of bafilomycin could be limited in time. And there is a lot of evidence indicating that recycled SVs move back to release sites, close to VGCCs.

      We apologize for the misleading panel in the previous Figure 6A. As the reviewer notes, the effect of bafilomycin should be long-lasting, so endocytosed vesicles cannot be re-acidified. We have therefore revised the panel (now Figure 8A) to illustrate the state of vesicles in the presence of bafilomycin. The reviewer's other insightful point, regarding the rapid recruitment of newly endocytosed vesicles to release sites, is highly relevant to the interpretation of our data but is a separate issue from the situation illustrated in new Figure 8A. To avoid confusion, the arrow in the previous version indicating the return of endocytosed vesicles to the docked state has been removed from the new panel, and this critical issue is now discussed carefully in terms of the mechanism of GPR55 action on the release machinery (p15, lines 480-482).

      The saturation of the train-induced signals is interpreted as reflecting an exhaustion of SVs initially close to VGCCs or more generally, susceptible to being released following VGCC activation.

      In an alternative scenario, saturation occurs because AP trains, or KCl applications, become unable to activate VGCCs. This could occur either because long illumination causes photodamage of VGCCs, or because repeated activation of VGCCs leads to their inactivation. The latter explanation is possible in spite of a publication from the authors' laboratory describing the facilitation of presynaptic VGCCs following paired stimulations in this synapse (Diaz-Rojas et al., 2015).

      We agree that it is an important control to demonstrate that the Ca<sup>2+</sup> increase evoked by repetitive AP trains remains intact during and after prolonged illumination for imaging. To test this, we performed additional fluorescent Ca<sup>2+</sup> imaging at PC varicosities during individual 400-AP trains and in response to 50 mM KCl after the series of AP trains. The new data demonstrate that Ca<sup>2+</sup> influx remains constant across all AP trains (Figure 8—figure supplement 1), arguing against VGCC inactivation or photodamage as a major factor underlying the saturation of the synapto-pHluorin signal. We have added an explanation of this issue in the text (p11, lines 327-329).

      The authors explain the larger effect of ionomycin compared with AP trains and KCl applications as reflecting a better capacity to increase the bulk calcium concentration. The above proposal for the inactivation of VGCCs offers an alternative explanation, in my view more likely.

      As noted above, our newly added Ca<sup>2+</sup> imaging data clearly show that individual AP trains induced similar Ca<sup>2+</sup> influx across repetitive trials, in line with our original interpretation. In addition, the Ca<sup>2+</sup> increase evoked by KCl was larger and more widespread, extending across axon terminals and trunks. Nevertheless, the exocytic signal evoked by ionomycin was clearly larger, implying that the source of Ca<sup>2+</sup> entry critically shapes release in PC boutons. We therefore suppose that the marked effect of ionomycin on release reflects a greater elevation of bulk cytoplasmic Ca<sup>2+</sup> produced by this non-site-selective Ca<sup>2+</sup> ionophore (Figure 8—figure supplement 1; p11, lines 327-334 and 342-349).

      In yet another scenario, recycled SVs in bafilomycin retain their fluorescence since they do not reacidify, but they come back to release sites to undergo new rounds of exocytosis. The new exocytosis events do not increase the fluorescence since the pH in the vicinity of synapto-pHluorin does not change. NH4Cl would then increase the fluorescence by revealing SVs that had not undergone exocytosis-endocytosis cycles during AP trains or KCl exposure. In this last scenario, the GPR55-sensitive SV pool would be a specific sub-pool of SVs that can be recycled by repetitive 400 AP trains.

      We are grateful to the reviewer for pointing out this important possibility. We completely agree that this scenario can also explain the GPR55-sensitive pool, and we have added an explanation of this possibility in the text (p15, lines 474–482).

      Figure 6F shows calcium imaging measurements of PC varicosities. Unfortunately, crucial measurements are missing. It would have been revealing to compare calcium rises for the first and the last of the 8 400-AP trains. And to compare calcium rises elicited by 60 mM KCl before and after the series of 8 400-AP trains.

      This is an important control experiment, so we performed additional Ca<sup>2+</sup> imaging during the eight 400-AP trains and KCl application. The new results, shown in Figure 8—figure supplement 1, indicate that Ca<sup>2+</sup> rises are comparable between the first and eighth trains, and that additional Ca<sup>2+</sup> influx (large in amplitude and wide in area) could still be evoked by KCl after the eight trains. These experiments are explained in the text (p11, lines 327-336).

      Minor points:

      (1) Introduction: The Introduction would benefit from a more substantial description of what is known about GPR55 and downstream signaling pathways. Right now, it is stated that GPR55 is 'potentially expressed in PCs': What are the arguments behind this statement? Also, the signaling pathway is discussed on p.12, much too late in the ms. Why not move this section to the Introduction?

      We thank the reviewer for the helpful suggestion. As recommended, we have expanded the Introduction in the revised manuscript by moving sentences from other sections, including those on the possible expression of GPR55 in Purkinje cells (Ryberg et al., 2007; Wu et al., 2013) (p3-4, lines 71-75) and on its downstream signaling pathways (Gα<sub>q</sub>/PLC/IP<sub>3</sub>/Ca<sup>2+</sup> and Gα<sub>13</sub>/RhoA/ROCK) (p3, lines 63-68).

      (2) Legend to Figures 1, 2, and 4: What is the EGTA concentration in these experiments?

      As suggested, the EGTA concentrations (0.5 or 5 mM) used in the individual experiments are now clearly indicated both in the figure legends and in the Methods section (p18, lines 585-586).

      (3) Fig. 3C: These experiments show that some SV pool is depleted by AM251. The authors state that this is the RRP, but other options are possible. In the calyx of Held, similar experiments are supposed to deplete not only the FRP (=RRP, presumably) but also the SRP.

      We thank the reviewer for raising the important issue of how vesicle pools are categorized. In PC boutons, the membrane capacitance increases in response to depolarizing pulses of different durations in a manner fitted by a single exponential curve (see Figure 5C for example). Our previous study (Kawaguchi and Sakaba, 2015) noted that the vesicle pools corresponding to the FRP and SRP may not be easy to distinguish in PCs, suggesting an apparently single component. For this reason, we simply refer to this component as the RRP in the present manuscript. Nevertheless, as suggested, a careful discussion of the typical fast and slow components would help interpret our present findings. We have therefore added a sentence explaining this issue to the revised manuscript (p7, lines 211-214).
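      As an illustration of the single-exponential analysis described above, here is a minimal Python sketch: capacitance jumps ΔCm evoked by depolarizing pulses of duration t are fitted with ΔCm(t) = A(1 − exp(−t/τ)), and the plateau A is read as the RRP size. The grid search over τ, the function name, and the synthetic noise-free values are illustrative assumptions, not the authors' analysis pipeline or data.

```python
import math


def fit_single_exponential(durations_ms, jumps_fF, tau_grid_ms):
    """Least-squares fit of dCm(t) = A * (1 - exp(-t / tau)).

    For a fixed tau the best amplitude A has a closed form, so only a
    1-D grid search over tau is needed. Returns (A, tau, sse).
    """
    best = None
    for tau in tau_grid_ms:
        # Basis function evaluated at each pulse duration
        g = [1.0 - math.exp(-t / tau) for t in durations_ms]
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            continue
        # Closed-form least-squares amplitude for this tau
        a = sum(gi * yi for gi, yi in zip(g, jumps_fF)) / gg
        sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, jumps_fF))
        if best is None or sse < best[2]:
            best = (a, tau, sse)
    return best


# Synthetic, noise-free example (hypothetical values, not data from the study)
durations = [1, 2, 5, 10, 20, 50]  # pulse durations, ms
true_a, true_tau = 100.0, 4.0      # plateau (fF) and time constant (ms)
jumps = [true_a * (1 - math.exp(-t / true_tau)) for t in durations]

taus = [0.1 * k for k in range(1, 501)]  # candidate tau from 0.1 to 50 ms
a_hat, tau_hat, _ = fit_single_exponential(durations, jumps, taus)
print(f"RRP estimate ~ {a_hat:.1f} fF, tau ~ {tau_hat:.1f} ms")
```

If fast and slow components were present, a single exponential would fit systematically worse than a sum of two; an apparently single component is what makes the simpler RRP label reasonable here.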

      (4) p. 8: When the 400 APs protocol is introduced, the corresponding frequency (20 Hz?) should be mentioned. This information comes only much later in the ms.

      We apologize for the insufficient explanation in the previous manuscript. As suggested, we now state the stimulation frequency (20 Hz) in the main text where the 400-AP protocol first appears (p9, lines 277-278).

      (5) Figure 5, panels B and F: synapto-pHluorin is labelled twice 'synapto-pHluolin'.

      We apologize for these typos, which have now been corrected (new Figure 7).

      (6) Legend to Figure 5, last line: 'x' is missing in the last equation.

      Thank you for the careful and kind check. Now, ‘x’ has been added to the last equation in the legend for new Figure 7.

      (7) p. 7, Interpretation of EGTA effects: The authors frame their interpretation of EGTA effects around the distance between release sites and VGCCs. However, since AM251 appears to alter the recruitment of SVs, a more parsimonious interpretation would be that EGTA modifies the calcium-dependent movement of SVs towards release sites.

      Thank you for suggesting this insightful scenario. We agree that the capacitance jump upon a long depolarizing pulse likely includes exocytosis of a substantial number of vesicles newly recruited during the Ca<sup>2+</sup> increase. As the reviewer states, EGTA may then slow the Ca<sup>2+</sup>-dependent replenishment of synaptic vesicles, and this replenishment process might be the target of GPR55 activation. We have therefore added an explanation of this possibility in the text (p15, lines 474-482).

      (8) p. 13, Interpretation of GPR55 sensitive SV pool: The authors suggest a larger distance to VGCCs for this pool compared to naïve SVs. An alternative could be that in the presence of GPR55, the recruitment to release sites would be less efficient.

      This is also an insightful suggestion regarding the causal relationship between the GPR55-mediated reduction of vesicular release and the vesicle pools. Accordingly, we have revised the Discussion (see “Dynamics of synaptic vesicles among distinct functional pools”) to explicitly raise the possibility that vesicle recruitment to release sites decreases after GPR55 activation (p15, lines 474-482). Having considered all of the suggested scenarios, we believe that the possible mechanisms for the GPR55-mediated reduction of release are now explained much more clearly in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The time course of the agonist-induced suppression should be reported (Figure 1).

      This is an important point for presenting the data clearly, also raised by Reviewer #1. Accordingly, we have changed the figure panels to show the time courses of agonist-induced suppression (new Figures 1 and 4).

      (2) Show that the suppression of GABAergic transmission mediated by AM251 and LPI is eliminated in GPR55 KO mice.

      We thank the reviewer for prompting this important experiment. Following the suggestion, we knocked down GPR55 expression in cultured Purkinje cells using CRISPR/Cas9. To avoid potential developmental compensation, we adopted this CRISPR/Cas9-based genome-editing approach rather than using global knockout mice. As noted above in response to comment #2 of Reviewer #1, these GPR55-KO cells showed decreased labeling of PC axon terminals by a fluorescent variant of AM251 (new Figure 2) and abolition of the AM251-mediated suppression of vesicle exocytosis (Figure 3D and E). These results are explained in the text (p5-6, lines 141-159; p6, lines 173-178).

      (3) Include references supporting AM251 and LPI as GPR55 agonists and specify the EC<sub>50</sub> concentrations for each agonist. Furthermore, provide details about the GPR55 antagonist CID16020046.

      As suggested, we have added references regarding GPR55 agonists, AM251 and LPI. In the text, the following information was added: AM251, originally characterized as an inverse agonist for CB1, has also been reported to act as a GPR55 agonist (Ryberg et al., 2007; Henstridge et al., 2009) (p5, lines 115-116). LPI is an established endogenous GPR55 agonist (Oka et al., 2007; Henstridge et al., 2009) (p5, lines 127-129). The reported EC<sub>50</sub> values are ~ 30 nM for LPI (Oka et al., 2007, HEK cell assay) and 39 nM for AM251 (Ryberg et al., 2007, HEK cell assay) (p4, lines 94-95; p5, lines 127-129). Regarding the GPR55 antagonist CID16020046, detailed information (IC<sub>50</sub> = 0.21 µM for GPR55 without significant effect on CB1 receptor) was added in the text with an appropriate citation (Kargl et al., 2013) (p5, lines 123-127). These points have also been added to the Methods section (p17, lines 587-589).
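      As a rough aid to reading these EC<sub>50</sub> values, the Hill equation gives the fraction of the maximal response expected at a given agonist concentration. The minimal Python sketch below assumes a Hill coefficient of 1 and uses hypothetical example concentrations; HEK-cell EC<sub>50</sub> values may not transfer directly to cerebellar slices or cultured PCs.

```python
def fractional_response(conc_nM: float, ec50_nM: float, hill_n: float = 1.0) -> float:
    """Fraction of maximal response from the Hill equation:
    E / Emax = c^n / (c^n + EC50^n).
    hill_n = 1 (simple one-site behaviour) is an assumption.
    """
    if conc_nM <= 0:
        return 0.0
    cn = conc_nM ** hill_n
    return cn / (cn + ec50_nM ** hill_n)


# Approximate EC50 values from HEK cell assays, as reported in the response
EC50_NM = {"LPI": 30.0, "AM251": 39.0}

# Hypothetical concentrations (1x, 10x, 100x EC50), for illustration only
for drug, ec50 in EC50_NM.items():
    for conc in (ec50, 10 * ec50, 100 * ec50):
        print(f"{drug} at {conc:g} nM: {fractional_response(conc, ec50):.2f} of max")
```

With n = 1, a concentration equal to the EC<sub>50</sub> gives half-maximal activation, and a tenfold excess gives about 91%.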

      (4) Regarding the onset delay (Figure 4C; page 8, lines 3-4), consider the following: "AM251 induced a modest yet significant synaptic delay, estimated by the time to the onset of release" (or something similar).

      We thank the reviewer for suggesting this helpful wording. Accordingly, we have revised the sentence describing the delayed onset (p9, lines 264-265).

      These three points should be properly acknowledged in the Discussion:

      (1) Are action potentials (APs)/depolarizations and ionomycin applications comparable? Ionomycin mediates a large calcium rise significantly slower than the calcium rise mediated by fast depolarization. Such presynaptic calcium dynamics could account, in part, for the different results.

      The qualitative difference between the Ca<sup>2+</sup> increases evoked by APs/depolarization and by ionomycin is an important point, and we thank the reviewer for raising it. In the revised manuscript, we have added an explanation of the possible differences arising from the distinct dynamics of Ca<sup>2+</sup> increases caused by direct depolarization of axon terminals or by ionomycin (p14, lines 452-453).

      (2) Previous studies on hippocampal CA3-CA1 pyramidal cell synapses indicate that GPR55 activation enhances glutamate release through presynaptic calcium modulation while diminishing inhibitory postsynaptic strength by reducing GABAA receptors (Sylantyev et al., PNAS 2013; Rosenberg et al., Neuron 2023). In contrast, Inoshita and Kawaguchi discovered that GPR55 suppresses PC-DCN inhibitory transmission by decreasing GABA release without affecting inhibitory postsynaptic strength. Some potential explanation for this discrepancy is warranted.

      We thank the reviewer for raising this important issue and regret not having discussed possible interpretations in the previous manuscript. In the revised manuscript, we have added explanations for this discrepancy. First, elevated cytoplasmic Ca<sup>2+</sup> from ER stores has only a limited influence on GABA release from PC terminals (Gomez et al., 2020), probably owing to abundant calbindin. Second, our present data clearly show GPR55 signals at PC terminals (although indirectly; see Figure 2), whereas hippocampal inhibitory boutons showed lower GPR55 levels than excitatory boutons (Rosenberg et al., Neuron, 2023). Third, the subtypes and/or anchoring mechanisms of postsynaptic GABA<sub>A</sub> receptors may differ between the postsynaptic neurons in the hippocampus and those in the cerebellum. These factors are now discussed in the text (p12, lines 380-396).

      (3) Earlier work has suggested that CB1 receptor activation can alter the release machinery. Therefore, the observation that GPR55 activation induces changes in the RRP is not entirely surprising.

      As the reviewer points out, previous studies showed that CB1R influences the synaptic release machinery rather than Ca<sup>2+</sup> influx (Ramirez-Franco et al., 2014). In that context, the GPR55-mediated RRP change can be regarded as a synaptic modulation mechanism similar to the CB1R-mediated one. However, given the different downstream signaling pathways (G<sub>12/13</sub>- or G<sub>q</sub>-mediated versus G<sub>i/o</sub>-mediated), our findings offer an important perspective on the mechanisms regulating the release machinery, which should be analyzed further in future studies. We have now added these points to the Discussion (p13-14, lines 435-439).

      (4) Add a section about the limitations of this study (see Weaknesses above).

      As suggested, we have added a section describing the current limitations of this study that we could not address in this revision and that should be addressed in future work (p15, lines 488-508). In particular, that section now explicitly notes the unidentified endogenous agonist that activates GPR55 and the physiological conditions under which it is produced, the need for more direct evidence of GPR55 at PC boutons, and the unknown downstream mechanisms of GPR55-mediated suppression of GABA release.

      (5) Double-check grammar and typos ("anandamid").

      We apologize for the poor writing in the previous manuscript. We have now carefully checked the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The "number sense" refers to an imprecise and noisy representation of number. Many researchers propose that the number sense confers a fixed (exogenous) subjective representation of number that adheres to scalar variability, whereby the variance of the representation of number is linear in the number.

      This manuscript investigates whether the representation of number is fixed, as usually assumed in the literature, or whether it is endogenous. The two dimensions on which the authors investigate this endogeneity are the subject's prior beliefs about stimuli values and the task objective. Using two experimental tasks, the authors collect data that are shown to violate scalar variability and are instead consistent with a model of optimal encoding and decoding, where the encoding phase depends endogenously on prior and task objectives. I believe the paper asks a critically important question. The literature in cognitive science, psychology, and increasingly in economics, has provided growing empirical evidence of decision-making consistent with efficient coding. However, the precise model mechanics can differ substantially across studies. This point was made forcefully in a paper by Ma and Woodford (2020, Behavioral & Brain Sciences), who argue that different researchers make different assumptions about the objective function and resource constraints across efficient coding models, leading to a proliferation of different models with ad-hoc assumptions. Thus, the possibility that optimal coding depends endogenously on the prior and the objective of the task opens the door to a more parsimonious framework in which assumptions of the model can be constrained by environmental features. Along these lines, one of the authors' conclusions is that the degree of variability in subjective responses increases sublinearly in the width of the prior. And importantly, the degree of this sublinearity differs across the two tasks, in a manner that is consistent with a unified efficient coding model.

      We thank Reviewer #1 for her/his comments and for placing our work in a broader context.

      Comments:

      (1) Modeling and implementation of estimation task

      The biggest concern I have with the paper is about the experimental implementation and theoretical account of the estimation task. The salient features of the experimental data (Figure 1C) are that the standard deviations of subjects' estimated quantities are hump-shaped in the true stimulus x and that the standard deviation, conditional on the true stimulus x, is increasing in prior width. The authors attribute these features to a Bayesian encoding and decoding model in which the internal representation of the quantity is noisy, and the degree of noise depends on the prior - as in models of efficient coding (Wei and Stocker 2015 Nature Neuro; Bhui and Gershman 2018 Psych Review; Hahn and Wei 2024 Nature Neuro).

      The concern I have is about the final "step" in the model, where the authors assume there is an additional layer of motor noise in selecting the response. The authors posit that the subject's selection of the response is drawn from a Gaussian with a mean set to the optimally decoded estimate x*(r), and variance set to a free parameter sigma_0^2. However, the authors also assume that the Gaussian distribution is "truncated to the prior range." This truncation is a nontrivial assumption, and I believe that on its own, it can explain many features of the data.

      To see this, assume that there is no noise in the internal representation of x, there is only motor noise. This corresponds to a special case of the authors' model in which ν is set to 0. The model then reduces to a simple account in which responses are drawn from a Gaussian distribution centered at the true value of x, but with asymmetric noise due to the truncation. I simulated such a model with sigma_0=7. The resulting standard deviations of responses for each value of x (based on 1000 draws for each value of x), across the three different priors, reproduce the salient patterns of the standard deviation in Figure 1C: i) within each condition, the standard deviation is hump-shaped and peaks at x=60 and ii) conditional on x, standard deviation increases in prior width. The takeaway is that this simple model with only truncated motor noise - and without any noisy or efficient coding of internal representations - provides an alternative channel through which the prior affects behavior.
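
      The reviewer's simulation can be reproduced with a short script. The sketch below is a minimal Python version of the generative model described above, under assumptions that are ours rather than the paper's: the three priors are uniform with widths 20, 40, and 60 centred on 60 (Narrow 50 to 70, Medium 40 to 80, Wide 30 to 90), the cognitive noise has standard deviation ν·w^α, the decoder is the posterior mean (optimal under an assumed squared-error loss), and the truncated motor noise is drawn by rejection sampling. The function names are hypothetical. Setting `nu=0` gives the reviewer's motor-noise-only special case.

```python
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # standard normal, used for the truncated-normal mean

# Assumed priors: uniform, widths 20, 40, 60, all centred on 60.
PRIORS = {"Narrow": (50, 70), "Medium": (40, 80), "Wide": (30, 90)}


def posterior_mean(r, s, lo, hi):
    """Posterior mean of x given r ~ N(x, s**2) and a uniform prior on [lo, hi].

    The posterior is N(r, s**2) truncated to [lo, hi]; its mean is the optimal
    estimate under an assumed squared-error loss.
    """
    if s == 0:
        return min(max(r, lo), hi)
    a, b = (lo - r) / s, (hi - r) / s
    mass = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    if mass < 1e-12:  # r far outside the prior: posterior concentrates at the nearest bound
        return lo if r < lo else hi
    return r + s * (STD_NORMAL.pdf(a) - STD_NORMAL.pdf(b)) / mass


def simulate_sd(x, lo, hi, sigma0=7.0, nu=0.0, alpha=0.5, n_draws=1000, seed=0):
    """Standard deviation of simulated responses to stimulus x under prior [lo, hi].

    nu = 0 removes cognitive noise and gives the motor-noise-only special case.
    """
    rng = random.Random(seed)
    s = nu * (hi - lo) ** alpha  # cognitive-noise SD, nu * w**alpha
    responses = []
    for _ in range(n_draws):
        r = rng.gauss(x, s) if s > 0 else x  # noisy internal representation
        estimate = posterior_mean(r, s, lo, hi)
        while True:  # motor noise truncated to the prior range (rejection sampling)
            y = rng.gauss(estimate, sigma0)
            if lo <= y <= hi:
                responses.append(y)
                break
    return stdev(responses)


if __name__ == "__main__":
    for name, (lo, hi) in PRIORS.items():
        sds = [simulate_sd(x, lo, hi) for x in range(lo, hi + 1, 5)]
        print(name, [round(sd, 2) for sd in sds])
```

      With `nu=0` the simulated standard deviations peak at the centre of each prior and grow with its width, which is the pattern the reviewer describes. This only shows that truncation can produce the pattern; it does not by itself say which mechanism the data support.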

      Of course, this does not imply that subjects' coding is not described by the efficient encoding and decoding model posited by the authors. However, it does suggest an important alternative mechanism for the authors' theoretical results in the estimation task. Moreover, some of the quantitative conclusions about the differences in behavior with the discrimination task would be greatly affected by the assumption of truncated motor noise.

      Turning to the experiment, a basic question is whether such a truncation was actually implemented in the design. That is, was the range of the slider bar set to the range of the prior? (The methods section states that the size on the screen of the slider was proportional to the prior width, but it was unclear whether the bounds of the slider bar changed with the prior.) If the slider bar range did depend on the prior, then it becomes difficult to interpret the data. If not, then perhaps one can perform analyses to understand how much the motor noise is responsible for the dependence of the standard deviation on both x and the prior width. Indeed, the authors emphasize that their model is best fit at α=0.48, which would seem to imply that the best-fitting value of ν is strictly positive. However, it would be important to clarify whether the estimation procedure allowed for ν=0, or whether this noise parameter was constrained to be positive (i.e., clarify whether the estimation assumed noisy and efficient coding of internal representations).

      We thank Reviewer #1 for her/his close attention to the motor-noise component of our model, in particular its truncation at the border of the prior. We agree that the truncated motor noise should be examined more closely as it affects the variance of responses. We address here the questions raised by the reviewer, and we detail the new analyses we have conducted.

      First, regarding the experimental paradigm, we note that this truncation was indeed implemented in the design, i.e., the range of the slider bar corresponded to the range of the prior (we now indicate this more clearly in the manuscript). Subjects thus could not select an estimate outside the support of the prior, and it is precisely for this reason that we model the selection step with a truncated distribution, so that the model is consistent with the experimental setup. This truncation naturally decreases the response variability near the bounds, and this may affect the overall variability differently for the different priors, as noted by the reviewer in her/his simulations. We have conducted a series of analyses to investigate this question.

      We first consider a model with motor noise only and no cognitive noise. To answer one of the reviewer’s questions, the model-fitting procedure did allow for vanishing cognitive noise (𝜈 = 0), i.e., it allowed such a “motor-noise-only” mechanism to be the main account of the data. This value (𝜈 = 0), however, does not maximize the likelihood of the model, and thus this hypothesis is not the best account of the data. Nevertheless, we also fit a model that enforces the absence of cognitive noise (i.e., with 𝜈 = 0). The BIC of this “motor-noise-only” model is higher than that of our best-fitting model by more than 1100, indicating very strong support for the best-fitting model, which features positive cognitive noise (𝜈 > 0) and 𝛼 = 1/2, as in our theoretical proposal.

      Furthermore, the standard deviation of responses predicted by the motor-noise-only model substantially overestimates the variability of subjects' responses in the Narrow and Medium conditions (Figure 4, panel b), while the predictions of the best-fitting model are much closer to the behavioral data (panel a). Finally, the variances predicted by the motor-noise-only model do not increase linearly with the prior width (contrary to the behavioral data). Instead, the variance increases more between the Narrow and the Medium priors than between the Medium and the Wide priors, as the effects of the bounds attenuate with the wider prior (panel c, solid green line).

      To extend this analysis, we also fit a model with no cognitive noise (𝜈 = 0), but in which we now allow the degree of motor noise, 𝜎<sub>0</sub>, to depend on the prior. Our reasoning is that if the truncated motor noise were the sole explanation for the increase in subjects' variance with the prior width, then we would expect the noise levels for the three priors to be roughly equal. We find instead that they differ (5.9, 8.3, and 9.8 for the prior widths 20, 40, and 60, respectively, when pooling subjects; when fitting subjects individually, the distributions of parameter values exhibit a clear increase; see panels c and d above). This model also yields a BIC more than 590 higher than that of our best-fitting model. We note in addition that these parameter values differ in such a way that they result in response variances that are a linear function of the prior width, as found in the behavioral data, although they overestimate the subjects' variances (panel c, dotted green line). This linear increase is directly predicted by our best-fitting model, which has one fewer parameter (2 vs. 3), and which moreover accurately predicts the variability of subjects across priors (panel c, pink line). Hence the data do not support a model without cognitive noise, with only a constant, truncated motor noise.

      We also consider another possibility, that in addition to truncated motor noise there is in fact a degree of cognitive noise, but one that is insensitive to the width of the prior. In other words, there is cognitive imprecision, but it does not efficiently adapt to the prior range, as in our proposal. This corresponds to setting 𝛼 = 0, in our model; but this specification of the model results in a poor fit, with a BIC higher by more than 300 than that of the best-fitting model, whose cognitive noise scales with the exponent 𝛼 = 1/2, consistent with our theory. Thus our data do not support the hypothesis of a cognitive noise that does not scale with the prior range; instead, subjects' responses support a model in which the variance of the cognitive noise increases linearly with the prior range.
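
      The BIC comparisons above rest on the standard definition BIC = k ln n − 2 ln L̂, where k is the number of free parameters, n the number of observations, and L̂ the maximized likelihood. The minimal Python sketch below (function names hypothetical) makes two points explicit: each extra parameter costs only ln n (about 9.2 for 10,000 trials), so differences of several hundred between models that differ by at most one parameter must come from the likelihood; and, under the common approximation BF ≈ exp(ΔBIC/2), even the smallest difference quoted above implies an overwhelming Bayes factor.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L_hat). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def log10_bayes_factor(delta_bic):
    """Approximate log10 Bayes factor implied by a BIC difference, using BF ~ exp(delta_bic / 2)."""
    return delta_bic / (2.0 * math.log(10.0))


if __name__ == "__main__":
    for delta in (300, 590, 1100):  # BIC differences quoted above
        print(delta, round(log10_bayes_factor(delta), 1))
```

      For ΔBIC = 300, for instance, the approximate Bayes factor is about 10^65 in favour of the model with the lower BIC.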

      We note in addition that there is inter-subject variability: different subjects have different degrees of imprecision. But if the source of the imprecision were the truncated motor noise, then different degrees of truncated noise should result in different relationships between the behavioral variance and the prior widths: subjects with smaller noise should be relatively insensitive to the width of the prior, while subjects with greater noise should be more sensitive. In that case, when fitting the subjects with the model in which the imprecision scales as a power of the width, we should expect subjects to exhibit a diversity of best-fitting parameter values 𝛼. Instead, as noted, we find that the data are best captured by a single exponent 𝛼 = 1/2, shared by all subjects. This suggests that although the “baseline level” of the imprecision may differ across subjects, the way their imprecision increases as a function of the prior width is the same for all subjects, a behavior that is not explained by truncated noise alone.

      Furthermore, Prat-Carrabin, Harl, and Gershman 2025 present behavioral results obtained in a similar numerosity-estimation task, with the same prior ranges, but with the experimental difference that the slider was not limited to the range of the current prior: instead it had the same width in all three conditions, and covered in all trials a range wider than that of the Wide prior (from 25 to 95). The behavioral variance observed in this study increases linearly with the prior range, as in our results. Thus we conclude that the linear increase in subjects' variability does not originate in the bounds of the experimental slider.

      Finally, Prat-Carrabin et al. 2025 present an fMRI study involving a similar numerosity-estimation experiment. This study shows that numerosity-sensitive neural populations in human parietal cortex adapt their tuning properties to the current numerical range, resulting in less precise neural encoding when the range is wider. This substantiates the notion that the degree of imprecision in cognitive noise adapts to the prior range, as in our proposal.

      Overall, we conclude that the linear increase of behavioral variability that we document originates in the endogenous adaptation, across conditions, of the amount of imprecision in the internal encoding of numerosities.

      We now include these analyses in a new section of the Methods (p. 24-27), which we summarize in the main text (p. 7-8). The Figure above is now included (as Figure 4). We also now cite the references mentioned by Reviewer #1 and which we had not already cited (Bhui and Gershman 2018 Psych Review; Hahn and Wei 2024 Nature Neuro).

      References:

      Prat-Carrabin, A., Harl, M. V., & Gershman, S. J. (2025). Fast efficient coding and sensory adaptation in gain-adaptive recurrent networks (p. 2025.07.11.664261). bioRxiv. https://doi.org/10.1101/2025.07.11.664261

      Prat-Carrabin, A., de Hollander, G., Bedi, S., Gershman, S. J., & Ruff, C. C. (2025). Distributed range adaptation in human parietal encoding of numbers (p. 2025.09.25.675916). bioRxiv. https://doi.org/10.1101/2025.09.25.675916

      (2) Differences across tasks

      A main takeaway from the paper is that optimal coding depends on the expected reward function in each task. This is the explanation for why the degree of sublinearity between standard deviation and prior width changes across the estimation and discrimination task. But besides the two different reward functions, there are also other differences across the two tasks. For example, the estimation task involves a single array of dots, whereas the discrimination task involves a pair of sequences of Arabic numerals. Related to the discussion above, in the estimation task the response scale is continuous whereas in the discrimination task, responses are binary. Is it possible that these other differences in the task could contribute to the observed different degrees of sublinearity? It is likely beyond the scope of the paper to incorporate these differences into the model, but such differences across the two tasks should be discussed as potential drivers of differences in observed behavior.

      If it becomes too difficult to interpret the data from the estimation task due to the slider bar varying with the prior range, then which of the paper's conclusions would still follow when restricting the analysis to the discrimination task?

      There are indeed several differences between the estimation and discrimination tasks that could, in principle, contribute to the quantitative differences observed between them. The fact that the estimation task requires a continuous numerical report whereas the discrimination task involves a binary choice is captured in our model by incorporating distinct loss functions for the two tasks (Eq. 4). This distinction is a key element of the theoretical framework, as it determines the optimal allocation of representational precision. We agree with Reviewer #1 that another important difference is that the estimation task involves non-symbolic dot arrays while the discrimination task uses short sequences of Arabic numerals, which could also affect performance through distinct perceptual or cognitive processes. Although we cannot exclude this possibility, it is unclear why such a difference in stimulus format would produce the specific quantitative patterns that we observe, which are precisely those predicted by our proposal, namely the sublinear scalings with task-dependent exponents. Each experiment, taken independently, supports the model's central prediction that the imprecision of internal representations scales sublinearly with the width of the prior distribution. Taken together, the two tasks show that this dependence itself varies with the observer's objective, confirming that perceptual precision is endogenously determined by both the statistical context and the task goal.

      We agree with Reviewer #1 that this point should be mentioned; we now do so in the Discussion (p. 17-18).

      (3) Placement literature

      One closely related experiment to the discrimination task in the current paper can be found in Frydman and Jin (2022 Quarterly Journal of Economics). Those authors also experimentally vary the width of a uniform prior in a discrimination task using Arabic numerals, in order to test principles of efficient coding. Consistent with the current findings, Frydman and Jin find that subjects exhibit greater precision when making judgments about numbers drawn from a narrower distribution. However, what the current manuscript does is it goes beyond Frydman and Jin by modeling and experimentally varying task objectives to understand and test the effects on optimal coding. This contribution should be highlighted and contrasted against the earlier experimental work of Frydman and Jin to better articulate the novelty of the current manuscript.

      We thank Reviewer #1 and we agree that the work of Frydman and Jin is highly relevant to our study. Instead of comparing our contributions to theirs, we have decided to have a close look at their data, in light of our theoretical proposal. This enables us to test the predictions of our theory against human choices made in a rather different decision situation than that of our discrimination task.

      Thus we looked, in their data, at the participants' probability of choosing the risky lottery instead of the certain amount, as a function of the difference between the lottery's expected value (pX) and the certain amount (C; we also added a small bias term to the certain option; such bias was not necessary with our discrimination data, presumably because of the inherent symmetry of our task).

      We find, as did Frydman and Jin, and similarly to our discrimination task, that the participants are more precise when the proposed amounts are sampled from a Narrow prior, in comparison to a Wide prior (see figure above, first panel). But we also find, as in our discrimination task, that when normalizing the value difference by the prior width participants are more sensitive to this normalized difference in the Wide condition than in the Narrow one, suggesting that their imprecision scales across conditions by a smaller factor than the prior width (last panel). And we find, consistent with our discrimination data and with our theory, that choice probabilities in the two conditions match very well when normalizing the difference by the prior width raised to the exponent 3/4 (third panel).

      Model fitting supports this observation. We fit the data to our model (described by Eq. 3), with the addition of a lapse probability and of a bias, and with different values of the exponent 𝛼. The best-fitting model is the one with 𝛼 = 3/4. Its BIC (35,419) is lower than those of the models with 𝛼 = 1, 1/2, and 0 (by 142, 39, and 514, respectively). It is also lower by 2.14 than that of a model in which 𝛼 is left as a free parameter (in which case the best-fitting 𝛼 is 0.68, a value not far from 3/4). We emphasize that these BIC values indicate that the hypotheses 𝛼 = 0 and 𝛼 = 1 are clearly rejected, i.e., the participants' imprecision increases with the prior width (𝛼 > 0), but sublinearly (𝛼 < 1). In other words, the responses collected by Frydman and Jin in a risky-choice task are quantitatively consistent with our results obtained in a number-discrimination task, and they further substantiate our model of endogenous precision.
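
      The normalization argument can be made concrete with a probit-with-lapse choice rule in which the imprecision scales as ν·w^α. This is a hypothetical sketch, not the exact form of Eq. 3: the function name, the additive bias on the value difference, the symmetric lapse, and the illustrative parameter values and prior widths are all assumptions. It shows why, when α = 3/4, choice curves from different prior widths coincide once the difference is divided by w^{3/4}, whereas dividing by w itself leaves the wider prior looking more sensitive.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def p_choose(delta, w, nu, alpha, lapse=0.0, bias=0.0):
    """Probability of choosing the option whose value exceeds the other's by delta.

    Hypothetical probit-with-lapse rule: noise SD is nu * w**alpha, a lapse rate
    `lapse` splits random choices evenly, and `bias` shifts the difference.
    """
    z = (delta + bias) / (nu * w ** alpha)
    return lapse / 2.0 + (1.0 - lapse) * PHI(z)


if __name__ == "__main__":
    nu, alpha, lapse = 0.5, 0.75, 0.05  # illustrative values only
    for w in (20, 60):  # hypothetical narrow and wide prior widths
        # difference expressed in units of w**alpha: identical curves across widths
        print(w, [round(p_choose(k * w ** alpha, w, nu, alpha, lapse), 3) for k in (0.25, 0.5, 1.0)])
```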

      We moreover note that their proposed model is similar to ours, in that the decision-maker is allowed to optimize a noisy encoding scheme to the prior, subject to a ‘capacity constraint’ on the number 𝑛 of encoding signals that can be obtained. Crucially, this capacity constraint is assumed to be a property of the decision-maker that does not change across priors, and thus 𝑛 is fixed across prior widths. Therefore, their model predicts that the participants' imprecision should scale linearly with the prior width (this is also what we obtain in our model if we don’t optimize a similar parameter; see the revised presentation of the model on p. 12-13). We note that when they fit this parameter, 𝑛, separately across conditions, they find that it is larger with the wider prior. This is precisely what our model of endogenous precision predicts. In turn this predicts a sublinear scaling of the imprecision, instead of the linear one that would result from a fixed 𝑛, and indeed we find a sublinear scaling in both their dataset and ours. What is more, in both datasets the sublinear scaling is best captured by the exponent 𝛼 = 3/4, as we predict.

      This analysis of another independent dataset obtained with a different experimental paradigm significantly strengthens our conclusions. Thus we added to the Results section a new subsection discussing this analysis, and the figure above now appears as Figure 3. We also mention it in the Introduction (l. 87-89) and in the Discussion (l. 556-557).

      Reviewer #2 (Public review):

      Summary:

      This paper provides an ingenious experimental test of an efficient coding objective based on optimization as a task success. The key idea is that different tasks (estimation vs discrimination) will, under the proposed model, lead to a different scaling between the encoding precision and the width of the prior distribution. Empirical evidence in two tasks involving number perception supports this idea.

      Strengths:

      The paper provides an elegant test of a prediction made by a certain class of efficient coding models previously investigated theoretically by the authors.

      The results in experiments and modeling suggest that competing efficient coding models, optimizing mutual information alone, may be incomplete by missing the role of the task.

      We thank Reviewer #2 for her/his positive comments on our work.

      Weaknesses:

      The claims would be more strongly validated if data were present at more than two widths in the discrimination experiment.

      We agree that including additional prior widths would allow for a more detailed validation of the predicted scaling law, in particular in the discrimination task. Our design choices across the two experiments reflect a trade-off between the number of prior widths and the number of trials per condition. In the estimation task, we include three widths because this is necessary to identify all three parameters of the model: the variance of the motor noise (𝜎<sub>0</sub><sup>2</sup>), the baseline variance of internal imprecision (𝜈<sup>2</sup>), and the scaling exponent (𝛼). Extending both tasks to include additional prior widths would indeed provide a more robust test of the predicted scaling law. We now note this point in the revised Discussion (p. 17).

      A very strong prediction of the model -- which determines encoding entirely from prior and task -- is that Fisher Information is uniform throughout the range, strongly at odds with the traditional assumption of imprecision increasing with the numerosity (Weber/Fechner law). This prediction should be checked against the data collected. It may not be trivial to determine this in the Estimation experiment, but should be feasible in the Discrimination experiment in the Wide condition: Is there really no difference in discriminability at numbers close to 10 vs numbers close to 90? Figure 2 collapses over those, so it's not evident whether such a difference holds or not. I'd have loved to look into this in reviewing, but the authors have not yet made their data publicly available - I strongly encourage them to do so.

      Importantly, the inverse U-shaped pattern in Figure 1 is itself compatible with a Weber's-law-based encoding, as shown by simulation in Figure 5d in Hahn & Wei [1]. This suggests a potential competing variant account, in apparent qualitative agreement with the findings reported: the encoding is compatible with Weber's law, and only a single scalar, the magnitude of sensory noise, is optimized for the task under the loss function (3). As this account would be substantially more in line with traditional accounts of numerosity perception, while still exhibiting task-dependence of encoding as proposed by the authors, it would be worth investigating whether it can be ruled out based on the data gathered for this paper.

      References:

      [1] Hahn & Wei, A unifying theory explains seemingly contradictory biases in perceptual estimation, Nature Neuroscience 2024

      Indeed, our efficient-coding model predicts that a uniform prior should result in a constant Fisher-information function, and we agree with Reviewer #2 that this is at odds with the common assumption that the imprecision increases with the magnitude. To investigate this possibility, we now consider, in the revised manuscript, a more general model of Gaussian encoding, in which the internal representation, 𝑟, is normally distributed around an increasing transformation of the number, 𝜇(𝑥), as

      𝑟|𝑥~𝑁(𝜇(𝑥), 𝜈<sup>2</sup>𝑤<sup>2 𝛼</sup>),

      where the encoding function, 𝜇(𝑥), can be either linear (𝜇(𝑥) = 𝑥) or logarithmic (𝜇(𝑥) = log (𝑥)). This allows us to test whether the data are better captured by a uniform Fisher information (as predicted by the linear encoding under a uniform prior) or by a compressed, Weber-like representation.
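
      For a Gaussian representation whose variance does not depend on 𝑥, the Fisher information is 𝜇′(𝑥)<sup>2</sup>/(𝜈<sup>2</sup>𝑤<sup>2𝛼</sup>). The short Python sketch below (function name hypothetical; the width 80, i.e. a 10 to 90 range, and the parameter values are illustrative assumptions) evaluates it for both encodings. The linear encoding gives the same value everywhere on the prior, while the logarithmic encoding gives a Fisher information that falls off as 1/𝑥<sup>2</sup>, i.e. Weber-like discrimination thresholds proportional to 𝑥.

```python
def fisher_info(x, nu, w, alpha, encoding="linear"):
    """Fisher information of r | x ~ N(mu(x), nu**2 * w**(2*alpha)).

    encoding="linear": mu(x) = x, so mu'(x) = 1.
    encoding="log":    mu(x) = log(x), so mu'(x) = 1/x.
    """
    if encoding == "linear":
        dmu = 1.0
    elif encoding == "log":
        dmu = 1.0 / x
    else:
        raise ValueError(f"unknown encoding: {encoding!r}")
    return dmu ** 2 / (nu ** 2 * w ** (2 * alpha))


if __name__ == "__main__":
    for enc in ("linear", "log"):
        # discrimination threshold scales as 1 / sqrt(Fisher information)
        thresholds = [fisher_info(x, 1.0, 80, 0.75, enc) ** -0.5 for x in (10, 50, 90)]
        print(enc, [round(t, 2) for t in thresholds])
```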

      We note, first, that in both tasks our conclusions regarding the dependence of the imprecision on the prior width remain unchanged, whether we choose the linear encoding or the logarithmic encoding. With either encoding, the estimation task is best fit by a model with 𝛼 = 1/2, and the discrimination task by a model with 𝛼 = 3/4, implying a sublinear scaling of the imprecision (the standard deviation of the internal representation) with the width of the prior, in quantitative agreement with our theory.

      In the estimation task, the logarithmic encoding yields a substantially lower BIC than the linear one, by more than 380 (see Table 1). The results are less clear in the discrimination task, where the BIC with the logarithmic encoding is lower by 2.1 when pooling together the responses of all subjects, but higher by 2.6 when fitting each subject individually. We also conduct a “Bayesian model selection” procedure to estimate the relative prevalence of each encoding among subjects. The resulting estimate of the fraction of the population that is best fit by the logarithmic encoding is 87.6% in the estimation task, and 45.9% in the discrimination task (vs. 12.4% and 54.1% for the linear encoding).

      To further investigate the behavior of subjects in the discrimination task, we look at their proportion of correct choices in the Wide and Narrow conditions, separately for the trials in which both averages are below the middle value of the prior and for those in which both are above it. We find no significant difference in the Narrow condition (see Figure below). In the Wide condition, the proportion of correct responses appears larger when the averages are small (with a significant difference when binning together the trials in which the absolute difference between the averages is between 4 and 12; Fisher's exact test p-value: 0.030).

      To complement this analysis, we fit a probit model with lapses, which is equivalent to our Gaussian model with linear encoding, but with the noise scale parameter allowed to differ depending on whether both averages are above or below the middle value of the prior. We fit this model separately in each condition, only on the trials in which both averages are either above or below the middle value, and we test it against a more constrained model in which the scale parameter is equal for small and large averages. In the Narrow condition, a likelihood-ratio test does not reject the null hypothesis that the scale parameter is constant (𝜒<sup>2</sup>(1) = 0.026, 𝑝 = 0.87), but in the Wide condition this hypothesis is rejected (𝜒<sup>2</sup>(1) = 7.6, 𝑝 = 0.006). In this condition the best-fitting scale parameter is larger with the large averages than with the small averages (9.4 vs. 6.3), pointing to a larger imprecision with the larger numbers.
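
      The two p-values can be checked directly. The constrained model (one scale parameter) is nested in the unconstrained one (two scale parameters), so twice the log-likelihood difference is compared with a 𝜒<sup>2</sup> distribution with one degree of freedom, whose upper tail has the closed form erfc(√(s/2)). A minimal Python sketch, with hypothetical function names:

```python
import math


def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood-ratio statistic 2 * (ln L_full - ln L_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_1_pvalue(stat):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    If X = Z**2 with Z standard normal, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    for stat in (0.026, 7.6):  # statistics reported above (Narrow, Wide)
        print(stat, round(chi2_1_pvalue(stat), 3))
```

      Running it gives p ≈ 0.872 and p ≈ 0.006, consistent with the values reported above.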

      These results and the prevalence of the Weber/Fechner encoding prompt us to consider, in our efficient-coding model, the hypothesis that a logarithmic compression is an additional constraint on the possible encoding schemes. In our model, the internal representation (𝑟) could take any form as long as its Fisher information satisfied the constraint in Eq. 5 on the integral of its square root. We now consider a strong, additional constraint: that over the support of the prior, the Fisher information of the signal must be of the form that one would obtain with a logarithmic encoding, i.e., 𝐼(𝑥) ∝ 1/𝑥<sup>2</sup>. (For the sake of generality we choose this specification instead of directly assuming a logarithmic encoding, because other types of encoding schemes yield a Fisher information of this form, e.g., one with “multiplicative noise” (Zhou et al., 2024); we do not seek, here, to distinguish between these different possibilities.) We solve the same efficient-coding optimization problem (Eq. 6), but now with this additional constraint. We find that the resulting optimal Fisher information is approximately:

      , for the estimation task,

      and , for the discrimination task,

      for any 𝑥 on the support of the prior, where 𝑥<sub>mid</sub> is the middle of the prior and 𝜃 is a constant. These Fisher-information functions differ from the one previously obtained without the additional constraint (Eq. 9), in that they fall off as 1/𝑥<sup>2</sup>, consistent with our additional constraint. However, we note that the dependence on the prior width, 𝑤, is identical: here also, the imprecision is proportional to 𝑤<sup>1/2</sup> in the estimation task, and to 𝑤<sup>3/4</sup> in the discrimination task.

      In its logarithmic variant (𝜇(𝑥) = log (𝑥)), the Fisher information of the model of Gaussian representations that we have considered throughout is 1/(𝑥 𝜈 𝑤<sup>𝛼</sup>)<sup>2</sup>. It is thus consistent with the predictions just presented, if 𝛼 = 1/2 for the estimation task, and 𝛼 = 3/4 for the discrimination task, i.e., the two values that best fit the data.
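
      The step from the logarithmic Gaussian model to the width scaling can be written out explicitly, using only the model's definitions (the variance 𝜈<sup>2</sup>𝑤<sup>2𝛼</sup> does not depend on 𝑥):

```latex
\mu(x) = \log x \;\Longrightarrow\; \mu'(x) = \frac{1}{x},
\qquad
I(x) = \frac{\mu'(x)^2}{\nu^2 w^{2\alpha}} = \frac{1}{\left(x\,\nu\,w^{\alpha}\right)^2},
\qquad
\frac{1}{\sqrt{I(x)}} = x\,\nu\,w^{\alpha} \propto
\begin{cases}
x\,w^{1/2}, & \alpha = 1/2 \ \text{(estimation)},\\
x\,w^{3/4}, & \alpha = 3/4 \ \text{(discrimination)}.
\end{cases}
```

      The factor 𝑥 gives the Weber-like growth of imprecision across numbers, while the factor 𝑤<sup>𝛼</sup> carries the task-dependent effect of the prior width.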

      This is precisely the model suggested by Reviewer #2. Overall, we conclude that with both linear and logarithmic encoding schemes, our efficient-coding model, wherein the degree of imprecision is endogenously determined, accounts for the task-dependent sublinear scaling of the imprecision that we observe in behavioral data. As for the imprecision across numbers, a sizable fraction of subjects, particularly in the estimation task, are best fit by the logarithmic encoding, consistent with previous reports that numbers are often represented on a compressed, approximately logarithmic scale. This encoding may itself reflect an efficient adaptation to a long-term environmental prior that is skewed, with smaller numbers occurring more frequently and therefore represented with greater precision. This pattern is less clear in the discrimination task. It is possible that the rate at which the precision decreases across numbers itself depends on the task, such that not only the overall level of imprecision, but also its variation across numbers, may be modulated by the task's demands. In this study we have focused on the endogenous choice of the overall precision, but an avenue for future research would be to examine how this adaptation interacts with the detailed shape of the encoding across numbers.

      In the revised manuscript, we have modified the presentation of the model to include the transformation 𝜇(𝑥) (p. 6-7 and 10-11). We have updated Table 1 accordingly (shown above; p. 24); it reports the BICs of all the models for the estimation task and now includes the models with logarithmic encoding. There is now a section in the Results dedicated to the question of logarithmic compression, which includes the efficient-coding model constrained by the logarithmic encoding (p. 15-16). The results on the performance of subjects with larger numbers are presented in the Methods (p. 29-31), and mentioned in the main text (p. 14-15). The Methods also provide details about the efficient-coding model with logarithmic encoding (p. 32-33). These results are further discussed in the Discussion (p. 18). Finally, the data and code are now available online at https://osf.io/d6k3m/, which we note on p. 33.

      Reference

      Zhou, J., Duong, L. R., & Simoncelli, E. P. (2024). A unified framework for perceived magnitude and discriminability of sensory stimuli. Proceedings of the National Academy of Sciences, 121(25), e2312293121. https://doi.org/10.1073/pnas.2312293121

      Reviewer #3 (Public review):

      Summary:

      This work demonstrates that people's imprecision in numeric perception varies with the stimulus context and task goal. By measuring imprecision across different widths of uniform prior distributions in estimation and discrimination tasks, the authors find that imprecision changes sublinearly with prior width, challenging previous range normalization models. They further show that these changes align with the efficient encoding model, where decision-makers balance expected rewards and encoding costs optimally.

      Strengths:

      The experimental design is straightforward, controlling the mean of the number distribution while varying the prior width. By assessing estimation errors and discrimination accuracy, the authors effectively highlight how imprecision adjusts across conditions.

      The model's predictions align well with the data, with the exponential terms (1/2 and 3/4) of imprecision changes matching the empirical results impressively.

      We thank Reviewer #3 for his/her positive comments on our work.

      Weaknesses:

      Some details in the model section are unclear. Specifically, I'm puzzled by the Wiener process assumption where r∣x∼N(m(x)T,s^2T). Does this imply that both the representation of number x and the noise are nearly zero at the beginning, increasing as observation time progresses? This seems counterintuitive, and a clearer explanation would be helpful.

      In the original formulation of the model, both the mean of the representation and its variance are indeed nearly zero when 𝑇 is near zero, but in such a way that the Fisher information, 𝑇(𝑚′(𝑥)/𝑠)², is proportional to 𝑇. We note that a different specification, with a mean 𝑚(𝑥) (instead of 𝑚(𝑥)𝑇) and a variance 𝑠²/𝑇 (instead of 𝑠²𝑇), i.e., 𝑟|𝑥 ~ 𝑁(𝑚(𝑥), 𝑠²/𝑇), for 𝑇 > 0, would result in the same Fisher information.
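      The equivalence of the two specifications follows from the standard expression for the Fisher information of a Gaussian whose variance does not depend on 𝑥, namely 𝐼(𝑥) = (∂𝜇/∂𝑥)²/𝜎². The short derivation below is our own check of the statement above, written with the same symbols:

```latex
\begin{aligned}
r \mid x \sim \mathcal{N}\big(m(x)\,T,\; s^2 T\big) &\;\Rightarrow\;
I(x) = \frac{\big(m'(x)\,T\big)^2}{s^2 T} = T\left(\frac{m'(x)}{s}\right)^2, \\
r \mid x \sim \mathcal{N}\big(m(x),\; s^2/T\big) &\;\Rightarrow\;
I(x) = \frac{m'(x)^2}{s^2/T} = T\left(\frac{m'(x)}{s}\right)^2 .
\end{aligned}
```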

      In any event, in the revised manuscript, we now formulate the model differently. Specifically, we assume that the encoding results from an accumulation of independent, identically distributed signals, but the precision of each signal is limited, and each of them entails a cost. Formally, we posit, first, that the Fisher information of one signal, 𝐼₁(𝑥), is subject to the constraint:

      ∫√𝐼₁(𝑥) d𝑥 ≤ 𝐾.

      This constraint appears in many other efficient-coding models in the literature (Wei & Stocker, 2015, 2016; Wang et al., 2016; Morais & Pillow, 2018; etc.), and it arises naturally for unidimensional encoding channels (Prat-Carrabin & Woodford, 2021; e.g., for a neuron with a sigmoidal tuning curve, it is equivalent to assuming that the range of possible firing rates is bounded). Second, we assume that the observer incurs a cost each time a signal is emitted (e.g., the energy consumed by action potentials). The total cost is thus proportional to the number of signals, which we denote by 𝑛. More signals, however, allow for better precision: specifically, under the assumption of independent signals, the total Fisher information resulting from 𝑛 signals is the sum of the Fisher information of each signal, i.e., 𝐼(𝑥) = 𝑛𝐼₁(𝑥).

      A tradeoff ensues between the increased precision brought by accumulating more signals and the cost of these signals. We assume that the observer chooses the function 𝐼₁(·) and the number 𝑛 of signals that minimize the expected loss in the task plus the total cost of the signals, 𝜆𝑛, subject to the constraint ∫√𝐼₁(𝑥) d𝑥 ≤ 𝐾,

      where 𝜆 > 0. We can first solve this problem for the Fisher information of one signal, 𝐼₁(𝑥). In the case of a uniform prior of width 𝑤, we find that it is zero outside of the support of the prior, and

      𝐼₁(𝑥) = (𝐾/𝑤)²

      for any 𝑥 on the support of the prior. This intermediate result corresponds to the optimal Fisher information of an observer who is not allowed to choose the number of signals, 𝑛 (and who instead receives 𝑛 = 1 signal). It is the solution predicted by the efficient-coding models mentioned above, which include the constraint on 𝐼₁(𝑥) but do not allow the observer to choose the number of signals, 𝑛. With this solution, the scale of the observer's imprecision, 1/√𝐼₁(𝑥) = 𝑤/𝐾, is proportional to 𝑤, and it does not depend on the task, contrary to our experimental results.

      Solving the optimization problem for 𝑛, in addition to 𝐼₁(𝑥), we find that with a uniform prior the optimal number of signals is proportional to 𝑤 in the estimation task, and to √𝑤 in the discrimination task (specifically, treating 𝑛 as continuous, we obtain ). In other words, the observer chooses to obtain more signals when the prior is wider, in a way that depends on the task. We give the general solution for the total Fisher information, 𝐼(𝑥) = 𝑛𝐼₁(𝑥), in the case of a prior 𝜋(𝑥) that is not necessarily uniform:

      where 𝜃 = 𝜆/𝐾. This is of course the same solution that we obtained in the original manuscript.
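      The scaling argument for a uniform prior can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the manuscript's derivation: it assumes the constraint ∫√𝐼₁(𝑥) d𝑥 ≤ 𝐾 (so that 𝐼₁ = (𝐾/𝑤)² on the support of the prior), a total signal cost 𝜆𝑛, and expected losses proportional to ∫𝜋/𝐼 in the estimation task and to ∫𝜋²/𝐼 in the discrimination task. These loss forms, the constants, and the function names are hypothetical choices made for the example.

```python
import math


def optimal_n(w, K, lam, task):
    """Continuous optimal number of signals for a uniform prior of width w.

    Assumed (illustrative) expected losses with I(x) = n * (K / w)**2:
      estimation:     integral of pi / I    = w**2 / (n * K**2)
      discrimination: integral of pi**2 / I = w / (n * K**2)
    Both have the form a / n; minimizing a / n + lam * n gives n = sqrt(a / lam).
    """
    if task == "estimation":
        a = w ** 2 / K ** 2
    elif task == "discrimination":
        a = w / K ** 2
    else:
        raise ValueError(f"unknown task: {task}")
    return math.sqrt(a / lam)


def imprecision(w, K, lam, task):
    """Imprecision scale 1 / sqrt(n * I1), with I1 = (K / w)**2 on the support."""
    # "fixed_n" is the observer who cannot choose n and receives a single signal
    n = 1.0 if task == "fixed_n" else optimal_n(w, K, lam, task)
    return w / (K * math.sqrt(n))


def loglog_slope(task, K=1.0, lam=0.01, widths=(5.0, 10.0, 20.0, 40.0)):
    """Least-squares slope of log(imprecision) against log(prior width)."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(imprecision(w, K, lam, task)) for w in widths]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


for task in ("fixed_n", "estimation", "discrimination"):
    print(task, round(loglog_slope(task), 3))
```

      Under these assumptions the fitted log-log slopes are 1 with a fixed 𝑛 = 1, 1/2 for estimation, and 3/4 for discrimination, matching the exponents discussed above; the task dependence enters only through how the expected loss scales with 𝑤.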

      We hope that this new formulation of the efficient-coding model will seem more intuitive to the reader (p. 12-13 in the revised manuscript).

      The authors explore range normalization models with Gaussian representation, but another common approach is the logarithmic representation (Barretto-García et al., 2023; Khaw et al., 2021). Could the logarithmic representation similarly lead to sublinearity in noise and distribution width?

      We agree with Reviewer #3 that a common approach when modeling the perception of numbers is to consider a logarithmic encoding. We have conducted several analyses that examine this proposal. These are presented in detail in our response to a comment of Reviewer #2, above (p. 11-14 of this document). We briefly summarize our findings here:

      (i) A model with a logarithmic encoding better fits a majority of subjects in the estimation task, but a bit less than half the subjects in the discrimination task.

      (ii) The examination of the performance of subjects in the discrimination task, however, suggests that in the Wide condition they discriminate slightly better the small numbers, as compared to the larger numbers.

      (iii) We consider a constrained version of our efficient-coding model, in which the Fisher information must be consistent with that of a logarithmic encoding (i.e., decreasing as 1/𝑥²); we find that the resulting optimal Fisher information depends on the prior width in the same way as without the constraint, i.e., a scaling of the imprecision with √𝑤 in the estimation task, and with 𝑤^(3/4) in the discrimination task.

      (iv) When considering the model with logarithmic encoding, we find that it best fits the data when its imprecision scales with the width according to the same exponents, i.e., √𝑤 in the estimation task (𝛼 = 1/2), and 𝑤^(3/4) in the discrimination task (𝛼 = 3/4). In other words, the data support the predictions of our theoretical model.

      In the revised manuscript, we have accordingly modified the presentation of the model (p. 6-7 and 10-11), as well as Tables 1 (p. 24) and 2 (p. 30), which report the BICs. There is now a section in the Results dedicated to the question of the logarithmic compression, including the efficient-coding model constrained by the logarithmic encoding (p. 15-16). The results on the performance of subjects with larger numbers are presented in the Methods (p. 29-31), and mentioned in the main text (p. 15-16). The Methods section also provides details about the efficient-coding model with logarithmic encoding (p. 32-33). These results are further commented on in the Discussion (p. 18). Finally, we now cite the articles mentioned by Reviewer #3 (Barretto-García et al., 2023; Khaw et al., 2021).

      Additionally, Heng et al. (2020) found that subjects did not alter their encoding strategy across different task goals, which seems inconsistent with the fully adaptive representation proposed here. I didn't find the analysis of participants' temporal dynamics of adaptation. The behavioral results in the manuscript seem to imply that the subjects adopted different coding schemes in a very short period of time. Yet in previous studies of adaptation, experimental results seem to be more supportive of a partial adaptive behavior (Bujold et al., 2021; Heng et al., 2020), which might balance experimental and real-world prior distributions. Analyzing temporal dynamics might provide more insight. Noting that the authors informed subjects about the shape of the prior distribution before the experiment, do the results in this manuscript suggest a top-down rapid modulation of number representation?

      We thank Reviewer #3 for his/her comment and for pointing to these articles. The Reviewer raises several points — that of the dynamics of adaptation, that of the adaptation to the prior, and that of the adaptation to the task. We address each of them.

      To investigate the dynamics of the subjects’ adaptation, we examined separately, in each task, the responses obtained in the trials in the first and second halves of each condition. In the estimation task, the standard deviations of responses, as a function of the presented number and of the prior width, are very similar in the two halves (see Figure 8, panel a). The Bonferroni-Holm-corrected p-values of Levene's tests of equality of the variances across the two halves are all above 0.13, and thus we do not reject the hypothesis that the variance in the first half of the trials is equal to the variance in the second half. Moreover, the variance in both halves appears to be a linear function of the width, rather than of the squared width (panel b). We conclude that the behavior of subjects in the estimation task is stable over the course of each experimental condition, including the sublinear scaling of their imprecision.
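      The Bonferroni-Holm correction mentioned above (and applied again to the Fisher exact tests below) can be sketched in a few lines. This is a generic, minimal implementation of Holm's step-down adjustment, not the authors' analysis code; the per-test p-values would come from a Levene's test implementation such as `scipy.stats.levene`, and which centering variant was used is not stated here.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the smallest p-value is multiplied by m, the next by m - 1, and so on
        value = min(1.0, (m - rank) * pvalues[i])
        # enforce monotonicity: adjusted p-values never decrease with rank
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

      For these four p-values the adjusted values are approximately [0.03, 0.06, 0.06, 0.02]; a hypothesis is rejected at level 0.05 only if its adjusted p-value falls below 0.05.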

      In the discrimination task, the subjects' choice probabilities, as a function of the difference between the averages of the red and blue numbers, are similar in the first and second halves of trials (panel c). The Bonferroni-Holm-corrected p-values of Fisher exact tests of equality of proportions (in bins of the average difference that contain about 500 trials each) are all above 0.9, and thus we do not reject the hypothesis that the choice probabilities are equal, in the first and second halves of the trials. Furthermore, the choice probabilities as a function of the absolute average difference normalized by the prior width raised to the exponent 3/4 are all similar, across session halves and across prior widths, suggesting that the sublinear scaling that we find is a stable behavior of subjects (panel d).
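      The Fisher exact test of equal proportions can likewise be sketched from the hypergeometric distribution. In this minimal, hypothetical example, each 2×2 table would hold, for one bin of the average difference, the counts of the two possible choices in the first half (row 1) and in the second half (row 2) of the trials. The two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one, with a small relative tolerance for ties (the convention used by R's `fisher.test`).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k):
        # hypergeometric probability that the top-left cell equals k, given the margins
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # sum over all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_two_sided(3, 1, 1, 3))
```

      On the classic "lady tasting tea" table [[3, 1], [1, 3]] this returns 34/70 ≈ 0.486.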

      Overall, we conclude that the subjects' behavior in both tasks is stable over the course of each experimental condition. We note that in both experiments, subjects were explicitly informed of the prior distribution at the beginning of each condition, and each condition included two preliminary training phases that familiarized them with the prior (the specifics for each task are detailed in the Methods section).

      As pointed out by Reviewer #3, Heng et al. (2020) and Bujold et al. (2021) report a partial adaptation of encoding to recently experienced distributions. We note that in our study, a sizable fraction of subjects, particularly in the estimation task, are best fit by the logarithmic encoding. This suggests that, while subjects adapt to the experimental prior, they retain a residual logarithmic compression, an encoding that would itself be efficient under a long-term, skewed prior in which smaller numbers are more frequent. In that sense, our findings are consistent with the partial adaptation reported by Heng et al. (2020) and Bujold et al. (2021). At the same time, the same sublinear scaling of imprecision that we find in our study has been obtained in a numerosity-estimation task in which the prior was changed on every trial (Prat-Carrabin et al., 2025), indicating that the adaptation to the prior can occur quickly (on the order of a second), possibly through a fast top-down modulation of the encoding, as suggested by Reviewer #3. These findings suggest that on a short timescale the encoding adapts efficiently to the prior (as evidenced by the scaling of imprecision), but within structural constraints (the logarithmic encoding).

      Regarding the adaptation to the task, Heng et al. (2020) indeed find that subjects do not adapt their encoding across two discrimination tasks (one in which the subject is rewarded for making the correct choice, and one in which the subject is rewarded with the chosen option). One difference from our paradigm is that their task involves the simultaneous presentation of two dot arrays, while our discrimination task uses two interleaved sequences of Arabic numerals. More importantly, we do not directly compare the encoding between the estimation and discrimination tasks. Instead, we show that within each task, the adaptation to the prior is quantitatively consistent with the optimal coding predicted for that task's objective, as reflected in the task-specific sublinear scaling exponents. Directly contrasting the encoding across tasks would be a very interesting direction for future work.

      In the revised manuscript, we present the analysis on the stability of subjects’ behavior in the Methods section (p. 29), and we mention it in the main text when presenting the results of the estimation task (p. 5) and of the discrimination task (p. 8-10). In the Discussion, we cite Heng et al. (2020) and Bujold et al. (2021) and comment on the adaptation to the prior and to the task (p. 18).

      Barretto-García, M., De Hollander, G., Grueschow, M., Polanía, R., Woodford, M., & Ruff, C. C. (2023). Individual risk attitudes arise from noise in neurocognitive magnitude representations. Nature Human Behaviour, 7(9), 1551-1567. https://doi.org/10.1038/s41562-023-01643-4

      Bujold, P. M., Ferrari-Toniolo, S., & Schultz, W. (2021). Adaptation of utility functions to reward distribution in rhesus monkeys. Cognition, 214, 104764. https://doi.org/10.1016/j.cognition.2021.104764

      Heng, J. A., Woodford, M., & Polania, R. (2020). Efficient sampling and noisy decisions. eLife, 9, e54962. https://doi.org/10.7554/eLife.54962

      Khaw, M. W., Li, Z., & Woodford, M. (2021). Cognitive Imprecision and Small-Stakes Risk Aversion. The Review of Economic Studies, 88(4), 1979-2013. https://doi.org/10.1093/restud/rdaa044

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned above, the result of inverse u-shaped variability is in strong qualitative agreement with the predictions of a generic Bayesian encoding-decoding model of a flat prior, even under a standard encoding respecting Weber's law, as shown in Figure 5d in: Hahn & Wei, A unifying theory explains seemingly contradictory biases in perceptual estimation, Nature Neuroscience 2024. This paper should probably be cited.

      We now cite Hahn & Wei (2024). We comment above on our analyses regarding the logarithmic encoding.

      (2) "Requests for the data can be sent via email to the corresponding author" Why are the data not made openly available? Barring ethical or legal concerns (which are not apparent for this type of data), there is no reason not to make data and code open.

      "Requests for the code used for all analyses can be sent via email to the corresponding author." Same: why not make them open?

      We agree that it is good practice to make the data and code publicly available. They are now available here: https://osf.io/d6k3m/

      Reviewer #3 (Recommendations for the authors):

      The orange dot in Figure 1C does not appear to be described in the figure caption, although an explanation of it is mentioned in the main text.

      We thank Reviewer #3 for pointing out this omission. We now include explanations in the caption.

      I hope the authors will consider making their data publicly available on OSF or another platform.

      The data and code are now publicly available on OSF: https://osf.io/d6k3m/

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to assess the variability in the expression of surface protein multigene families between amastigote and trypomastigote Trypanosoma cruzi, as well as between individuals within each population. The analysis presented shows higher expression of multigene family transcripts in trypomastigotes compared to amastigotes and that there is variation in which copies are expressed between individual parasites. Notably, they find no clear subpopulations expressing previously characterised trans-sialidase groups. The mapping accuracy to these multicopy genes requires demonstration to confirm this, and the analysis could be extended further to probe the features of the top expressed genes and the other multigene families also identified as variable.

      Strengths:

      The authors successfully process methanol-fixed parasites with the 10x Genomics platform. This approach is valuable for other studies where using live parasites for these methods is logistically challenging.

      Weaknesses:

      The authors describe a single experiment, which lacks controls or complementation with other approaches and the investigation is limited to the trans-sialidase transcripts.

      It would be more convincing to show either bioinformatically or by carrying out a controlled experiment, that the sequencing generated has been mapped accurately to different members of multigene families to distinguish their expression. If mapping to the multigene families is inaccurate, this will impact the transcript counts and downstream analysis.

      We thank the reviewer for raising these important points.

      We agree that the analysis of multigene families at the single-cell level is an important question, particularly given the heterogeneity observed across several of them. However, the aim of this short report is not to provide a comprehensive analysis of the entire experiment, but rather to focus on what we consider an important biological phenomenon observed in TcTS genes.

      Regarding the mapping accuracy of the reads, we acknowledge that this can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating that sufficient sequence divergence exists for mapping algorithms to use during read assignment.

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.
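      The expectation-maximization idea can be illustrated with a deliberately simplified sketch: reads are summarized as equivalence classes (the set of transcripts each read is compatible with), and at each iteration the reads of a class are split among its transcripts in proportion to the current abundance estimates. This is not kallisto's implementation; it ignores transcript-length corrections and other details of the real algorithm, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def em_transcript_counts(eq_classes, n_iter=200):
    """Toy EM allocation of reads to transcripts.

    eq_classes maps a frozenset of compatible transcript IDs (an equivalence
    class) to the number of reads in that class. Length corrections are ignored.
    """
    transcripts = sorted({t for ec in eq_classes for t in ec})
    total = sum(eq_classes.values())
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    counts = {}
    for _ in range(n_iter):
        # E-step: split each class's reads in proportion to current abundances
        expected = defaultdict(float)
        for ec, n_reads in eq_classes.items():
            denom = sum(alpha[t] for t in ec)
            if denom == 0:
                continue
            for t in ec:
                expected[t] += n_reads * alpha[t] / denom
        # M-step: new abundances are the normalized expected counts
        counts = {t: expected[t] for t in transcripts}
        alpha = {t: counts[t] / total for t in transcripts}
    return counts


classes = {
    frozenset({"A"}): 30,
    frozenset({"B"}): 10,
    frozenset({"A", "B"}): 40,
}
print(em_transcript_counts(classes))
```

      With 30 reads unique to transcript A, 10 unique to B, and 40 compatible with both, the 40 shared reads end up split 3:1, following the ratio of the unique reads, giving about 60 reads for A and 20 for B.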

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in CellRanger, the standard pipeline for quantifying expression levels from 10X Genomics single-cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that pseudoalignment is currently the optimal strategy for recovering the relative expression of highly similar gene family members (Author response image 1C).

      Author response image 1

      (A) Distribution of pairwise sequence identity values calculated among the 3′-end regions of all transcripts (defined as the 3′UTR plus 20% of the coding sequence). (B) Distribution of read mapping coordinates over all multigene family transcripts, normalized as a percentage of the gene length. (C) Scatter plots showing the correlation between estimated transcript counts obtained using kallisto (red) and STAR + featureCounts (grey) versus the corresponding simulated ground-truth values.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a valuable single-cell RNA-seq study on Trypanosoma cruzi, an important human parasite. It investigates the expression heterogeneity of surface proteins, particularly those from the trans-sialidase-like (TcS) superfamily, within amastigote and trypomastigote populations. The findings suggest a previously underappreciated level of diversity in TcS expression, which could have implications for understanding parasite-host interactions and immune evasion strategies. The use of single-cell approaches to delve into population heterogeneity is strong. However, the study does have some limitations that need to be addressed.

      The focus on single-cell transcriptional heterogeneity in surface proteins, especially the TcS family, in T. cruzi is novel. Given the important role of these proteins in parasite biology and host interaction, the findings have potential significance.

      Strengths:

      The key finding of heterogeneous TcS expression in trypomastigotes is well-supported. The analysis comparing multigene families, single-copy genes, and ribosomal proteins highlights the unusual nature of the variation in surface protein-coding genes.

      Weaknesses:

      While the manuscript identifies TcS heterogeneity, the functional implications of the different expression profiles remain speculative. The authors state it may reflect differences in infectivity, but no direct experimental evidence supports this.

      The manuscript lacks any functional validation of the single-cell findings. For instance, do the trypomastigote subpopulations identified based on TcS expression exhibit differences in infectivity, host cell tropism, or immune evasion? Such experiments would greatly strengthen the study.

      We thank the reviewer for their careful reading of the manuscript. We agree that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this study is presented as a short communication centered on a specific and biologically relevant observation within a single multigene family. The aim of the manuscript is to highlight what we consider an important biological phenomenon that raises hypotheses to be tested in future work.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures has been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems, ranging from bacteria to tumors (Seco-Hidalgo 2015, doi: 10.1098/rsob.150190 and Luzak 2021, doi: 10.1146/annurev-micro-040821-012953, for a comprehensive review on this topic). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcTS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex. This is particularly challenging in T. cruzi, where the study of multigene families is limited by the restricted set of available molecular biology tools (such as RNAi). Therefore, further experimental validation of these observations falls outside the scope of this short report.

      In this revised version, we have included additional validation and clarification of the results, as well as a more explicit discussion of their limitations. In addition, we present a preliminary analysis exploring potential mechanisms that could coordinate the observed expression patterns of the TcTS family.

      The authors identify a subpopulation of TcS genes that are highly expressed in many cells. However, it is unclear if these correspond to previously characterized TcS members with specific functions.

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript.

      The authors hypothesize that observed heterogeneity may relate to chromatin regulation. However, the study does not directly address these mechanisms. There are interesting connections to be made with what they identify as the colocalization of genes within chromatin folding domains, but the authors do not fully explore this. It would be insightful to address these mechanisms in future work.

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in the revised manuscript.

      The merging of technical replicates needs further justification and explanation as they were not processed through separate experimental conditions. While barcodes were retained, it would be informative to know how well each technical replicate corresponds with the other. If both datasets were sequenced on the same lane, the inclusion of technical replicates adds noise to the analysis.

      Regarding technical details, we now include the total number of mapped reads and the average number of reads mapped per cell (new paragraph in the Methods section).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).
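      Merging two sequencing runs of the same library amounts to summing their count tables cell by cell and gene by gene, and the concordance check described above can be expressed as a correlation of per-cell read totals. A minimal sketch, assuming (hypothetically) that each run is available as a mapping from `(cell_barcode, gene)` pairs to read counts; the use of Pearson's coefficient here is our assumption, since the statistic is not specified beyond the correlation plot.

```python
import math
from collections import Counter


def merge_runs(run1, run2):
    """Sum two count tables keyed by (cell_barcode, gene)."""
    merged = Counter(run1)
    merged.update(run2)  # Counter.update adds counts instead of replacing them
    return merged


def reads_per_cell(table):
    """Total reads assigned to each cell barcode."""
    totals = Counter()
    for (barcode, _gene), n in table.items():
        totals[barcode] += n
    return totals


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def run_concordance(run1, run2):
    """Pearson correlation of per-cell read totals over barcodes seen in both runs."""
    t1, t2 = reads_per_cell(run1), reads_per_cell(run2)
    shared = sorted(set(t1) & set(t2))
    return pearson([t1[b] for b in shared], [t2[b] for b in shared])
```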

      Author response image 2.

      Correlation analysis of number of reads assigned to cells between technical replicate 1 and technical replicate 2.

      While the number of cells sequenced (3192) seems reasonable, it's not clear how much the conclusions are affected by the depth of sequencing. A more detailed description of the sequencing depth and its impact on gene detection would be valuable.

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.
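      The detection statistics above can be computed from a cells × genes count matrix as sketched below. The matrix layout and function name are hypothetical; the denominator is passed explicitly because the percentages above are relative to the 15,319 annotated protein-coding genes rather than to the number of matrix columns.

```python
def detection_summary(counts, n_annotated):
    """Summarize gene detection in a cells x genes count matrix.

    counts: list of per-cell lists of non-negative counts (hypothetical layout).
    n_annotated: number of annotated genes used as the denominator.
    Returns (mean genes per cell, mean fraction per cell, fraction detected in >= 1 cell).
    """
    genes_per_cell = [sum(1 for c in cell if c > 0) for cell in counts]
    mean_genes = sum(genes_per_cell) / len(counts)
    n_columns = len(counts[0])
    detected_any = sum(1 for g in range(n_columns) if any(cell[g] > 0 for cell in counts))
    return mean_genes, mean_genes / n_annotated, detected_any / n_annotated


# Reproducing the percentages quoted above from the reported totals:
print(f"{1088 / 15319:.1%}")   # 7.1%
print(f"{14321 / 15319:.1%}")  # 93.5%
```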

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods.

      While most of the methods are clear, the way in which the subsampled gene lists were generated could be more thoroughly described, as some details are not clear for the subsampling of single-copy genes.

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section.

      Some of the figures are difficult to interpret. For example, the color scaling in the heatmap of Supplementary Figure 3B is not self-explanatory and it is hard to extract meaningful conclusions from the graph.

      We agree with the reviewer in this assessment. We have now modified the figures to be more self-explanatory and better reflect the conclusions.

      Reviewer #3 (Public review):

      The study aimed to address a fundamental question in T. cruzi and Chagas disease biology - how much variation is there in gene expression between individual parasites? This is particularly important with respect to the surface protein-encoding genes, which are mainly from massive repetitive gene families with 100s to 1000s of variant sequences in the genome. There is very little direct evidence for how the expression of these genes is controlled. The authors conducted a single-cell RNAseq experiment of in vitro cultured parasites with a mixture of amastigotes and trypomastigotes. Most of the analysis focused on the heterogeneity of gene expression patterns amongst trypomastigotes. They show that heterogeneity was very high for all gene classes, but surface-protein encoding genes were the most variable. In the case of the trans-sialidase gene family, many sequence variants were only detected in a small minority of parasites. The biology of the parasite (e.g. extensive post-transcriptional regulation) and potential technical caveats (e.g. high dropout rates across the genome) make it difficult to infer what this might mean for actual protein expression on the parasite surface.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes and particularly in T. cruzi, where there is a predominant role of post-transcriptional regulation, mRNA levels are not always directly correlated with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      It is also worth noting that recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the Discussion acknowledging this limitation and discussed the results of Cruz-Saavedra et al. in the revised manuscript.

      (1) Limit of detection and gene dropouts

      An average of ~1100 genes are detected per parasite which indicates a dropout rate of over 90%. It appears that RNA for the "average" single copy 'core' gene is only detected in around 3% of the parasites sampled (Figure 2c: ~100 / 3192). This may be comparable with some other trypanosome scRNAseq studies, but this still seems to be a major caveat to the interpretation that high cell-to-cell variability in gene expression is explained by biological rather than technical factors. The argument would be more convincing if the dropout rates and expression heterogeneity were minimal for well-known highly expressed genes e.g. tubulin, GAPDH, and ribosomal RNAs. Admittedly, in their Final Remarks, the authors are very cautious in their interpretation, but it would be good to see a more thorough discussion of technical factors that might explain the low detection rates and how these could be tested or overcome in future work.

      (2) Heterogeneity across the board

      The authors focus on the relative heterogeneity in RNA abundance for surface proteins from the multicopy gene families vs core genes. While multicopy gene sequences do show more cell-to-cell variability, the differences (Figure 2D) are roughly average Gini values of 0.99 vs 0.97 (single copy) or 0.95 (ribosomal). Other studies that have applied similar approaches in other systems describe Gini values of < 0.2-0.25 for evenly expressed "housekeeping" genes (PMIDs 29428416, 31784565). Values observed here of >0.9 indicate that the distribution for all gene classes is extremely skewed and so the biological relevance of the comparison is uncertain.
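      To make the reviewer's point concrete, the sketch below computes a Gini coefficient from one gene's per-cell counts using the standard closed form on sorted values. It is an illustration, not the authors' code: the function name is hypothetical, and the manuscript's Gini values may come from a different implementation or normalization. For a gene detected with equal counts in k of n cells, the formula reduces to (n - k)/n, so detection in ~100 of 3192 cells already yields a Gini value of about 0.97, whatever the underlying biology.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means perfectly even counts across cells; values near 1 mean the
    counts are concentrated in very few cells.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending-sorted data (ranks i = 1..n):
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# A hypothetical gene with one UMI in 100 of 3192 cells and zero elsewhere.
counts = [1] * 100 + [0] * 3092
print(round(gini(counts), 3))  # 0.969
```

      Because the value depends only on how concentrated the counts are, widespread dropout alone pushes every gene class toward values near 1, which is why differences between classes on this scale need careful interpretation.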

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are no exception in this regard, and the general metrics of single-cell RNA-seq data in other reports are equivalent to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively generating single-cell RNA-seq data with alternative methodologies that reduce gene dropout. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts at the bulk RNA-seq level (Figure 4 - Figure Supplement 1), consistent with these TcS being frequent in the population even though they are not particularly highly expressed within each cell. Taken together, these findings argue against a purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Accordingly, since the initial submission, we have been careful to frame our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns. In the revised version, we have strengthened this point by including a specific statement in the final remarks. Our interpretation is presented as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the discussion, and explicitly included this limitation in the final remarks of the revised manuscript.
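      One way to formalize the "random sampling" alternative is a Poisson capture model. This is an assumption introduced here for illustration, not a model used in the manuscript: if each cell captures a gene's molecules as a Poisson count with mean λ UMIs per cell, the expected fraction of cells with at least one UMI is 1 - exp(-λ). Under this model detection frequency rises monotonically with expression, so highly expressed genes detected in only a few cells are not what pure sampling predicts. The function names are hypothetical.

```python
import math


def expected_detection_fraction(mean_umis_per_cell):
    """Expected fraction of cells with >= 1 UMI under Poisson capture."""
    # P(count >= 1) = 1 - P(count = 0) = 1 - exp(-lambda)
    return 1.0 - math.exp(-mean_umis_per_cell)


def mean_umis_for_fraction(fraction):
    """Mean UMIs per cell needed to reach a target detection fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


# Under this model, higher mean expression always means detection in more cells.
for lam in (0.01, 0.05, 0.5, 2.0):
    print(lam, round(expected_detection_fraction(lam), 3))
```

      For example, mean_umis_for_fraction(0.05) is about 0.051, so under this model a gene seen in 5% of cells needs only about 0.05 UMIs per cell on average.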

      Nevertheless, this study does provide some tantalising evidence that the expression of surface genes may vary substantially between individual parasites in a single clonal population. The study is also amongst the very first to apply scRNAseq to T. cruzi, so the broader data set will be an important resource for researchers in the field.

      We thank the reviewer for highlighting the relevance of our study and for their positive assessment of the potential significance of these observations. We also agree that the dataset generated here may represent a useful resource for the community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figures 1c and 1d, it would be useful to include the genes as the plot titles.

      We agree with the reviewer that including gene names in the plot makes the panels more self-explanatory. We have added gene names to the updated version of Figure 1.

      (2) Can you include the read lengths of the sequencing and whether this is sufficient to map accurately to very similar genes of the same multigene family? As stated in the public summary, this would make the data far more convincing as standard 10x chromium cannot distinguish similar gene copies unless a longer read 2 is used. Given that only the 3' end is targeted, is this enough to distinguish the TcS and other multigene family transcripts?

      We thank the reviewer for raising this important point. We agree that short, 3′-biased reads can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating that sufficient sequence divergence exists for mapping algorithms to use during read assignment.
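      The 3′-end regions described above (the 3′UTR plus the last 20% of the CDS) can be extracted with simple string slicing. This is a sketch assuming 0-based, half-open CDS coordinates on the transcript's sense strand and Python's built-in rounding of the 20% length; the authors' exact coordinate handling is not specified here, and the function name is hypothetical.

```python
def three_prime_region(transcript_seq, cds_start, cds_end, cds_fraction=0.2):
    """Return the 3' UTR plus the last `cds_fraction` of the CDS.

    cds_start/cds_end are 0-based, half-open coordinates on the transcript's
    sense strand, so the 3' UTR is transcript_seq[cds_end:].
    """
    if not 0 <= cds_start < cds_end <= len(transcript_seq):
        raise ValueError("CDS coordinates out of range")
    cds_len = cds_end - cds_start
    tail_len = round(cds_len * cds_fraction)  # rounding rule is an assumption
    # Last tail_len bases of the CDS followed by the whole 3' UTR.
    return transcript_seq[cds_end - tail_len:]


# Toy transcript: 10-nt 5' UTR, 100-nt CDS, 30-nt 3' UTR.
toy = "A" * 10 + "C" * 100 + "G" * 30
region = three_prime_region(toy, 10, 110)
print(len(region))  # 50: 20 CDS bases + 30 UTR bases
```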

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.
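      The expectation-maximization idea can be illustrated with a simplified sketch over equivalence classes (sets of transcripts that a read is compatible with). This is not kallisto's implementation: it ignores effective transcript lengths and uses an arbitrary convergence tolerance, and the function name is hypothetical. It shows the behavior relevant here: reads shared between near-identical TcS copies are apportioned according to the evidence from reads that are unique to each copy.

```python
def em_quantify(eq_classes, max_iter=1000, tol=1e-10):
    """Apportion reads to transcripts by expectation-maximization.

    eq_classes maps a frozenset of compatible transcript IDs to the number of
    reads in that equivalence class. Returns estimated reads per transcript.
    Simplification: effective transcript lengths are ignored.
    """
    transcripts = sorted({t for cls in eq_classes for t in cls})
    total = sum(eq_classes.values())
    if not transcripts or total == 0:
        return {t: 0.0 for t in transcripts}
    # Start from uniform relative abundances.
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current abundances.
        for cls, n_reads in eq_classes.items():
            denom = sum(alpha[t] for t in cls)
            if denom == 0.0:
                continue
            for t in cls:
                expected[t] += n_reads * alpha[t] / denom
        # M-step: abundances become each transcript's expected share of reads.
        new_alpha = {t: expected[t] / total for t in transcripts}
        change = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if change < tol:
            break
    return {t: alpha[t] * total for t in transcripts}


reads = {frozenset({"A"}): 10, frozenset({"A", "B"}): 10}
print(em_quantify(reads))
```

      With 10 reads unique to transcript A, none unique to B, and 10 compatible with both, the shared reads converge to A (about 20 vs 0); with 10 unique reads for each transcript, the shared reads split evenly (15 vs 15).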

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in CellRanger, the standard pipeline for quantifying expression levels from 10X Genomics single cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that pseudoalignment is currently the optimal strategy for recovering the relative expression of highly similar gene family members (Author response image 1C).
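      The accuracy comparison above reduces to computing, for each gene subset, the Pearson correlation between the simulated "true" counts and the counts recovered by each pipeline. A minimal standard-library sketch follows. The dictionary inputs and function names are hypothetical, and the response does not state whether raw or transformed counts were correlated. Pearson's r on raw counts is strongly influenced by the most highly expressed genes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_subset(true_counts, estimated_counts, subsets):
    """Pearson r between simulated and recovered counts for each gene subset."""
    result = {}
    for name, genes in subsets.items():
        shared = sorted(g for g in genes if g in true_counts and g in estimated_counts)
        result[name] = pearson([true_counts[g] for g in shared],
                               [estimated_counts[g] for g in shared])
    return result


# Hypothetical toy counts for three genes.
truth = {"g1": 10, "g2": 20, "g3": 30}
recovered = {"g1": 11, "g2": 19, "g3": 31}
print(accuracy_by_subset(truth, recovered, {"all": ["g1", "g2", "g3"]}))
```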

      The length of the R2 read (91bp) was included in Methods (line 411).

      (3) It is stated that 'single copy' genes also include 'low copy number genes'. What does this include exactly? Is it more accurate to say non-surface protein genes?

      The distinction we aim to make is between multigene families and the rest of the genome. Most multigene families encode surface proteins, but not all surface protein genes belong to multigene families. To clarify this point, we added a sentence to the Methods stating that "surface proteins" refers to surface proteins encoded by multigene families (line 453). In addition, long-read genomic DNA sequencing and assembly have revealed that many genes previously believed to be single-copy are actually duplicated at low copy numbers (doi.org/10.1099/mgen.0.000177). For this reason, we extend the concept of "single-copy" genes to include those that have only a few duplicates.

      (4) It is stated in line 127 that TcS have particularly high heterogeneity - it does not look that way by eye compared to the other multigene families. Can statistics be used to prove this, or simply state that the decision was made to focus on the TcS?

      As noticed by the reviewer, all multigene families show significantly higher heterogeneity compared to single-copy genes, as stated in the text and shown in figure legends from Figure 2, Supplementary Figure 1 and the new Supplementary Table 2.

      That said, it was not the statistical results that guided our decision to focus on TcS, but rather their well-established biological relevance in T. cruzi. As suggested, we have now emphasized this rationale more clearly in the revised text (lines 160-167).

      Besides, recent work has shown that TcS genes exhibit a bimodal distribution of expression levels using bulk RNA-seq data, in contrast to core genes and other multigene families (doi.org/10.1038/s41467-025-64900-2, doi.org/10.1038/s41564-023-01483-y). This distinct regulatory behavior further justifies our decision to examine TcS separately.

      (5) Expression of different TcS has been investigated between the different life cycle stages for a few individual genes previously (Freitas et al). Can the authors not extend this investigation to all the genes detected by scRNA-seq here to demonstrate those with higher/lower expression in amastigotes vs trypomastigotes building on Figure 2A? Are particular groups linked to either stage?

      We performed this analysis and did not observe any correlation between TcS groups and life cycle stage. In all cases TcS were more frequently detected in trypomastigotes. This difference was statistically significant for all groups except group VII, likely due to the low number of genes analyzed in this group (Author response image 3).

      Author response image 3.

      Per-gene number of expressing cells by TcS group and life-stage. Boxplots show, for each TcS group (I–VIII), the distribution across genes of the number of cells in which the gene is detected. Each point represents a single TcS; Amastigote cells: green points/boxes, Trypomastigote cells: salmon points/boxes. The y-axis is on log10 scale. Asterisks indicate statistically significant differences from the comparison between Amastigote and Trypomastigote within each TcS group, assessed using a paired two-sided Wilcoxon signed-rank test: * p < 0.05, ** p < 0.01, *** p < 0.001.
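      In the test described in this legend, each TcS gene contributes one pair: its number of detecting amastigote cells and its number of detecting trypomastigote cells. The sketch below implements the paired two-sided Wilcoxon signed-rank test with the large-sample normal approximation and without tie or continuity corrections, which is a simplifying assumption. Established implementations such as scipy.stats.wilcoxon may use the exact null distribution for small samples and can return somewhat different p-values. The function name and the toy data are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired two-sided Wilcoxon signed-rank test, normal approximation.

    Returns (W_plus, p_value). Zero differences are dropped; tied |d| get
    average ranks; no tie or continuity correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Hypothetical per-gene counts of detecting cells (amastigote vs trypomastigote).
ama = [3, 5, 2, 8, 1, 4, 6, 2, 7, 3]
trypo = [9, 12, 4, 15, 6, 7, 13, 5, 16, 10]
print(wilcoxon_signed_rank(ama, trypo))
```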

      (6) What exactly is the Z-score shown in Figure 2B?

      In this analysis, num_multigene represents the number of multigene family genes detected in each individual cell. For every cell, we counted how many genes from our predefined multigene family gene list had detectable expression (more than zero UMI counts); in the UMAP plot, this value is reflected by the size of each point. In contrast, z_multigene captures the relative expression level of multigene family genes within each cell. This metric is calculated by summing the UMI counts of all multigene family genes per cell and then standardizing this value across the dataset using a z-score transformation, such that positive values reflect above-average multigene family expression and negative values reflect below-average levels. In the UMAP plot, this metric determines the color scale of each point. Taken together, num_multigene and z_multigene allow us to distinguish cells that express multigene family genes broadly (high gene counts), strongly (high relative expression), both, or neither, and to relate these patterns to the identified cell populations.
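      The two per-cell metrics described above can be sketched as follows. Assumptions: counts are raw UMI counts stored as nested dictionaries, and the z-score uses the population standard deviation across cells; the response does not specify whether normalized counts or the sample standard deviation were used. Apart from num_multigene and z_multigene, all names are hypothetical.

```python
import statistics


def multigene_metrics(cell_counts, multigene_ids):
    """Per-cell num_multigene and z_multigene.

    cell_counts: dict cell barcode -> dict gene ID -> UMI count.
    multigene_ids: set of gene IDs belonging to multigene families.
    """
    num_multigene = {}
    totals = {}
    for cell, genes in cell_counts.items():
        family_counts = [c for g, c in genes.items() if g in multigene_ids]
        # Number of multigene-family genes with > 0 UMIs in this cell.
        num_multigene[cell] = sum(1 for c in family_counts if c > 0)
        # Summed multigene-family UMIs in this cell.
        totals[cell] = sum(family_counts)
    values = list(totals.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption
    z_multigene = {cell: ((t - mu) / sd if sd > 0 else 0.0)
                   for cell, t in totals.items()}
    return num_multigene, z_multigene


cells = {"A": {"TcS1": 2, "TcS2": 0, "GAPDH": 5},
         "B": {"TcS1": 1, "TcS2": 3},
         "C": {"GAPDH": 9}}
print(multigene_metrics(cells, {"TcS1", "TcS2"}))
```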

      We have included a short description in the legend of the new version of Figure 2 (lines 176-180).

      (7) For the reclustering of trypomastigotes based on TcS genes alone, please show the UMAP and discuss why the resolution giving two clusters is chosen? I assume increasing the resolution does not reveal clusters of cells express one of the 8 groups of TcS for example?

      We appreciate the reviewer’s suggestion. In this analysis, our goal was to test whether the phenotypic heterogeneity previously reported in trypomastigotes could be recapitulated using TcS genes alone, as prior studies described two major transcriptomic phenotypes within this stage.

      Increasing the clustering resolution did not reveal subclusters corresponding to the eight TcS sequence groups. This might reflect the fact that these groups are defined based on sequence similarity rather than on expression patterns, as noted by Freitas et al. (doi:10.1371/journal.pone.0025914).

      (8) In Figure 4B, there may be an upward trend in the level of expression and the number of cells a transcript is detected in? It would be worth showing this is or is not the case with statistics if possible.

      The number of genes detected in a high proportion of cells is low, which limits the statistical power of this analysis. Also, substantial dispersion is observed within the 0-5% interval. Nevertheless, this figure is presented primarily to highlight that a considerable number of highly expressed genes are detected in only a small fraction of cells. If expression level were the main determinant of detection frequency across cells, one would expect very few highly expressed genes to fall within the 0-5% interval. Contrary to this expectation, among the 50 most highly expressed TcS genes, 62% are detected in fewer than 5% of cells, and even among the 10 most highly expressed TcS genes, 40% fall within this lowest detection group. To facilitate this interpretation, we modified the figure (new Figure 4b) to explicitly highlight the 50 most highly expressed TcS genes and incorporated this discussion into the main text of the revised manuscript (lines 244-251), making the conclusion clearer to the reader.
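      The top-50 check described above amounts to a short calculation: rank genes by an overall expression measure, keep the top N, and count how many are detected in fewer than 5% of cells. In this sketch the ranking measure (for example, summed UMI counts across cells) and all names are assumptions; applied to the manuscript's data, a result of 0.62 for N = 50 would correspond to the 62% reported above.

```python
def fraction_top_genes_rarely_detected(expression, cells_detected, n_cells,
                                       top_n=50, max_fraction=0.05):
    """Share of the top_n most highly expressed genes detected in few cells.

    expression: dict gene -> overall expression (e.g. summed UMIs).
    cells_detected: dict gene -> number of cells with > 0 UMIs.
    """
    top = sorted(expression, key=expression.get, reverse=True)[:top_n]
    if not top:
        raise ValueError("no genes to rank")
    # A gene counts as rarely detected if seen in < max_fraction of cells.
    rare = [g for g in top if cells_detected.get(g, 0) / n_cells < max_fraction]
    return len(rare) / len(top)


# Hypothetical toy data: g0 is the most expressed but seen in only 1% of cells.
expr = {"g0": 100, "g1": 90, "g2": 5}
det = {"g0": 10, "g1": 500, "g2": 1}
print(fraction_top_genes_rarely_detected(expr, det, 1000, top_n=2))
```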

      (9) Do the cells group instead by expression of any of the other multigene families not investigated in detail?

      It is possible that additional transcriptional substructure among trypomastigotes is driven by the expression of other multigene families beyond TcS. In this short report (with limited number of figures, words, etc.), we focused specifically on the trans-sialidase family as discussed earlier. A more comprehensive analysis including other large surface gene families (MASPs, mucins, GP63) is planned as part of ongoing work and will be presented in future reports.

      Reviewer #2 (Recommendations for the authors):

      This reviewer suggests conducting functional experiments in follow-up studies to establish links between TcS expression profiles and parasite behavior and to investigate potential regulatory mechanisms responsible for the observed TcS heterogeneity, particularly focusing on epigenetic modifications. It would be interesting to correlate the highly expressed TcS members identified here with previously characterized TcS isoforms and provide more description regarding which particular groups and TcS members are driving the findings. It would benefit from further clarification regarding sequencing depth, technical replication merging, subsampling, and specific parameters for alignment methods and more information regarding the specific statistical tests and their applicability to the data.

      This is a promising single-cell study with potentially high significance. The manuscript is well-written, and the analyses are reasonably well-executed. However, the current manuscript is limited by a lack of functional validation and mechanistic insights. The addition of further analyses and experiments, as suggested, will strengthen the conclusions and increase the impact of the work.

      We thank the reviewer for their careful reading of the manuscript. As suggested, we have performed additional validation and clarification of the results, as well as a more explicit discussion of their limitations. In addition, we have included a preliminary analysis exploring potential mechanisms that could be coordinating the observed expression patterns of the TcS family (see below). Although we consider experimental validation of these results relevant and interesting, the inherent difficulties of studying multigene families in T. cruzi, an organism with a very limited set of molecular biology tools (it lacks RNAi, for example), place further experimental validation of these observations outside the scope of this short report.

      Regarding the reviewer’s question, we examined whether any TcS subgroup could be driving our observations. However, we did not find any correlation indicating that a particular group was associated with any of our findings. We now include TcS group information in Supplementary Table 3.

      Regarding technical details, we now report the total number of mapped reads (line 422) and the average number of reads mapped per cell (new paragraph in the Methods section, lines 432-436).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table, as stated in line 424. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).
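      When one library is sequenced in two runs, the same cDNA molecule can appear in both, so one common way to merge runs is to pool reads before collapsing UMIs, counting each (cell barcode, UMI, gene) molecule once. The sketch below illustrates that idea under simplifying assumptions (no UMI error correction, and each read already assigned to a single gene). It does not describe the authors' pipeline, and the function name is hypothetical.

```python
from collections import defaultdict


def merge_runs(*runs):
    """Pool (cell_barcode, umi, gene) read records from several runs.

    Each run is an iterable of (cell_barcode, umi, gene) tuples, one per read.
    Returns UMI counts per (cell_barcode, gene) after collapsing duplicate
    UMIs across runs, so a molecule sequenced in both runs is counted once.
    """
    molecules = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for run in runs:
        for cell, umi, gene in run:
            molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


run1 = [("c1", "AAA", "g1"), ("c1", "AAA", "g1"), ("c1", "CCC", "g1")]
run2 = [("c1", "AAA", "g1"), ("c1", "GGG", "g2")]
print(merge_runs(run1, run2))
```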

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section (line 456).

      The specific kallisto parameters used are stated in Methods (lines 418-419). We now state that default options were used unless otherwise specified (lines 419-420).

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in line 301-313 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors consistently refer to gene "expression" but somewhere they should acknowledge that in trypanosomes RNA abundance is less predictive of protein than in most other organisms.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes and particularly in T. cruzi, where there is a predominant role of post-transcriptional regulation, mRNA levels are not always directly correlated with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      It's also interesting to note that recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the Discussion acknowledging this limitation, and we discuss the results of Cruz-Saavedra et al. in lines 266-271 of the revised manuscript.

      (2) Line 29, in the abstract there is a strong statement that T. cruzi "does not employ antigenic variation". I don't think there is much evidence either way if we are thinking about antigenic variation in the broad sense rather than the extreme model of T. brucei VSG switching. Later in the abstract they state that "no recurrent combinations of TcS genes were observed between individual cells in the population", which sounds very much like a form of antigenic variation.

      We agree with the reviewer. Indeed, we meant to state that T. cruzi does not employ an antigenic variation mechanism such as that of T. brucei. We have changed this statement as suggested (lines 28-32).

      (3) Line 29, "relies on a diverse array of cell-surface-associated proteins encoded by large multi-copy gene families (multigene families) essential for infectivity and immune evasion" and lines 55-58 "T. cruzi infection relies on a heterogeneous set of membrane proteins, encoded mainly by large multigene families ... most of which are involved in infection, tropism, and immune evasion". It would be worth adding a bit more detail on the nature and strength of the evidence that Tc "relies on" these various genes or that they are "essential" for infectivity, tropism, and immune evasion.

      Because the journal’s short format imposes word limits, we strengthened the original statement by adding specific references that document genomic, transcriptomic and functional evidence linking the major multigene families to infectivity, tropism and immune evasion (doi.org/10.1371/journal.pone.0025914; doi.org/10.1038/nrmicro1351; doi.org/10.1128/iai.05329-11; doi.org/10.1093/nar/gkp172, doi.org/10.1371/journal.ppat.1006767), in line 77.

      (4) Line 89, 1088 genes detected per cell - what is this as a % of genes in the genome?

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.
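      The percentages above follow from dividing by the 15,319 annotated protein-coding genes. A small sketch of the per-cell and dataset-wide detection summaries, assuming a nested-dictionary count matrix (all names hypothetical), is:

```python
def detection_summary(cell_counts, n_annotated_genes):
    """Mean genes detected per cell and genes detected in >= 1 cell.

    cell_counts: dict cell barcode -> dict gene ID -> UMI count.
    Percentages are relative to the number of annotated genes.
    """
    per_cell = [sum(1 for c in genes.values() if c > 0)
                for genes in cell_counts.values()]
    detected_anywhere = {g for genes in cell_counts.values()
                         for g, c in genes.items() if c > 0}
    mean_per_cell = sum(per_cell) / len(per_cell)
    return {
        "mean_genes_per_cell": mean_per_cell,
        "pct_per_cell": 100 * mean_per_cell / n_annotated_genes,
        "genes_detected_anywhere": len(detected_anywhere),
        "pct_detected_anywhere": 100 * len(detected_anywhere) / n_annotated_genes,
    }


# Reproducing the reported percentages from the reported totals.
print(round(100 * 1088 / 15319, 1), round(100 * 14321 / 15319, 1))  # 7.1 93.5
```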

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods (line 435).

      (5) Line 93-94, how many cells were assigned to clusters 0 and 1?

      Cluster 0 had 2201 cells and cluster 1 had 824 cells assigned. We have now included these numbers in the new version of the manuscript (line 114).

      (6) Line 96, cluster 2 ama-trypo transitioning parasites - were these observable by microscopy?

      We did not perform microscopy specifically to observe or quantify the putative ama/trypo transitioning subpopulation; microscopy was only used as a pre-experiment quality check to verify cell morphology and viability. The inference that cluster 2 reflects ama/trypo transitioning parasites is drawn from the transcriptomic profile (particularly from the pattern of stage-associated marker expression observed in that cluster) and, as stated in the manuscript, should be considered a data-driven hypothesis that merits further analysis.

      (7) Line 106-107, "As expected, single-copy gene expression is high in both amastigotes and trypomastigotes and similar on average between both cell types".

      (8) Why as expected? For a broad journal it would be useful to explain this. Amastigotes are replicative and trypomastigotes are not, so would we not expect to see some differences that reflect this?

      (9) What do you mean by the expression being "high"? High compared to what?

      (10) "Similar on average between both cell types". This does not seem concordant with Figure 1a showing a highly significant difference between ama and trypo.

      We thank the reviewer for this helpful request for clarification for broader readers and the observations regarding global expression of single copy and multigene family genes.

      Figure 2a is intended as an experimental control where we show that our 10X Genomics data shows the previously reported upregulation of surface protein genes in trypomastigotes. We have now modified the text in order to highlight this (line 129). In turn, Supplementary Figure 1a is shown as a control that this upregulation is not a general feature of trypomastigote cells.

      Regarding comment 9, what we meant is that single-copy genes display relatively high expression in both amastigotes and trypomastigotes compared with surface protein-coding genes (see expression values in Figures 2a and Supplementary Figure 1a).

      Finally, differential expression between amastigotes and trypomastigotes at the transcriptomic level has been studied previously, showing that most single-copy genes do not vary between stages. This explains the overall pattern in Supplementary Figure 1a, where average expression is similar between stages (mean fold change = 1.1), likely because these genes are related to basic cellular functions. Genes related to stage-specific functions, such as replication in amastigotes, or normalization effects may underlie the slight but statistically significant increase in overall expression observed in amastigotes. This contrasts with the pattern observed for multigene families, which show clear overexpression in trypomastigotes (mean fold change = 1.5).

      Because the observations raised in comments 9 and 10 have been described in previous studies and are neither novel nor central to our results, we decided not to focus on them and modified the text accordingly (lines 129-135).

      (11) Line 110, "with high variation". What does "high variation" mean here? Compared to what? For the two metrics (n cells +ve for each gene and total expression level) can they give an average and the SD? It would be useful to know how many parasites the "average" surface (and core) gene is expressed in, or more precisely for which the RNA is above the limit of detection.

      We refer to the comparison with the expression profile observed for single-copy genes. This point has now been clarified in the text, and we have included the mean and standard deviation for both TcS multigene family genes and single-copy genes in trypomastigotes for both metrics in the Figure 2 legend. The average and distribution of the number of cells in which each gene is detected are shown in Figure 2c and Supplementary Figure 1a. We also added a reference to this panel at the point in the text where the phenomenon is first described.

      (12) Line 134, Figure 2b legend needs more detail - what are num_multigene and z_multigene?

      Please see our response to Reviewer 1, Question 6. We have now added a clarification to the legends of Figure 1 and Supplementary Figure 1.

      (13) Figure 2c, correct the y-axis legend because it implies your values are log10 transformed. Also, it would be useful to have more markers on the y axis so the reader can better estimate the data ranges.

      We thank the reviewer for this observation. We have now corrected the y-axis label and markers.

      (14) If the y-axis of Figure 2D started at 0 instead of 0.8 and if Lorenz curves were provided then the reader would probably get a fuller sense of the expression heterogeneity in the dataset. The legend states the differences are statistically significant but the actual p-values are not shown.

      (15) Line 142-3, more precision is needed on the p-values.

      We thank the reviewer for this helpful suggestion. We agree that Lorenz curves provide a clearer representation of expression heterogeneity than the previous plot. Accordingly, we have replaced the original panel (Figure 2d) with Lorenz curves for the groups under comparison, and have made the same change in Supplementary Figure 1d. In addition, we have included Gini index values and p-values for all comparisons in Supplementary Table 2.
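      A Lorenz curve orders cells from lowest to highest count and plots the cumulative share of cells against the cumulative share of counts; the Gini index is one minus twice the area under that curve. This minimal sketch computes both with the trapezoid rule. It is illustrative only and may differ in detail from the code used for the revised Figure 2d and Supplementary Figure 1d; the function names are hypothetical.

```python
def lorenz_curve(values):
    """Lorenz curve points for non-negative per-cell counts.

    Returns (cumulative share of cells, cumulative share of counts) pairs,
    with cells ordered from lowest to highest count.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(points):
    """Gini index = 1 - 2 * (area under the Lorenz curve, trapezoid rule)."""
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area


print(gini_from_lorenz(lorenz_curve([0, 0, 0, 4])))  # 0.75
```

      A curve that follows the diagonal indicates even counts across cells; for mostly-zero data the curve stays flat until the last few cells and then rises steeply.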

      (16) Figure 3, as in Figure 1a it would be useful to add another UMAP plot to show the two trypo subpopulations.

      We thank the reviewer for this suggestion. We have now updated Figure 3 to include a UMAP plot showing the two trypomastigote subpopulations.

      (17) What is the observed proportion of broad vs slender trypomastigote morphologies for Dm28c? To be consistent with the speculation at line 162 then wouldn't it need to be approximately 50-50?

      The proportions of each trypomastigote subpopulation in the DM28c strain are currently unknown. The only available relevant data come from Brener, 1965 (doi.org/10.1080/00034983.1965.11686277), in which this strain was not included. In the strains analyzed in that study, the relative proportions of broad and slender trypomastigote morphologies were highly variable: across seven strains, broad forms ranged from 18.0% to 77.3%, while slender forms ranged from 2.3% to 71.6%. Given this wide variability and the lack of DM28c-specific data, we cannot assume any expected proportion for this strain.

      (18) Line 170, please state how many genes are in the TcS subgroup mentioned here. This is an interesting finding - does this include mostly catalytically active trans-sialidase genes or is it a mixture from across all the subfamilies?

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript (lines 227 - 228).

      (19) Line 175-176, "Gene dropouts might favor random patterns of gene family's detection in scRNA-seq experiments, particularly affecting genes with low expression" - I'm not sure if the authors mean the detection of a gene (or not) in an individual parasite is truly random (pure luck) or whether the term stochastic would be more appropriate because they seem to be referring to randomness around a certain threshold of RNA abundance/stability? They go on to rule this out, at least for TcS genes, essentially arguing that they have something resembling an ON or OFF pattern rather than a spectrum of expression levels. This is potentially very important and could advance the field in a major way, but the fact that so many core and ribosomal genes, which 'should' be always ON, cannot be detected in most cells is a concern. A version of Figure 4B for core and ribosomal genes could be informative - do they show a different pattern to TcS?

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and therefore preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that, within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts at the bulk RNA-seq level (Supplementary Figure 2), consistent with these TcS genes being frequent in the population even though they are not especially highly expressed within each cell. Taken together, these findings argue against purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Since the initial submission, we have framed our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns, and we present our interpretation as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the Discussion, and added an explicit statement of this limitation to the final remarks of the revised manuscript.
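      The argument above separates two per-gene quantities: how often a gene is detected across cells, and how highly it is expressed in the cells where it is detected. A minimal sketch of both metrics, assuming a hypothetical dictionary of per-cell UMI counts (this is an illustration, not the authors' pipeline):

```python
def detection_summary(counts):
    """Per-gene detection frequency and mean count among detecting cells.

    counts: dict mapping gene ID -> list of UMI counts, one entry per cell
    (hypothetical layout; a real pipeline would use a sparse cell x gene matrix).
    """
    summary = {}
    for gene, per_cell in counts.items():
        detected = [c for c in per_cell if c > 0]
        frequency = len(detected) / len(per_cell)  # fraction of cells with >= 1 UMI
        mean_when_detected = sum(detected) / len(detected) if detected else 0.0
        summary[gene] = (frequency, mean_when_detected)
    return summary


# Toy data: two hypothetical genes with the same within-cell level (1 UMI)
# but very different detection frequencies across five cells.
toy = {
    "geneA": [1, 1, 0, 1, 1],
    "geneB": [0, 0, 1, 0, 0],
}
print(detection_summary(toy))  # {'geneA': (0.8, 1.0), 'geneB': (0.2, 1.0)}
```

      Under pure random sampling, detection frequency would track expression level; two genes with equal within-cell levels but very different frequencies, as in the toy data, are the kind of pattern the response describes.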

      (20) Line 238-9, Add details of removing extracellular epimastigotes after cell infections.

      Only cellular trypomastigotes collected from the supernatant on day 6 were used for the secondary infection, at a 10:1 parasite-to-cell ratio. After 24 hours, the cultures were washed twice with PBS to remove any remaining extracellular parasites. Under these conditions, i.e., using exclusively trypomastigotes, at this infection ratio, and maintaining the cultures in mammalian medium, we do not expect the presence or survival of extracellular epimastigotes. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript, line 382.

      (21) Line 260, was methanol used to directly resuspend the parasite pellet, or was it resuspended first e.g. in a small volume of PBS?

      As described in lines 250-257 of the original manuscript, parasites were washed and resuspended in DPBS before methanol fixation. Methanol fixation was then carried out according to the 10X Genomics Methanol Fixation Protocol. We have now emphasized this more clearly in the revised text in line 400.

      (22) What was the doublet rate?

      We identified and removed 41 doublets, all belonging to cluster 2, and retained 3,151 singlets for downstream analysis (total cells before removal = 3,192). The resulting doublet rate was 1.28%. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript, lines 439–440.
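      As a quick arithmetic check, the reported singlet count and doublet rate follow directly from the two numbers stated above (no values are assumed beyond those in the response):

```python
total_cells = 3192   # cells before doublet removal
doublets = 41        # cells flagged as doublets (all in cluster 2)
singlets = total_cells - doublets
doublet_rate = 100 * doublets / total_cells  # percent of all cells

print(singlets, f"{doublet_rate:.2f}%")  # 3151 1.28%
```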

      (23) What was the frequency of rRNA and kDNA-derived reads?

      Approximately 4.02% of the reads were derived from kDNA sequences, while 1.10% corresponded to rRNA-derived reads (Author response image 4).

      Author response image 4. Percentage of mitochondrial (kDNA)-derived and rRNA-derived reads.

    1. Author response:

      Reviewer #1 (Public review):

      We thank the reviewer for the thoughtful and detailed evaluation of our manuscript. We are pleased that the continuous-time formulation and its methodological contributions were viewed as elegant and broadly applicable, and that the empirical analyses provide meaningful new insights into neural variability across the visual hierarchy. We appreciate the reviewer’s constructive suggestions and clarifications, which will help us improve the precision, clarity, and scope of the manuscript. Below we respond to each point in turn and outline the revisions we will make.

      (1) Extension to neural populations: We thank the reviewer for this important suggestion. We agree that extending the framework to population recordings is a natural next step. In this work, we focus on single-cell data to establish the model and validate inference. In the revised manuscript, we will expand the Discussion to outline how the framework could be generalized to population activity, for example by incorporating shared latent-variable structure.

      (2) Clarification regarding the Modulated Poisson model: We thank the reviewer for pointing this out. We agree that our description was not sufficiently precise and may have been unclear. The modulated Poisson model introduced in Goris et al. (2014) is indeed a generative process model that can be used to generate spike trains, and we apologize for the inaccurate characterization of this framework. Our intended point was that the original formulation assumes gain is constant within a trial (or counting window) and does not provide a principled mechanism for modeling continuously time-varying gain fluctuations within trials. In the revised manuscript, we will clarify this distinction and revise the relevant passages accordingly. We will also cite and discuss related extensions and analyses in Goris et al. (2018) and Hénaff et al. (2020) to provide a more accurate and complete characterization of prior work.

      (3) Continuous extensions of the Goris model: We thank the reviewer for this helpful clarification. We agree that the Goris model is not limited to homogeneous Poisson spiking and can incorporate a stimulus-dependent, time-varying firing rate within trials. We did not intend to imply otherwise, and we will revise the relevant text to avoid this misunderstanding. Our intended point was that, in formulating continuous-time extensions, we explicitly model the time-varying stimulus drive using a GP prior, as in the CMP framework, and then consider different assumptions about the temporal structure of the gain process, including constant and finely sampled gain. This highlights the distinction between piecewise-constant gain assumptions and the fully continuous gain process introduced in our model. We will clarify this distinction in the revised manuscript. We will also acknowledge related variants explored in Hénaff et al. (2020) and more clearly describe how our formulation differs, including the role of smoothness priors on the stimulus drive and gain processes.
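      To make the distinction between a per-trial constant gain and a continuously varying gain concrete, the sketch below simulates spike counts from both generative assumptions. It is a minimal illustration, not the CMP inference procedure: the Gamma per-trial gain follows the Goris et al. (2014) setup as we understand it, while the continuous gain uses a mean-one log-normal gain whose log follows an AR(1) process. This is a simple stand-in for a smooth GP prior, and all function and parameter names (`simulate_trials`, `sigma`, `tau_bins`) are hypothetical.

```python
import numpy as np


def simulate_trials(drive, n_trials, gain_mode, rng, sigma=0.5, tau_bins=20):
    """Simulate binned spike counts as Poisson(gain * drive).

    drive: expected spike count per time bin from the stimulus, shape (T,).
    gain_mode: "constant" (one gain per trial) or "continuous" (gain varies
    within a trial). Returns (counts, gain), both of shape (n_trials, T).
    """
    T = drive.shape[0]
    if gain_mode == "constant":
        # One Gamma-distributed gain per trial with mean 1 and variance sigma**2,
        # held fixed across all bins of that trial.
        shape = 1.0 / sigma**2
        g = rng.gamma(shape, sigma**2, size=(n_trials, 1)) * np.ones((1, T))
    elif gain_mode == "continuous":
        # Stationary AR(1) log-gain with marginal s.d. sigma and correlation
        # time of roughly tau_bins bins (stand-in for a smooth GP prior).
        phi = np.exp(-1.0 / tau_bins)
        log_g = np.empty((n_trials, T))
        log_g[:, 0] = rng.normal(0.0, sigma, n_trials)
        for t in range(1, T):
            innovation = rng.normal(0.0, sigma, n_trials)
            log_g[:, t] = phi * log_g[:, t - 1] + np.sqrt(1 - phi**2) * innovation
        g = np.exp(log_g - sigma**2 / 2)  # shift so that E[g] = 1
    else:
        raise ValueError("gain_mode must be 'constant' or 'continuous'")
    counts = rng.poisson(g * drive[None, :])
    return counts, g


rng = np.random.default_rng(0)
drive = np.full(100, 0.2)  # flat drive: 0.2 expected spikes per bin
counts_const, g_const = simulate_trials(drive, 2000, "constant", rng)
counts_cont, g_cont = simulate_trials(drive, 2000, "continuous", rng)
```

      In the constant mode every row of `g_const` is flat, so only the trial's overall count is modulated; in the continuous mode the gain drifts within a trial, which is the structure a piecewise-constant assumption cannot express.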

      (4) Continuous-time extension: We thank the reviewer for the positive comment and are pleased that the continuous-time formulation was viewed as elegant.

      (5) Parameter recovery analysis: We thank the reviewer for emphasizing the importance of this result. We agree that demonstrating parameter recoverability is foundational to the paper. In the revised manuscript, we will move the Appendix 3 analysis into the main Results section and clearly illustrate how our inference procedure faithfully recovers the generative parameters in simulation studies.

      (6) Validation of gain–stimulus separation: We thank the reviewer for this insightful suggestion. We agree that verifying that the inferred gain does not capture stimulus-driven structure is an important validation of the model. In the revised manuscript, we will compute the trial-averaged inferred gain to assess whether it exhibits systematic temporal structure. This analysis will provide an additional check that the partitioning between stimulus drive and gain fluctuations operates as intended.

      (7) Temporal evolution of gain variability: We thank the reviewer for this valuable suggestion. We agree that examining whether gain variability decreases following stimulus onset is an important and relevant analysis. In the revised manuscript, we will compute the temporal evolution of cross-trial gain variability from the inferred gain traces and assess whether a quenching effect is observed after stimulus onset. If present, we will report and illustrate this result.
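      Both checks described in points (6) and (7) reduce to simple summaries of the inferred gain traces. A minimal sketch, assuming the inferred gains are available as a hypothetical trials-by-time-bins array and that stimulus onset falls on a known bin:

```python
import numpy as np


def gain_summaries(gain, onset_bin):
    """Summarize inferred gain traces.

    gain: array of shape (n_trials, n_bins), one inferred gain trace per trial
    (hypothetical input format). onset_bin: index of the first post-onset bin.
    Returns (trial-averaged gain, cross-trial variance per bin,
    mean pre-onset variance, mean post-onset variance).
    """
    gain = np.asarray(gain, dtype=float)
    mean_t = gain.mean(axis=0)        # should be roughly flat if the gain does
                                      # not absorb stimulus-locked structure
    var_t = gain.var(axis=0, ddof=1)  # cross-trial variance in each time bin
    pre = var_t[:onset_bin].mean()
    post = var_t[onset_bin:].mean()
    return mean_t, var_t, pre, post


# Toy example: two trials whose gains diverge before onset and converge after.
toy_gain = [[1.5, 1.5, 1.1, 1.1],
            [0.5, 0.5, 0.9, 0.9]]
mean_t, var_t, pre, post = gain_summaries(toy_gain, onset_bin=2)
# mean_t is flat at 1.0; pre-onset variance 0.5 exceeds post-onset variance 0.02
```

      A post-onset variance consistently below the pre-onset variance across neurons would be the quenching signature; with real data one would also want confidence intervals, for example by resampling trials.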

      (8) Clarification of Baseline Poisson and Poisson-GP models: We thank the reviewer for this careful reading. Yes, this understanding is correct. The Baseline Poisson model uses a stimulus-conditioned PSTH as an estimate of the time-dependent firing rate and includes a Gamma prior to regularize rate estimates in conditions with sparse repeats. The Poisson-GP model retains the same structure but models the time-dependent firing rate using a stimulus-specific Gaussian process prior, which substantially improves goodness-of-fit. In the revised manuscript, we will clarify this description. We will also highlight that Figure 4 – figure supplement 2 illustrates how introducing a GP smoothness prior on the stimulus drive markedly improves model fit, even within the Goris-style model.
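      The regularizing role of a Gamma prior on a PSTH can be illustrated with standard Gamma–Poisson conjugacy. A minimal sketch, assuming a shape-rate Gamma(a, b) prior on each bin's firing rate, independent bins, and illustrative hyperparameters a = b = 1 (the manuscript's actual prior settings are not stated here):

```python
import numpy as np


def gamma_poisson_psth(counts, dt, a=1.0, b=1.0):
    """Posterior-mean firing rate (spikes/s) in each time bin.

    counts: (n_trials, n_bins) spike counts for one stimulus condition.
    Prior: rate ~ Gamma(shape=a, rate=b); likelihood: count ~ Poisson(rate * dt).
    The posterior is Gamma(a + sum of counts, b + n_trials * dt).
    """
    counts = np.asarray(counts, dtype=float)
    n_trials = counts.shape[0]
    return (a + counts.sum(axis=0)) / (b + n_trials * dt)


def raw_psth(counts, dt):
    """Unregularized PSTH: mean count per trial divided by bin width."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum(axis=0) / (counts.shape[0] * dt)


# Two repeats, two 100-ms bins: the prior pulls sparse estimates toward a / b.
counts = [[0, 2],
          [1, 2]]
print(raw_psth(counts, dt=0.1))            # raw rates: 5 and 20 spikes/s
print(gamma_poisson_psth(counts, dt=0.1))  # shrunk rates: ~1.67 and ~4.17 spikes/s
```

      With few repeats the prior dominates; as the number of trials grows, `n_trials * dt` outweighs `b` and the estimate converges to the raw PSTH, which is why such a prior mainly matters for conditions with sparse repeats.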

      Reviewer #2 (Public review):

      We thank the reviewer for the thoughtful and positive assessment of our work. We are pleased that the model development, empirical analyses, and presentation were found to be clear and rigorous. We appreciate the recognition that the continuous-time formulation meaningfully extends prior variability-partitioning approaches and enables a more precise characterization of how stimulus drive and internal gain dynamics evolve across temporal scales. We are also encouraged that the cross-area analyses and model comparisons were viewed as providing new insights and clear empirical improvements. Below, we address the specific suggestions raised by the reviewer.

      Positioning relative to prior work: Regarding the comment on incremental contribution, we agree that our framework builds directly on earlier variability-partitioning approaches. Our goal was to extend these models to continuous time and to develop a principled inference framework capable of characterizing how gain dynamics evolve across temporal scales. We will further clarify this positioning in the revised manuscript.

      Extension to sub-Poisson variability: We thank the reviewer for this suggestion. We agree that sub-Poisson variability is an important phenomenon observed in neural data. Because the CMP model builds on a Poisson observation model with stochastic gain modulation, it naturally captures Poisson and super-Poisson variability but cannot generate sub-Poisson spike count statistics in its existing form. We will clarify this limitation in the revised manuscript and expand the Discussion to outline potential extensions that could address sub-Poisson variability, such as incorporating spike-history effects, renewal-process models, or alternative count distributions.
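      Why gain modulation of a Poisson process can only add variability follows from the law of total variance. A worked sketch, using per-window notation in the spirit of Goris et al. (2014); the symbols f (stimulus drive, as an expected count), G (gain), \sigma_G and \Lambda are our labels, not necessarily the manuscript's:

```latex
% Per-window model: count N given gain G is Poisson with mean G f,
% where the gain has mean 1 and variance \sigma_G^2.
\begin{align*}
N \mid G &\sim \mathrm{Poisson}(G f), \qquad \mathbb{E}[G] = 1, \qquad \mathrm{Var}[G] = \sigma_G^2 \\
\mathbb{E}[N] &= \mathbb{E}\big[\mathbb{E}[N \mid G]\big] = f \\
\mathrm{Var}[N] &= \mathbb{E}\big[\mathrm{Var}(N \mid G)\big] + \mathrm{Var}\big(\mathbb{E}[N \mid G]\big)
                 = f + \sigma_G^2 f^2 \\
\mathrm{FF} &= \frac{\mathrm{Var}[N]}{\mathbb{E}[N]} = 1 + \sigma_G^2 f \;\ge\; 1
\end{align*}
% Continuous-time gain: the count in window W is Poisson given the integrated
% intensity \Lambda = \int_W g(t) f(t)\,dt, so
% Var[N] = E[\Lambda] + Var[\Lambda] >= E[\Lambda] = E[N], again FF >= 1.
```

      Because the Fano factor equals 1 only when the gain does not vary, producing sub-Poisson counts requires changing the observation model itself, for example through spike-history effects or a renewal process.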

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      …It is unclear whether there are any systematic changes in preferences over the course of testing that could explain the observed changes in correlation with neural responses, such as changes due to learning (e.g., flavor nutrient conditioning, relief of neophobia), changes in deprivation state, or habituation to/proficiency with the BAT setup.

      For the revision, we will add analyses (presented either as additional panels in Figure 3 or as a new figure between the current Figures 3 and 4) testing the hypothesis that preference changes across testing days are non-random. Concretely, we will test: 1) whether the preference for palatable tastes increases with experience (a result that would make sense given research on neophobia); 2) whether the preference for aversive tastes decreases with experience; and 3) whether absolute consumption of any particular taste changes in a reliable direction from session to session.
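      One simple way to ask whether preferences drift in a consistent direction is to fit a per-rat slope of preference against session number and then test whether positive slopes are more common than expected by chance. A minimal sketch, assuming one preference score per rat per session; the least-squares slope plus exact sign test is an illustrative choice, not necessarily the analysis the authors will run, and the function names are hypothetical:

```python
from math import comb


def ols_slope(sessions, preferences):
    """Least-squares slope of preference on session index for one rat."""
    n = len(sessions)
    mx = sum(sessions) / n
    my = sum(preferences) / n
    sxx = sum((x - mx) ** 2 for x in sessions)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sessions, preferences))
    return sxy / sxx


def sign_test_two_sided(n_positive, n):
    """Exact two-sided sign test against a 50% chance of a positive slope.

    n counts only rats with non-zero slopes (zero slopes are conventionally dropped).
    """
    k = min(n_positive, n - n_positive)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Usage with hypothetical per-rat preference scores for a palatable taste
# across three BAT sessions:
per_rat = [[0.20, 0.40, 0.60], [0.50, 0.55, 0.70], [0.30, 0.20, 0.45]]
slopes = [ols_slope([1, 2, 3], prefs) for prefs in per_rat]
n_pos = sum(s > 0 for s in slopes)
p_value = sign_test_two_sided(n_pos, sum(s != 0 for s in slopes))
```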

      A secondary point is whether any changes in preference are attributed to internal individual versus external contextual factors. Both types of variation (i.e., across individuals and across time within an individual) are mentioned in the introduction, but it is not clear what the authors believe about the nature or neural representation of these sources of variation.

      While we assume that differences between rats are due to internal factors (given the controlled home-cage environment), we cannot rule out that some subtle factor, below our threshold of detection as observers, influences taste preferences. Similarly, while changes across time within an individual are by definition changes within that individual, we cannot be sure whether some subtle facet of the rat's experience determines how preferences change (as opposed to the change being purely internal). We will add prose on this topic to the Discussion section, including a citation of Hilary Schiff’s recent work showing nurture-related preference changes.

      With respect to neural data analysis, no individual animal/day data are shown, making it difficult to assess the extent to which differences in correlation match individual differences in preferences and/or changes in preference with time within individuals.

      The revision will include Figure panels (with analysis) showing the relationships between individual neural responses and consumption in the first and last BAT tests for 1-2 representative rats.

      The correlation analysis is also lacking control for the fact that there is a certain degree of "chance" associated with behavioral and neural measures having matching ranks.

      Chance alone cannot explain our results, which consist mainly of within-rat differences in match (i.e., a specific enhancement of the match for the most recent behavioral assessment). This finding is all the more surprising given that: 1) 2 weeks separate that behavioral test from the electrophysiology session; and 2) that 2-week gap is only 1–3 days shorter than the gap between the first behavioral test (which reliably correlates less well with the neural data) and the same session. Nonetheless, we will add an independent, convergent analysis to the revision, testing whether the observed pattern vanishes when we shuffle the preference ranks in the behavioral data: if the result reflected chance, the observed neural-behavioral match would be no stronger than the matches obtained after shuffling.
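      A rank-shuffling control of this kind can be sketched as a permutation test on Spearman's rho. With only a handful of tastes, every possible reordering of the behavioral ranks can be enumerated exactly rather than sampled. The sketch assumes one neural and one behavioral ranking per rat and session, with no ties; the function names are hypothetical and the authors' implementation may differ (for example, random shuffles instead of exhaustive enumeration):

```python
from itertools import permutations


def spearman_rho(x, y):
    """Spearman's rho for two rankings without ties: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_shuffle_test(neural_ranks, behavior_ranks):
    """Compare the observed rank match to all reorderings of the behavioral ranks.

    Returns (observed rho, one-sided p = fraction of shuffles matching at least as well).
    """
    observed = spearman_rho(neural_ranks, behavior_ranks)
    null = [spearman_rho(neural_ranks, perm) for perm in permutations(behavior_ranks)]
    p = sum(r >= observed for r in null) / len(null)
    return observed, p


# Five tastes ranked identically by neural responses and by licking:
rho, p = exact_shuffle_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
# rho = 1.0; only the identity ordering matches this well, so p = 1/120
```

      With five tastes there are only 120 orderings, so no single rat and session can reach p below 1/120; this is one reason to pool shuffled statistics across rats, or to shuffle the difference between the last-BAT and first-BAT matches, rather than to test each rat alone.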

      Finally, …it is unclear to what extent changes in correlation may be attributed to overall changes in responsiveness of the neural population.

      We will include a new analysis in the revision testing the hypothesis that the reduction in match between the neural and behavioral rankings reflects changes in neural excitability—spontaneous and taste-driven—between the first and second electrophysiology sessions.

      Reviewer #2 (Public review):

      The manuscript could use additional corollary analyses to provide a more complete picture of the phenomenon. For instance, how many neurons (per animal and in total) have significant correlations with the final BAT patterns? And with the first BAT? Can a time course of such counts be provided? Can some decoding analyses be performed at a single session level to reconstruct a rat's behavioral preference pattern from its neural activity?

      These are all really good ideas. We are in the process of implementing all but the last; we will attempt the last as well, but can’t promise that we have large enough ensembles to provide stable results of such a subtle decoding task (reflecting the last BAT session’s preference pattern significantly better than the first session’s pattern).

      The manuscript could benefit from additional polishing, both in the text as well as in the figures.

      We are revising the text and figures accordingly, on the basis of the suggestions made by Reviewer #2 in the non-public comments.

      Reviewer #3 (Public review):

      Without a behavioral measure collected after recording day 1 intraoral exposure, it is not possible to determine whether taste preference was altered by that experience…The authors' conclusion would be strengthened by adding an intervening brief access test between recording days 1 and 2.

      We very much appreciate Reviewer 3’s suggestion, but the primary authors involved in data collection on this project have moved on, and we won’t be able to collect the additional dataset that would be required. Instead, we will soften the conclusion that we reach in the last section, and suggest this experiment as a future direction.

      The current experimental design exposes animals to 3 distinct sets of substances … [that] differ in identity … and concentration. Because palatability is known to be comparative depending on the other substances available and concentration-dependent, this introduces challenges to interpretation, [and] without more clarity, it is difficult to evaluate whether the interaction of different tastes within the sets of stimuli biases the main conclusions.

      This is an interesting point. We hope that some of the analyses we are undertaking in response to Reviewers 1 & 2 (see above) will shed light on whether between-session preference changes are non-random; such non-randomness could indicate that preferences change more with one battery than another. We will also perform a more direct test of this hypothesis, splitting the dataset by battery and asking whether our phenomena are observed more with one battery than another. If it turns out that the magnitude of the impact of experience does depend on the nature of the taste battery (we predict not, for reasons that are in the manuscript), we shall introduce that complexity into our interpretation and the Discussion thereof.

      Responses to sweet tastes are not reported in the electrophysiology data. This is seemingly the case because rats given set 1 received no sweet stimulus while rats given set 2 received 2 distinct sweet tastes. Finally, rats given set 3 did not receive quinine, yet quinine is reported in electrophysiology data.

      We are unsure of the source of this confusion—in every case, the rat received the same tastes in the electrophysiology sessions that were delivered in the BAT preference tests—but we will modify the text to ensure: 1) that panels reflecting data from a single rat (panels that will therefore necessarily include only a subset of possible tastes) are clearly marked as such; and 2) that the nature of which taste batteries were delivered is more explicit.

      The choice of reporting average lick cluster size is problematic because the authors use thirsty rats with 10-second-long trials. Thirsty rats are likely to lick in relatively long clusters, especially for neutral and palatable tastes. If the rat is mid-cluster when the trial ends, the final cluster would be cut off prematurely, resulting in shorter overall average lick cluster size, disproportionately affecting neutral and palatable tastes over aversive tastes.

      We have been deeply concerned with this issue ourselves; we recently published a paper that includes a direct test demonstrating that lick bout lengths calculated from 10-sec BAT trials yield taste palatability estimates identical to (and less noisy than) those generated from the more classically used 15-min ad lib licking. We will cite this paper (Lin et al., 2026) in the Methods section of the revision, along with text clarifying how we calculated lick clusters. We are also planning an additional analysis that estimates taste preference after removing these “premature bouts”, and will evaluate how this recalculation affects our results.
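      The “premature bout” correction can be sketched as follows: lick timestamps are grouped into clusters separated by inter-lick intervals longer than a threshold, and the final cluster is dropped when it ends too close to the end of the trial to have terminated on its own. The 0.5-s threshold and the exclusion rule are hypothetical choices made for illustration only; they are not the criteria used in Lin et al. or in the manuscript.

```python
def lick_clusters(lick_times, ili_threshold=0.5):
    """Group lick timestamps (s) into clusters.

    A new cluster starts whenever the inter-lick interval exceeds
    ili_threshold (hypothetical default).
    """
    clusters = []
    for t in sorted(lick_times):
        if clusters and t - clusters[-1][-1] <= ili_threshold:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def mean_cluster_size(lick_times, trial_end=10.0, ili_threshold=0.5,
                      drop_truncated=False):
    """Mean licks per cluster within one trial.

    If drop_truncated is True, the last cluster is excluded when its final lick
    falls within ili_threshold of trial_end, i.e. when the trial may have cut
    the cluster short.
    """
    clusters = lick_clusters(lick_times, ili_threshold)
    if drop_truncated and clusters and trial_end - clusters[-1][-1] <= ili_threshold:
        clusters = clusters[:-1]
    if not clusters:
        return 0.0
    return sum(len(c) for c in clusters) / len(clusters)


# One hypothetical 10-s trial with a cluster still running at the trial's end:
licks = [0.0, 0.15, 0.3, 2.0, 2.15, 9.7, 9.85]
all_clusters = mean_cluster_size(licks)                           # (3 + 2 + 2) / 3
complete_only = mean_cluster_size(licks, drop_truncated=True)     # (3 + 2) / 2 = 2.5
```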

      Moreover, even if 10-sec BAT trial data did not provide reliable preference measures, clusters being cut short by the end of a trial would lead to an underestimation of the preference for the palatable tastes (which drive far more licking than aversive tastes and are therefore more likely to be mid-bout at the end of a trial). Such an underestimation would in turn be expected to reduce the observed neural-behavioral correlation; this bias, if present, would therefore work against our findings, which highlights their robustness.

      Canonical palatability rankings may not apply to the concentrations selected in every stimulus set. This is particularly true for set 1, which included two concentrations of citric acid and quinine for the behavior. It is also not clear which concentrations are reported in Figures 3A2 and 3B2. Meanwhile, the concentrations of quinine and citric acid used for electrophysiology are quite low.

      In the Methods section of the revision, we will explicitly motivate the canonical rankings used for each taste battery (the added text will include citations). We have also added prose to the Discussion section concerning the possible impact of getting those rankings wrong: that impact is minimal, given that our results are largely driven by differences between rats (and day-to-day differences within rats), and given that almost any choice of canonical rankings would therefore poorly reflect the behavior of individual rats on individual days.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide extensive immunoreactivity and expression data to map monoaminergic neurotransmitter production sites in Pristionchus pacificus. This nematode is relatively distantly related to the popular model nematode Caenorhabditis elegans, for which such information is already available. They find that dopamine, tyramine, and octopamine are present in the same neurons in both species, but differences are observed for serotonin. This forms the basis for a comparison of serotonergic neurons across 22 nematode species. In addition, they evaluate monoaminergic effects on egg-laying, head movement during reversals, and nictation behavior, to find that monoaminergic control over the latter differs between C. elegans and P. pacificus. This shows that some anatomical flexibility supports similar outcomes, whereas in other cases it is the basis of evolved regulatory differences.

      Strengths:

      The comparative efforts are laudable and valuable, including a thorough revisiting of old data and the correction of what is judged to be a historic misannotation. The expected continued value of this work is also appreciated: because nematodes have similar anatomies and behaviors, cellular-resolution data from different species permit the study of the functional evolution of neurotransmitter usage in homologous neurons.

      Despite the strong experimental approach, there are some points that require addressing:

      (1) Not all the concepts of the introduction ('feeding behaviors', to a lesser extent also 'evolution of neurotransmitter usage in homologous neurons') are followed up upon in the results or discussion sections.

      We will address the relative treatment of particular topics in the introduction and discussion in a revised version of the article.

      (2) The choice of nematodes ('only' 13 species) may affect what is perceived as ancestral.

      See above regarding ‘13 species’ (actually 22). Most species and genera were specifically selected previously (Loer and Rivard, 2007; Rivard et al., 2010) for broad phylogenetic coverage, representing different species and genera in 4 major clades within ‘clade V’ (Kiontke et al., 2007; Sudhaus, 2011): Anarhabditis (Caenorhabditis, including both the Elegans and Drosophilae species groups), Synrhabditis (Oscheius, Metarhabditis, Reiterina and Rhabditella), Pleiorhabditis (Teratorhabditis, Mesorhabditis, Rhomborhabditis and Pelodera), and Diplogastrids represented by P. pacificus. Among the outgroups to clade V, there are 3 distinct clades represented, each with at least two species and/or genera represented. Therefore, we believe that the determination of an ancestral condition is well-founded. We plan to add this rationale to the revised version to make this clearer.

      (2, continued) Also, identifying their cells based on comparisons with Ce or Ppa identifications only is understandable but mildly risky: there are many cells in the head, and mistakes would go unnoticed until detailed analysis in each species can provide conclusive evidence.

      We agree that there is a mild risk of incorrect identification, but we believe that appropriate caveats are noted in the text. Furthermore, the recent head EM reconstruction and complete embryonic cell lineage of P. pacificus (Cook et al., 2025) show a nearly one-to-one homology correspondence between head neurons (e.g., only a single head neuron is missing in the Ppa head relative to Cel, due to altered apoptosis), and the quite high level of conservation of neurite morphology and soma position between Cel and Ppa suggests that identifications in related nematodes are likely correct. In cases in which a serotonin-immunoreactive cell is found in the predicted location (often with apparent associated neurites), its homology to the matching Cel and Ppa cell is the most parsimonious interpretation; otherwise, one cell would have to lose expression and another nearby cell gain it.

      (3) It is not reported whether the nictation-defective mutants have general locomotion defects; therefore, whether the reported problem is specific to this host-finding behavior or not.

      None of the mutants we tested for nictation behavior, including those that show severe defects in nictation (Ppa-cat-1, Ppa-tph-1, Ppa-tdc-1, Ppa-tbh-1), exhibited noticeable general locomotion defects either as dauers or non-dauers. Further clarification will be provided in a revised version of the article.

      (4) The section on RIP neurons makes sense for Ppa, but not for Ce (dauers in fact have weakened IL2-to-RIP connections) and should be revised. The nictation data also do not support the breadth of the conclusions, which should either be toned down or rephrased as hypothetical.

      We plan to address these concerns in a revised version of the article.

      (5) The discussion mostly reiterates the results, leaving little room for the authors' interpretations and opinions. I would suggest reworking in favor of conceptual discussion.

      As noted above, we agree to address the relative treatment of matters in discussion in a revised version of the article.

      Reviewer #2 (Public review):

      Summary:

      This paper makes important contributions to our understanding of how nervous systems evolve, with a particular focus on whether changes in neurotransmitter usage within homologous neurons represent a mechanism for evolutionary adaptation without large-scale changes to circuitry. Comparing the predatory nematode P. pacificus with C. elegans, this study systematically examines monoamine-producing neurons, assesses how their neurotransmitter identities differ between homologous neural types, and determines how these differences relate to behavior.

      Strengths:

      The major strength of this work is its breadth, rigor, and data quality. It combines multiple, independent lines of evidence to assign neurotransmitter identity for neurons with homology grounded in lineage, morphology, and connectomics, which is essential for meaningful cross-species comparisons. Additionally, by extending the analysis beyond P. pacificus and C. elegans to other nematodes, the authors convincingly argue that features observed in P. pacificus likely reflect an ancestral state. This depth greatly enhances the significance of the conclusions.

      This work is likely to have a significant impact on the fields of comparative neurobiology and nervous system evolution. It demonstrates a powerful system and approach for linking molecular identity, cell-type homology, circuit context, and behavior across species. The data generated here will be a valuable resource for the community and provide a strong foundation for future mechanistic studies.

      More broadly, the study reinforces the idea that evolutionary change in nervous systems can occur through modulation of chemical signaling within conserved circuits, rather than through complete rewiring. This conceptual framework is likely to influence how researchers think about neural evolution in other systems.

      Weaknesses:

      Given the availability of detailed connectivity information for both species, a more explicit comparison of the local circuit context of key neurons would further strengthen the link between molecular identity and circuit function.

      We plan to address these concerns in a revised version of the article.

      Reviewer #3 (Public review):

      Summary:

      The study by Hong, Loer, Hobert, and colleagues is a comprehensive description of monoaminergic neurons in the nematode Pristionchus pacificus. The work used multiple, complementary approaches, including immunostaining and expression of genes involved in neurotransmitter synthesis or transport, to identify neurons that express a monoamine neurotransmitter. Moreover, this study characterized the phenotypes of various mutants to study their organismal function. Extensive comparisons are made to C. elegans, the nematode model that, in a way, anchors the model studied here, and new outgroup species were examined for some features so that the polarity of their evolution could be inferred. Although there is no simple or groundbreaking punchline to distill from the manuscript (i.e., other than some things are the same as in C. elegans, and some things are different), and while the study is basically descriptive in nature, the scope of the project warrants broad attention.

      Strengths:

      This manuscript offers a tremendous resource for those who use this species as a model, which, based on the author list alone, includes many labs. This study sets the bar for what can be done in a "satellite" model system.

      Given the complementarity of approaches used, such as the position of cell bodies, the connectivity and morphology of dendrites, and a previously published atlas of the connectome for this species, the identification of specific neurons (which, as the authors point out, can be easily mistaken) is convincing throughout. Likewise, appropriate caution is observed where neuron identities are ambiguous, e.g., unlabeled cells in Figure 5, or ambiguous identities in other species, as shown in Figure 10. There was a lot of data to unpack in this manuscript, but I could not find any obvious flaws in neuron identification.

      Also, the phenotypic assays were straightforward and informative.

      Weaknesses:

      No serious weaknesses were noted. One minor comment is that in general, I think the Methods could use some additional text to describe what the goal of any given technique was. For example, although there is a description of the HCR protocol in the methods, nowhere does it say what genes this method would be used for. In addition to what is shown in Figure 4, this information should be given in the Methods.

      More detailed methods will be provided in a revised version of the article.

    1. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      (1) We agree that the current design does not allow us to cleanly dissociate whether the beneficial effect of retrieval practice on AC inference under stress reflects a selective enhancement of inferential processing or, instead, stronger memory for the underlying AB and BC premise pairs that supports later inference. We plan to revise the manuscript to remove wording that could be read as claiming that retrieval practice specifically protects inference independently of associative-memory strengthening.

      Our intended interpretation is more modest. As shown in Section 3.2.3, retrieval practice improved direct premise-memory performance, consistent with the well-established testing effect. Because AC inference in our paradigm necessarily depends on retrieving and linking the AB and BC premise pairs, strengthened premise memory is not a competing explanation that can be separated from inference performance in the current dataset. Rather, it is a plausible mechanism through which retrieval practice may support inference, especially under stress.

      We will therefore revise the manuscript to avoid implying that retrieval practice protects inferential processing independently of associative-memory strengthening, and will instead interpret the effect more conservatively as reflecting enhanced premise representations and/or more effective reactivation of bridge information during inference.

      We also agree that the post-inference direct memory test, which used a 2AFC format, provides only a coarse measure of premise-memory strength, as some proportion of correct responses can arise from guessing. Therefore, restricting analyses to trials in which AB and BC were later answered correctly does not fully guarantee that those trials were supported by strong associative memories. We will acknowledge this limitation explicitly in the manuscript and temper our interpretation of these “successfully retrieved” premise trials accordingly. More stringent measures, such as cued recall, confidence-based memory judgments, or other continuous indices of premise-memory strength, would be better suited to this question in future work.

      Finally, we agree that the absence of a retrieval-practice benefit in the non-stress condition does not by itself rule out mediation through strengthened premise memory. Because the retrieval-practice manipulation was introduced in a follow-up study after completion of Study 1, the present dataset was not designed as a single fully crossed factorial experiment. In response to the reviewer’s suggestion, we will add an exploratory mediation analysis testing whether premise-memory performance statistically accounts for the relationship between retrieval practice and inference performance. We will report this analysis cautiously, given that premise memory was assessed using a post-inference 2AFC measure, and will note in the manuscript that a future fully crossed design with more sensitive premise-memory measures will be needed for a stronger test.

      (2) We apologize that the presentation of Figure 4A was not sufficiently clear and may have created the impression of below-chance inference performance. The values shown in Figure 4A do not represent raw 3-alternative forced-choice (3AFC) A-C inference accuracy, for which the theoretical chance level would be 0.33. Instead, Figure 4A plots a normalized inference index, calculated as inference performance relative to direct retrieval performance, to account for individual differences in the availability of the directly learned premise pairs. Therefore, the raw 3AFC chance level is not the appropriate reference for interpreting this measure. To avoid this confusion, we will clarify in the revised manuscript and figure legend that Figure 4A shows a normalized inference index rather than raw inference accuracy.
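      To illustrate why the raw 3AFC chance level does not apply to such an index, the sketch below computes it in one plausible form, assuming the index is the ratio of AC inference accuracy to direct premise-retrieval accuracy. The exact formula is not stated in this response, so the ratio form and the function name are hypothetical. Two participants with the same raw inference accuracy receive different index values when their premise memory differs, which places the index on a different scale from raw 3AFC accuracy.

```python
def normalized_inference_index(inference_acc: float, direct_acc: float) -> float:
    """Hypothetical normalization: AC inference accuracy divided by direct
    (AB/BC) premise-retrieval accuracy. The ratio form is an illustrative
    assumption, not the manuscript's stated formula."""
    if not 0.0 <= inference_acc <= 1.0:
        raise ValueError("inference_acc must be a proportion in [0, 1]")
    if not 0.0 < direct_acc <= 1.0:
        raise ValueError("direct_acc must be a proportion in (0, 1]")
    return inference_acc / direct_acc


# Same raw 3AFC inference accuracy (0.60, above the 1/3 chance level), but
# different premise-memory availability, yields different index values.
for direct in (0.90, 0.75):
    print(direct, round(normalized_inference_index(0.60, direct), 3))
```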

      (3) We agree that implementing retrieval practice in a separate experiment, rather than within a single 2 × 2 factorial design, limits the strength of the causal inference regarding retrieval practice and reduces our ability to formally test the retrieval practice × stress interaction within one unified design.

      In response, we will revise the manuscript to acknowledge this limitation more explicitly and to temper our interpretation throughout. Specifically, we will avoid overstating retrieval practice as definitively preventing the effects of stress, and will instead describe the findings more cautiously as evidence that retrieval practice was associated with attenuation of stress-related inference impairments across experiments. We will also add a limitation statement to the Discussion noting that the current design cannot fully rule out cohort-related confounds and that a fully crossed factorial design will be necessary in future work to test the interaction between retrieval practice and stress more rigorously.

      At the same time, we will clarify that the two experiments were conducted under closely matched conditions: participants were recruited using the same protocol from the same campus population, demographic characteristics were matched, and both experiments were run in the same laboratory using the same EEG system, task procedures, and experimenter team. We agree, however, that these procedural consistencies reduce, but do not eliminate, the concern about between-experiment confounds.

      (4) We agree that the absence of a matched re-exposure/restudy control condition limits the mechanistic interpretation of the retrieval-practice effect. In the revised manuscript, we will make this limitation more explicit in the Discussion and temper our conclusions accordingly. Specifically, we will clarify that the present design shows that a post-encoding retrieval-practice intervention buffered the impact of acute stress on later inference, but it does not allow us to determine whether this benefit is specific to retrieval practice per se rather than to additional exposure to the AB and BC associations.

      We also agree that it is important to distinguish whether the effect operates at the level of specific practiced items or reflects a more global participant-level effect. In the current study, however, the retrieval-practice phase in Experiment 2 was implemented as a brief timed free-recall procedure rather than a trial-by-trial cued retrieval task, and the available records do not allow us to reliably link retrieval-practice success for individual associations to specific later AC inference trials. Therefore, we cannot directly compare later inference performance for successfully versus unsuccessfully retrieved items on a trial-by-trial basis.

      To address this issue as far as possible with the current dataset, we instead plan to conduct an additional item-level robustness analysis using mixed-effects models that account for variability across ABC associations. Specifically, we will test whether the critical stress-by-retrieval-practice effect remains after modeling triad-level variability, and whether there is evidence that this effect differs substantially across triads. This analysis does not provide a direct test of whether successfully retrieved items benefit more than unsuccessfully retrieved items, but it will help assess whether the observed effect is broadly distributed across associations or driven by only a small subset of items.

      (5) We agree that our current decoding approach does not justify a strong claim of item-specific reinstatement of a unique bridge memory. The classifier was trained to discriminate stimulus categories (faces vs. buildings) in the independent localizer and then applied during the inference phase. Therefore, the present analysis is better interpreted as indexing reactivation of bridge-related category information, rather than reinstatement of an item-specific episodic representation.

      Importantly, however, we believe this signal remains theoretically informative for the inferential process examined here. In our design, the bridge element B belonged to one of the trained categories, and the classifier was applied during the cue period when no face or building stimulus was physically present. Thus, successful decoding in this time window suggests that task-relevant bridge-related information was re-expressed online during inference, rather than reflecting concurrent perceptual processing. At the same time, we agree that, because only two categories were used, the decoding analysis cannot fully dissociate bridge-related category reactivation from broader category-level retrieval, strategic task differences, or attentional contributions.

      To address this concern, we plan to revise the manuscript in three ways. First, we will soften the interpretation throughout the Results and Discussion to avoid claims of item-specific bridge-memory reinstatement. Second, we will refer to the decoding effect more conservatively as bridge-related or category-level mnemonic reactivation during inference. Third, we will add an explicit limitation stating that the current design does not allow us to distinguish item-specific episodic reinstatement from category-level reactivation, and that future work using more fine-grained representational analyses and/or a larger stimulus set will be needed to resolve this issue more directly.

      Reviewer #2 (Public review):

      (1) We agree with this important point. The inference task was scheduled to begin approximately 20 minutes after stress onset based on prior human stress literature, with the intention of probing a time window commonly associated with glucocorticoid effects. However, as the reviewer notes, this period may also still reflect residual adrenergic/SAM influences. Because salivary cortisol was not collected due to the COVID-19-related safety protocol, we cannot disentangle the relative contributions of glucocorticoid and adrenergic responses to the observed stress-related effects on inference and neural reactivation. We will revise the manuscript to make this limitation more explicit in the Discussion and to avoid attributing the effects to a specific physiological component of the stress response.

      (2) In the revised manuscript, we will add asterisks (or equivalent significance annotations) to Figures 4 and 6 to improve clarity and readability.

      Reviewer #3 (Public review):

      (1) We thank the reviewer for highlighting this important reporting issue. We agree that the number of trials contributing to the behavioral and EEG analyses should be reported more explicitly, particularly because inference performance was analyzed in relation to direct retrieval performance and because direct retrieval differed across experiments.

      In the revised manuscript, we will report, for each group and experiment, the number of trials presented in the AC inference phase, the number of trials retained for the behavioral analyses, and the number of successfully retrieved direct-memory trials in the AB and BC tasks. These values will be summarized in the revised Results section and in Supplementary Tables.

      To directly address the reviewer’s concern, we will also compare trial counts across groups and experiments and evaluate whether differences in direct retrieval performance could account for the inference and EEG effects. To further address the concern about potentially unequal trial numbers, we plan to repeat the key analyses on trial-count-matched subsets to test whether the results remain qualitatively unchanged.
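      One way to implement a trial-count-matched control of this kind is sketched below, assuming per-trial values are available for each group. The function name, the dictionary layout, and the example accuracies are hypothetical. The sketch repeatedly draws, without replacement, as many trials from every group as the smallest group contains and averages the resulting means, so that group comparisons are not driven by unequal trial numbers.

```python
import random
from statistics import mean


def trial_count_matched_means(trials_by_group, n_iter=1000, seed=0):
    """Subsample every group to the smallest group's trial count, n_iter times,
    and return (matched trial count, average subsample mean per group).

    trials_by_group: dict mapping a group label to a list of per-trial values
    (e.g. 1/0 inference accuracy). Hypothetical structure for illustration.
    """
    rng = random.Random(seed)
    n_min = min(len(v) for v in trials_by_group.values())
    if n_min == 0:
        raise ValueError("every group needs at least one trial")
    sums = {g: 0.0 for g in trials_by_group}
    for _ in range(n_iter):
        for g, values in trials_by_group.items():
            # Without replacement, so each subset has exactly n_min trials.
            subset = rng.sample(values, n_min)
            sums[g] += mean(subset)
    return n_min, {g: s / n_iter for g, s in sums.items()}


# Hypothetical per-trial accuracies with unequal trial counts.
data = {
    "control": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    "stress": [1, 0, 0, 1, 0, 1],
}
n_matched, matched = trial_count_matched_means(data, n_iter=200)
print(n_matched, matched)
```

      The same matched subsets could instead be passed to the EEG analyses; only the summary step inside the loop would change.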

      (2) We thank the reviewer for this important comment. We agree that our original title and some parts of the manuscript used language that was stronger than warranted by the data. Our results show that rapid reactivation of the bridge element is associated with successful inference and is modulated by stress and retrieval practice, but they do not by themselves establish a causal mechanistic role for reactivation. We therefore plan to revise the title and soften the relevant wording throughout the manuscript to better reflect the correlational nature of this evidence.

      Specifically, we plan to change the title from “Retrieval practice prevents stress-induced inference impairment by restoring rapid memory reactivation” to, for example, “Retrieval practice prevents stress-induced inference impairment and preserves rapid bridge-item memory reactivation.” We will also revise the Abstract, Results, and Discussion to replace stronger mechanistic wording such as “prevents,” “restoring,” and “essential neural mechanism” with more cautious phrasing such as “buffers” or “attenuates,” “preserves” or “is associated with,” and “neural correlate” or “candidate process,” as appropriate. This revision will also temper our overall interpretation of the EEG findings: rather than claiming that reactivation is the mechanism by which retrieval practice prevents stress-related inference deficits, we will conclude that rapid bridge-item reactivation is a neural correlate of successful inference that is sensitive to stress and enhanced by retrieval practice.

      We also appreciate the reviewer’s concern regarding the use of one-tailed follow-up tests and the absence of multiple-comparison correction. With respect to the one-tailed t-tests, these follow-up comparisons were conducted because the relevant hypotheses were directional a priori. Based on prior work and our theoretical framework, we specifically predicted that acute stress would impair inference-related performance and neural reactivation, and that retrieval practice would mitigate these effects. The follow-up tests were therefore not exploratory post-hoc comparisons, but planned tests used to decompose the significant omnibus effects in the predicted direction. For this reason, we considered one-tailed testing appropriate for these comparisons.

      Similarly, we did not apply an additional multiple-comparison correction to these planned follow-up tests because they were limited in number, theory-driven, and conducted to evaluate specific directional predictions rather than to search broadly across many possible contrasts. Importantly, our interpretation does not depend on any isolated post-hoc comparison but on the consistency of the results across behavioral inference measures, neural decoding of bridge-item reactivation, and theta-band analyses. We will revise the manuscript to make this rationale clearer and to ensure that the follow-up results are interpreted in the context of the full pattern of evidence.

      (3) We agree that, in the previous version, parts of the manuscript were not structured clearly enough, which may have made it difficult for readers to follow the logic of the study and the sequence of analyses without moving back and forth between sections. In the revised manuscript, we will reorganize the presentation to improve the overall narrative flow and readability. Specifically, we plan to clarify the study logic and analysis sequence, strengthen transitions between sections, and revise the relevant text in line with Reviewer #3’s detailed suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their thoughtful comments, which substantially improved the quality and clarity of our manuscript. We have attempted to address each major concern with either new experiments or significant textual revisions.

      Reviewer 1 noted that “this research is conducted exclusively in HEK293 cells… including at least one additional cell line would significantly strengthen the main findings.” To directly address this concern, we repeated our RAB1A/B double-knockdown experiments in H4 neuroglioma cells, which endogenously express a tandem fluorescent-tagged LC3B reporter. Using flow cytometry to quantify autophagic flux, we confirmed that RAB1 depletion in H4 cells recapitulates the flux defects observed in HEK293 cells, thereby validating the generality of our findings across distinct lineages.

      To validate the robustness of the ATG2 DKO phenotype and the localization of ARFGAP1-positive membranes, we acquired an ATG2 double knockout HeLa cell line. We confirmed the presence of the characteristic large ATG2-deficient PAS compartment in HeLa cells and the recruitment of ARFGAP1 membranes, but note that ARFGAP1 displays a solid distribution throughout the compartment in these cells, in contrast to the more peripheral enrichment observed in HEK293 cells. These data are now included and discussed in the revised manuscript.

      Multiple reviewers asked for greater clarity around the interaction between ATG2A and RAB1A. Although our original data showed that these proteins co-immunoprecipitate in cells, we had not established whether their association was direct. In response, we attempted in vitro co-immunoprecipitations from purified components. As we could not detect interactions in this simplified system, we now speculate that the ATG2A–RAB1A interaction is indirect. This clarification is now incorporated into the Results section.

      Multiple reviewers also raised questions regarding the nature of the membranes recruiting ARFGAP1 and the potential relationship to Arf1 and Golgi trafficking. In particular, Reviewer 3 asked: “(5) What about Arf1? … one would predict that Arf1 does not localize to these structures and does not affect ATG2A function.” To examine whether ARFGAP1 recruitment depends on Golgi integrity or Arf1-regulated trafficking, we perturbed the Golgi using three mechanistically distinct methods: Brefeldin A, mitotic entry, and SidM expression, each of which dissolves Golgi architecture. In each condition, ARFGAP1 localization to the enlarged PAS compartment in ATG2 DKO cells was unchanged. These results indicate that ARFGAP1 recruitment is independent of Golgi structure and provide indirect support for the notion that Arf1 does not participate in this process. Reviewer 3 also asked: “Is the curvature-sensitive region of ARFGAP1 required for its co-localization with ATG2A?” To address this, we generated ARFGAP1 mutants lacking either GAP catalytic activity or the ALPS curvature-sensing domain. When expressed in ATG2 DKO cells, all mutants retained full recruitment to the PAS compartment. Thus, neither GAP activity nor ALPS-mediated curvature sensing is required for ARFGAP1 localization in this context.

      In response to Reviewer 3’s question “(2) Figure 3A/B: … is there another tool/assay to validate this result?”, we quantified autophagic flux following SAR1B(H79G) overexpression using the flow-cytometry tandem-fluorescent LC3 assay. These experiments confirmed that SAR1B(H79G) causes only a modest reduction in autophagic flux, consistent with partial inhibition of COPII, thereby supporting our original interpretation.

      We also took steps to improve the integration of our findings with prior literature. Reviewer 2 requested that we strengthen the manuscript by incorporating studies on ERES–ERGIC remodeling (“It would strengthen the manuscript to discuss previous studies…”). We now cite and discuss the studies corresponding to PMIDs 34561617 and 28754694, aligning our observations with mechanistic models of early secretory pathway remodeling. More broadly, Reviewer 1 commented that our discussion “overlooks some important aspects,” and Reviewer 3 asked, “Are the membranes to which ATG2A is recruited a form of ERGIC?” In response, we substantially rewrote the discussion, expanding our integration of existing literature and explicitly addressing models in which ATG2A acts at an ERGIC-derived membrane.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I found the bigger picture analysis to be lacking. Let us take stock: in other work, during active cognition, including at least one study from the Authors, TDLM shows significant sequenceness. But the evidence provided here suggests that even very strong localizer patterns injected into the data cannot be detected as replay except at implausible speeds. How can both of these things be true? Assuming these analyses are cogent, do these findings not imply something more destructive about all studies that found positive results with TDLM?

      Our focus here is on advancing methodology. Given the diversity of tasks and cognitive states in the TDLM literature, replay could exceed detection thresholds under specific conditions—especially when true event durations align with short analysis windows. While a comprehensive re-analysis of prior datasets is beyond our scope, we agree a concise synthesis can strengthen the paper.

      The previous TDLM literature uses a diverse set of tasks and addresses a broad spectrum of cognitive constructs and processes. As we acknowledge, it is entirely possible that replay bursts within short time windows are readily detectable by TDLM. Nevertheless, we agree that some commentary on this is warranted and have added the following paragraph to the discussion section on “improving TDLMs sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. However, sensitivity depends critically on a close match between the analysis window and the density of replay events. A systematic evaluation of how these conditions apply to prior studies remains beyond the scope of the current paper. Instead, we focus on delineating boundary conditions that we hope will motivate power analyses in future work, as well as the inclusion of simulations that approximate realistic experimental conditions.”

      (2) All things considered, TDLM seems like a fairly 'vanilla' and low-assumption algorithm for finding event sequences. It is hard to see intuitively what the breaking factor might be; why do the authors think ground truth patterns cannot be detected by this GLM-based framework at reasonable densities?

      We agree with the referee’s overall sentiment. Our intuition is that one of the principal shortcomings of the method relates to spurious sequenceness induced by unknown factors at baseline, and to poor transfer of the decoder to other modalities. Although we have a rough understanding of how these confounders arise, we are currently not in a position to identify their nature. Note that we believe these confounders are not exclusive to TDLM but potentially threaten any sequenceness analysis of longer time series that relies on decoders. Indeed, we suspect that classifier training is another bottleneck, as we do not know the exact nature of the representations that are replayed, including the degree to which they overlap with those evoked by a commonly used visual localizer. That said, this bottleneck does not affect the simulation, insofar as we insert patterns that exceed the pattern strength in the localizer.

      Finally, a potential major drawback is the permutation test used for significance testing. As the original authors of TDLM have noted, the current test, which permutes states, is overly conservative. It measures fixed effects and, because it considers only the group-level mean, it is easily biased by individual outliers. We have tried to account for this by z-scoring sequenceness scores. We have also conferred with some of the authors of TDLM and discussed a yet-unpublished method that aims to address this exact issue. The proposed method uses a sign-flip permutation test at the group level and therefore implements a random-effects model of the data. This significance test has markedly higher power while still controlling the FWER. However, although our power analysis shows that the new method is indeed more sensitive, it does not materially change the interpretation of the data. We have included this method in the paper and added it to the main analysis and most of the simulations.
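      As we read the description above, a group-level sign-flip test treats each participant's sequenceness profile as exchangeable with its negation under the null hypothesis, which is what makes it a random-effects test. The sketch below is a minimal illustration under that reading, not the unpublished method itself; the function name, the use of the maximum absolute group mean across lags for family-wise error control, and the example data are assumptions.

```python
import random
from statistics import fmean


def sign_flip_test(seq, n_perm=5000, alpha=0.05, seed=0):
    """Group-level sign-flip permutation test with max-statistic FWER control.

    seq: one list per participant holding a sequenceness score per time lag
    (e.g. z-scored forward-minus-backward sequenceness). Under H0 each
    participant's lag profile is symmetric around zero, so its sign can be
    flipped. One sign is applied to all lags of a participant, which keeps
    the within-participant dependence between lags intact.
    """
    rng = random.Random(seed)
    n_sub, n_lag = len(seq), len(seq[0])
    observed = [fmean(s[k] for s in seq) for k in range(n_lag)]
    null_max = []
    for _ in range(n_perm):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n_sub)]
        perm_means = [
            sum(sg * s[k] for sg, s in zip(signs, seq)) / n_sub
            for k in range(n_lag)
        ]
        # Max over lags: comparing every lag to this null controls the FWER.
        null_max.append(max(abs(m) for m in perm_means))
    null_max.sort()
    threshold = null_max[min(n_perm - 1, int((1 - alpha) * n_perm))]
    significant = [k for k, m in enumerate(observed) if abs(m) > threshold]
    return observed, threshold, significant


# Hypothetical data: 12 participants x 4 lags, consistent effect at lag index 2.
demo = [[0.0, 0.0, 1.0 + 0.1 * i, 0.0] for i in range(12)]
obs, thr, sig = sign_flip_test(demo, n_perm=2000)
print(sig, round(thr, 3))
```

      Because the null distribution is built from between-participant sign variability, a single extreme participant widens the null rather than dominating a fixed-effects mean, which matches the motivation given above.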

      (3) Can the authors sketch any directions for alternative methods? It seems we need an algorithm that outperforms TDLM, but not many clues or speculations are given as to what that might look like. Relatedly, no technical or "internal" critique is provided. What is it about TDLM that causes it to be so weak?

      We believe there are several shortcomings and bottlenecks within TDLM that need to be evaluated and improved. While we highlight these issues in the discussion section titled “Improving TDLMs sensitivity,” we agree that we should outline its current shortcomings more clearly. We have now expanded the discussion of what we think needs improvement (‘fixed time lag’) and added a summary statement at the end of the relevant paragraph that recaps the main issues an improved successor method would need to address. The new paragraphs read:

      “Lastly, TDLM makes certain assumptions that might not hold (see Methods Study II): current implementations look for a fixed time lag that is the same across all participants and between all reactivation events. If time lags differ across participants, TDLM will fail to find them. Similarly, TDLM assumes a fixed sequence order and is not robust against slight within-sequence permutations or missing within-sequence reactivation events. However, from other data sources, such as hippocampal place cell recordings, it is known that such permutations can occur, with some states being skipped or failing to decode during replay. Likewise, it is assumed that each reactivation event lasts between 10 and 30 milliseconds, but the true temporal evolution of reactivation measured by TDLM is currently unknown. Future method development might focus on improving invariance to these assumptions.

      […]

      In summary, there are several areas where TDLM might be improved, including a restriction in its search space, improvement in classifiers, a validation of localizer representation transfer to other domains (e.g. memory representations), and the extension of TDLM to render it more robust against violations of its core assumptions.”
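      The fixed-lag assumption described in the quoted passage can be illustrated with a deliberately simplified sketch of the lagged-regression idea behind TDLM. This is not the reference TDLM implementation: it uses pairwise simple regressions instead of the multiple-regression first-level GLM and omits the nuisance regressors and permutation statistics of full TDLM. The helper names (`lagged_beta`, `sequenceness`, `make_series`) and the synthetic event series are hypothetical.

```python
from statistics import fmean


def lagged_beta(x_from, x_to, lag):
    """OLS slope of x_to(t + lag) regressed on x_from(t), intercept included."""
    a, b = x_from[:-lag], x_to[lag:]
    ma, mb = fmean(a), fmean(b)
    var_a = sum((u - ma) ** 2 for u in a)
    if var_a == 0:
        return 0.0
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / var_a


def sequenceness(series, transitions, lag):
    """Forward minus backward evidence for the hypothesised transitions at ONE lag."""
    fwd = fmean(lagged_beta(series[i], series[j], lag) for i, j in transitions)
    bwd = fmean(lagged_beta(series[j], series[i], lag) for i, j in transitions)
    return fwd - bwd


def make_series(gaps, n=400, period=20):
    """Hypothetical A -> B -> C reactivation events, one every `period` samples.
    The A->B and B->C gap of successive events cycles through `gaps`."""
    series = {s: [0.0] * n for s in "ABC"}
    for k, t0 in enumerate(range(0, n - 2 * max(gaps), period)):
        gap = gaps[k % len(gaps)]
        series["A"][t0] = 1.0
        series["B"][t0 + gap] = 1.0
        series["C"][t0 + 2 * gap] = 1.0
    return series


transitions = [("A", "B"), ("B", "C")]
lags = range(1, 11)
fixed = {L: sequenceness(make_series([3]), transitions, L) for L in lags}
jittered = {L: sequenceness(make_series([2, 4]), transitions, L) for L in lags}
print(max(fixed, key=fixed.get), round(max(fixed.values()), 2))
print(max(jittered, key=jittered.get), round(max(jittered.values()), 2))
```

      In this toy example, when every event uses the same 3-sample gap, the evidence concentrates at lag 3; when gaps alternate between 2 and 4 samples, the same number of events is split across two lags and the peak roughly halves, which is the kind of dilution that a fixed-lag search is vulnerable to.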

      Reviewer #2 (Public review):

      Weaknesses:

      The sample size is small (n=21, after exclusions), even for TDLM studies (which typically have somewhere between 25-40 participants). The authors address this somewhat through a power analysis of the relationship between replay and behavioural performance in their simulations, but this is very dependent on the assumptions of the simulation. Further, according to their own power analysis, the replay-behaviour correlations are seriously underpowered (~10% power according to Figure 7C), and so if this is to be taken at face value, their own null findings on this point (Figure 3C) could therefore just reflect undersampling as opposed to methodological failure. I think this point needs to be made more clearly earlier in the manuscript.

      We agree with the referee that our sample is smaller than those of previous studies because of participant exclusion criteria. However, the take-away message from our behavioural simulation and bootstrapping is that baseline fluctuations in sequenceness are difficult to overcome even when very strong replay patterns are detectable and sample sizes match those of previous studies. We are therefore not convinced that our null findings are fully explained by our smaller sample size. Additionally, we show that similar power would have been expected even within the range of sample sizes used in other studies (Supplement Figure 11). Nevertheless, it is true that, in general, null findings can be explained by undersampling if an effect is present. To emphasize this point, we have added the following to the description of Figure 3C:

      “[…]. NB, however, as our simulation shows, correlations of sequenceness with behavioural markers are likely to be underpowered and occur only with very high replay rates or much higher sample size. See our simulation discussion for a more detailed explanation on how correlations may be inherently biased, where fluctuations in baseline sequenceness overshadow individual scaling with behavioural markers.”

      Furthermore, we have added the following paragraph to the discussion to highlight this point and refer to a power analysis we have now added to the supplement (see next answer):

      “Sample sizes in previous TDLM literature usually range from 20 to 40 participants. A bootstrap power analysis shows that even at those sample sizes, power would remain low unless unrealistically high replay rates are assumed (Supplement Figure 11). Our bootstrap simulation shows that a correlation analysis between sequenceness and behaviour would in these cases be drastically underpowered, even under an assumption of high replay densities.”

      Finally, we have added a remark about the sample size to the limitations section, as naturally, an increase in sample size would yield higher power:

      “Finally, although we initially planned for thirty participants, exclusion criteria left our study with fewer participants than most previous studies using TDLM (usually 25-40, but 21 in our study). While we are confident that our simulation results hold at these sample sizes, as the sample sizes of other studies show power comparable to ours, we cannot fully rule out the possibility that our null findings are explained by a lack of power alone.”

      Relatedly, it would be very useful if one of the recommendations that come out of the simulations in this paper was a power analysis for detecting sequenceness in general, as I suspect that the small sample size impacts this as well, given that sequenceness effects reported in other work are often small with larger sample sizes. Further, I believe that the authors' simulations of basic sequenceness effects would themselves still suffer from having a small number of subjects, thereby impacting statistical power. Perhaps the authors can perform a similar sort of bootstrapping analysis as they perform for the correlation between replay and performance, but over sequenceness itself?

      We agree with the referee that this is, in principle, a great idea. However, the way significance thresholds are calculated poses a conceptual problem for such an analysis: the significance threshold is defined as the maximum sequenceness value across all participants, all time lags, and all permutations, and it is compared against the mean of all participants, disregarding the standard deviation. This maximum threshold would not change if we bootstrapped some of our samples, and the 95th percentile would not change substantially either. To illustrate this point, we have added this analysis to the supplement as Supplement Figure 10. However, the new sign-flip permutation test we now include allows for such a comparison, as it also takes between-participant variance into account. We have included all three variants of the power analysis, and the figure description now reads:

      “Supplement Figure 11 Power analysis of sequenceness significance for bootstrapped sample sizes. A) Power map for state-permutation thresholds. However, here the bootstrap approach suffers from a conceptual problem: significance thresholds are defined by the permutation maximum and/or the 95th percentile of the maxima across all sequence permutations across participants. If we resample bootstrap participants from our existing pool, the maximum thresholds will remain relatively stable across resampled participants, as they are compared only against the mean and disregard the standard deviation. B) The newly presented statistical approach is markedly more sensitive at higher sample sizes. Note that even then, 80% power is only reached with a replay density higher than 50 min⁻¹ at a sample size of 60 participants. Additionally, the sign-flip permutation test assumes a mean of zero under the null hypothesis. As we observed a non-zero mean due to spurious oscillations, we subtracted the mean sequenceness of the baseline condition from each participant before permuting to obtain a null distribution with mean zero; otherwise, we would have found significant replay effects in the baseline condition at increasing sample sizes. Nevertheless, because of its higher sensitivity, the new sign-flip test is recommended over the previous sequence-permutation-based test. Colours indicate power from 0 to 1 for different bootstrapped sample sizes and densities. 80% power thresholds are outlined in black.”
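      To make the procedure behind panel B concrete, here is a minimal pure-Python sketch of a bootstrap power estimate built on a one-sided group-level sign-flip test. It makes several assumptions not stated in the caption: it tests a single pre-selected lag rather than the full lag profile, it subtracts each participant's own baseline score (the caption does not specify a participant-level or group-level baseline mean), and the function names and example scores are hypothetical.

```python
import random
from statistics import fmean


def sign_flip_p(diffs, n_perm, rng):
    """One-sided sign-flip p-value for H0: mean(diffs) <= 0 vs H1: mean > 0."""
    observed = fmean(diffs)
    exceed = 0
    for _ in range(n_perm):
        flipped = fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if flipped >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def bootstrap_power(replay, baseline, n_sub, n_boot=200, n_perm=500,
                    alpha=0.05, seed=0):
    """Fraction of bootstrap samples of size n_sub in which the sign-flip test
    rejects H0. Assumption: each participant's baseline sequenceness is
    subtracted from their replay-condition score to centre the null on zero."""
    rng = random.Random(seed)
    idx = range(len(replay))
    hits = 0
    for _ in range(n_boot):
        draw = rng.choices(idx, k=n_sub)  # resample participants with replacement
        diffs = [replay[i] - baseline[i] for i in draw]
        if sign_flip_p(diffs, n_perm, rng) < alpha:
            hits += 1
    return hits / n_boot


# Hypothetical scores for 20 participants (not data from the study).
baseline = [0.2] * 20                                         # non-zero baseline offset
strong = [0.2 + 0.5 + 0.01 * i for i in range(20)]            # consistent replay effect
null = [0.2 + (0.3 if i % 2 else -0.3) for i in range(20)]    # offset + symmetric noise
print(bootstrap_power(strong, baseline, n_sub=10))
print(bootstrap_power(null, baseline, n_sub=10))
```

      Here the constant 0.2 offset stands in for the non-zero baseline mean described in the caption; running the same test on the symmetric-noise scores without subtracting it rejects the null more often than the nominal alpha.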

      The task paradigm may introduce issues in detecting replay that are separate from TDLM. First, the localizer task involves a match/mismatch judgment and a button press during the stimulus presentation, which could add noise to classifier training separate from the semantic/visual processing of the stimulus. This localizer is similar to others that have been used in TDLM studies, but notably in other studies (e.g., Liu, Mattar et al., 2021), the stimulus is presented prior to the match/mismatch judgment. A discussion of variations in different localizers and what seems to work best for decoding would be useful to include in the recommendations section of the discussion.

      We agree and thank the referee for raising this issue. We acknowledge that we omitted to mention that oddball trials were excluded from classifier training. Our rationale for presenting the oddball during stimulus presentation, rather than afterwards, was the assumption that first presenting the audio and then the visual cue would create more generalized, less modality-dependent representations. Importantly, however, all oddball trials were excluded from localizer training. We therefore assume that this design choice does not greatly affect decoder training. If some motor-preparation activity is present during stimulus presentation, it should be present equally across all trials and hence be ignored by the classifier, as we balanced the transitions between images. We have now added this information to the main text:

      “In each trial, a word describing the stimulus was played auditorily, after which the corresponding stimulus was shown. In ~11% of cases, there was a mismatch between word and image (oddball trials), and these trials were excluded from the localizer training.” Additionally in the methods section: “These oddball-trials were excluded from all further analysis and decoder training.”

      Nevertheless, we agree that the extant variety of localizer designs is under-discussed, and that many assumptions of classifier training have not yet been fully validated. We have added a sentence highlighting different oddball paradigms to the discussion of localizers, as well as a summary statement with recommendations. The passage now reads:

      “Additionally, a wide variety of oddballs has been used (e.g., upside-down, scrambled, or mismatched images; cues presented visually, as words, auditorily, etc.), and at this time it is unclear whether these affect the representations that the classifier learns [...] In summary, we would expect a multimodal categorical localizer, and a classifier that isn’t trained on a specific timepoint, to generalize best.”

      Second, and more seriously, I believe that the task design for training participants about the expected sequences may complicate sequence decoding. Specifically, this is because two images (a "tuple") are shown together and used for prediction, which may encourage participants to develop a single bound representation of the tuple that then predicts a third image (AB -> C rather than A -> B, B -> C). This would obviously make it difficult to i) use a classifier trained on individual images to detect sequences and ii) find evidence for the intended transition matrix using TDLM. Can the authors rule out this possibility?

      We thank the reviewer for raising a possibility we had not considered. While there is some evidence that a single bound representation overlaps with its constituents (especially before long-term consolidation) and would therefore be detectable by the classifiers, we acknowledge that individual classifiers might fail to be sensitive to such a compound representation. Indeed, in the retrieval data we found some evidence for combined replay of representations (where representations are replayed seemingly at the same time; see Kern et al., 2024). We have added this possibility as a qualification to the interim discussion of Study 1. However, this does not change the results or the interpretation of our simulation, which we consider the key message of the paper.

      The relevant segment in the discussion section now reads:

      “Additionally, given that the stimuli were presented in combined triplets, participants may have formed a singular representation of associated items and subsequently replayed these (e.g., AB→C), instead of replaying item-by-item transitions (A→B→C). Under such a scenario, a classifier trained on individual items may fail to detect these newly formed bound representations, particularly if they diverge strongly from the single-item patterns. In our previous study addressing retrieval (Kern et al., 2024), we found that states were co-reactivated to varying extents, yet classifiers trained on single items retained sensitivity to these combined reactivation events. Consistent with this, prior work suggests that unified representations retain overlap with their constituent item representations (Dennis et al., 2024; Liang et al., 2020); however, there is also evidence that different brain regions are involved if representational unitization occurs (Staresina & Davachi, 2010), potentially confusing classifiers. Therefore, we cannot exclude that rest-related consolidation replays engendered unitized representations that were insufficiently captured by our single-item classifiers.”

      Participants only modestly improved (from 76-82% accuracy) following the rest period (which the authors refer to as a consolidation period). If the authors assume that replay leads to improved performance, then this suggests there is little reason to see much task-related replay during rest in the first place. This limitation is touched on (lines 228-229), but I think it makes the lack of replay finding here less surprising. However, note that in the supplement, it is shown that the amount of forward sequenceness is marginally related to the performance difference between the last block of training and retrieval, and this is the effect I would probably predict would be most likely to appear. Obviously, my sample size concerns still hold, and this is not a significant effect based on the null hypothesis testing framework the authors employ, but I think this set of results should at least be reported in the main text.

      We disagree that the absence or presence of replay can be inferred from an absolute memory enhancement. While consolidation can lead to an absolute improvement in performance in, for example, motor memory domains, one formulation holds that in declarative learning tasks replay stabilizes latent memory traces, which would not necessarily boost performance. Many declarative consolidation studies report increased performance relative to a control condition (i.e., without a consolidation window), but this does not necessarily entail an absolute increase in performance, as replay might simply protect against the loss of memory traces. Therefore, without a proper control condition, the modest increase we observe does not permit inferences about the presence or absence of replay.

      We did expect to find a correlation between replay and individual behavioural performance. Indeed, a weak correlation between performance and sequenceness can be detected. However, as we also show, any such correlation is overshadowed by baseline fluctuations in sequenceness, such that its overall validity is questionable, even under very high replay rates. We would therefore be circumspect about this correlation even had it been significant, and we chose not to place much emphasis on it in the discussion. Nevertheless, we have added a short statement to the corresponding figure legend discussing this precise issue. The segment now reads:

      “While we found a non-significant relation between memory performance enhancement and post-learning forward sequenceness, we are cautious not to overinterpret these results. As shown in the section ‘Correlation with behaviour only present at high replay speeds’, this correlational measure oscillates heavily with baseline sequenceness fluctuations, and any true replay effect is likely to be overshadowed by such fluctuations.”

      I was also wondering whether the authors could clarify how the criterion over six blocks was 80% but then the performance baseline they use from the last block is 76%? Is it just that participants must reach 80% within the six blocks *at some point* during training, but that they could dip below that again later?

      We thank the reviewer for highlighting this point. The learning session ended with the first block in which participants reached >80%; after a maximum of six blocks, it ended regardless of performance. Therefore, some participants completed six learning blocks without reaching 80% performance. While we described this in the Methods section, it was missing from the Results section of Study I, which now contains:

      “[...] Participants then learned triplets of associated items according to a graph structure. Within the learning session, participants performed a maximum of six learning blocks, but the session was stopped once participants reached a memory performance of 80% (criterion learning; see Methods for details).”

      The Figure 2 description now contains:

      “[...] Participants completed up to six blocks of learning trials. After reaching 80% in any block, no further learning blocks were performed (criterion learning) [...]”
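      The stopping rule can be written as a short, hypothetical Python function. It assumes that reaching exactly 80% counts as meeting the criterion (the text uses both “>80%” and “reached 80%”), and the accuracies in the usage example are invented. It illustrates why a participant's final block, and therefore the group mean of final blocks, can lie below 80%: participants who never meet the criterion stop after six blocks at whatever accuracy they last achieved.

```python
def completed_blocks(block_accuracies, criterion=0.80, max_blocks=6):
    """Return the accuracies of the learning blocks a participant actually completes."""
    completed = []
    for accuracy in block_accuracies[:max_blocks]:
        completed.append(accuracy)
        if accuracy >= criterion:  # criterion reached: no further learning blocks
            break
    return completed


# Hypothetical block-by-block accuracies for three participants.
participants = [
    [0.60, 0.85],                          # reaches criterion in block 2
    [0.50, 0.60, 0.65, 0.70, 0.72, 0.74],  # stops after six blocks below criterion
    [0.70, 0.75, 0.78, 0.79, 0.78, 0.77],  # stops after six blocks below criterion
]
final_accuracies = [completed_blocks(p)[-1] for p in participants]
group_mean = sum(final_accuracies) / len(final_accuracies)
print(final_accuracies, group_mean)  # group mean lies below the 0.80 criterion
```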

      Lastly, there was a mistake in the Behavioural results section, which stated “All thirty participants, except one, [...] to criterion of 80%.” This is an error. In our preregistration, we specified that we would only include participants who learned successfully above chance. Here, we meant that only one participant failed to reach the criterion we defined as “successful learning”. We have fixed this, and it now reads:

      “with an accuracy above 50% (which we preregistered beforehand as an exclusion criterion for “successful learning above chance”).”

      Additionally, we have noted this in the Methods section for clarity, and we apologize for this mistake:

      “Additionally, as successful above-chance learning was necessary for the paradigm, we ensured all remaining participants had a retrieval performance of at least 50% (one participant had to be excluded, but was already excluded due to low decoding performance).”

      Because most of the conclusions come from the simulation study, there are a few decisions about the simulations that I would like the authors to expand upon before I can fully support their interpretations. First, the authors use a state-to-state lag of 80ms and do not appear to vary this throughout the simulations - can the authors provide context for this choice? Does varying this lag matter at all for the results (i.e., does the noise structure of the data interact with this lag in any way?)

      This was a deliberate choice, but we acknowledge that the reasoning behind it was not detailed in our initial submission. We chose a lag of 80 milliseconds for several reasons. First, it is distant from the 9-11 Hz alpha oscillations we observed in our participants and does not share a harmonic with the alpha rhythm. Second, we wanted the effect of simulated replay to be as isolated as possible from spurious sequenceness confounders present in the baseline condition; we therefore chose a lag at which the sequenceness score was close to zero in the baseline condition. Third, in this revision, we subtracted the mean baseline sequenceness value such that any simulation effects would start, on average, at zero sequenceness. In this way, we could attribute any increase in sequenceness to the experimentally inserted replay, independent of spurious oscillations. Finally (but less importantly), as we observed that the correlation of sequenceness with behaviour fluctuated strongly, for the reason detailed above, we chose a lag at which this correlation was as close as possible to zero. Had we not chosen a lag that met these conditions, we would have risked measuring simulated replay plus spurious sequenceness confounders.

      We have added a sentence to the main text detailing this justification:

      “We chose this timepoint (80 ms state-to-state lag) because its sequenceness value was close to zero in the baseline condition and because it is distant from the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean baseline sequenceness value at the 80 ms lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we have now added a more detailed explanation to the Methods section:

      “This time lag (80 ms) was chosen to isolate the effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean baseline sequenceness value at this lag from each participant’s value at the same lag, such that simulation effects would be initialized at zero sequenceness on average, enabling any effects to be attributed purely to inserted replay. We also excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz), as well as lags that would share a harmonic with this rhythm.”
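      The lag-selection logic above can be sketched as follows. This is an illustration only, not the procedure used in the manuscript: “sharing a harmonic” is interpreted here, as an assumption, as the lag lying within a 5 ms tolerance of the alpha period (9-11 Hz, i.e. roughly 91-111 ms) or of an integer multiple or fraction of it up to 3; the behavioural-correlation criterion is omitted; and the baseline values in the usage example are invented. All function names are hypothetical.

```python
def near_alpha(lag_ms, alpha_hz=(9.0, 11.0), max_harmonic=3, tol_ms=5.0):
    """True if lag_ms is near an alpha period or an integer multiple/fraction of it (assumed rule)."""
    shortest, longest = 1000.0 / alpha_hz[1], 1000.0 / alpha_hz[0]  # ~90.9 to ~111.1 ms
    for k in range(1, max_harmonic + 1):
        for lo, hi in ((shortest * k, longest * k), (shortest / k, longest / k)):
            if lo - tol_ms <= lag_ms <= hi + tol_ms:
                return True
    return False


def choose_lag(baseline_mean_by_lag):
    """Among lags not near alpha, pick the one whose group-mean baseline sequenceness is closest to zero."""
    candidates = {lag: s for lag, s in baseline_mean_by_lag.items() if not near_alpha(lag)}
    return min(candidates, key=lambda lag: abs(candidates[lag]))


def subtract_baseline(post_values, baseline_values):
    """Subtract the group-mean baseline sequenceness (at the chosen lag) from each participant."""
    baseline_mean = sum(baseline_values) / len(baseline_values)
    return [v - baseline_mean for v in post_values]


# Hypothetical group-mean baseline sequenceness per lag (ms).
baseline_means = {50: 0.0, 80: 0.01, 100: -0.001, 120: 0.05}
chosen = choose_lag(baseline_means)  # 50 and 100 ms are excluded as alpha-related
print(chosen)  # 80
```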

      Second, it seems that the approach to scaling simulated replays with performance is rather coarse. I think a more sensitive measure would be to scale sequence replays based on the participants' responses to *that* specific sequence rather than altering the frequency of all replays by overall memory performance. I think this would help to deliver on the authors' goal of simulating an "increase of replay for less stable memories" (line 246).

      The referee makes an excellent point: our simulations could be rendered more realistic by inserting only the tuples that participants actually answered correctly. If we understand the point correctly, there are two different ways in which performance might affect replay. First, we can conjecture that there is greater replay if memory performance is not saturated. Second, replay only occurs for content that has actually been encoded.

      Our main reasons for simulating replay of the entire sequence for each participant are as follows. First, TDLM is implemented such that only the amount of replay is relevant, and the specific transitions do not affect the results beyond noise. Under the assumption that all class-specific classifiers perform equally well, simulating A->B, B->C or simulating A->B, A->B yields equivalent results. However, results can differ if this assumption is violated. By drawing from the entire space of classes, we minimize the risk that some classifiers perform worse than others for some participants. For example, if we simulated only A->B for a given participant instead of the whole sequence, and by chance classifier A performed suboptimally, we would introduce additional unwanted variance into our results.

      Second, from our reading of the literature, we infer that replay is generally increased (i.e., the density of learning-specific replay is higher) for less stable memories. However, we do not have indicators of memory strength, only a binary “remembered or not”. As TDLM is invariant to which transitions are replayed and only indexes the number of transitions, we chose to ignore which transitions we insert and only scaled the amount of replay.

      We have added an analysis to the Appendix that discusses this specific aspect of our study where we show that results are equivalent if we simulate replay of “A->B B->C C->D” or only “A->B A->B A->B A->B”. As we do not know how replay density interacts with memory trace stability, we opted to leave the current simulation as is. The corresponding paragraph and figure description now read:

      “From the literature, we know that replay is increased after learning and that less stable memories are replayed more often. We simulated this effect by scaling our replay density inversely with performance. However, for simplicity, in our simulation we inserted transitions sampled from all valid transitions given by the graph structure. This meant that some participants would have transitions inserted that they did not actually remember. To show that this would not change the results, we simulated two scenarios. In the full-sequence scenario, all valid graph transitions are inserted (i.e., every participant’s replay is sampled from 'A->B, B->C, C->D, D->E, E->F, F->G, G->E, E->H, H->I, I->B, B->J, J->A'). In the second scenario (memorized transitions), we only replayed transitions that the participant actually retrieved correctly during the post-resting-state testing sessions (e.g., a participant’s replay would have been sampled from 'A->B, B->C, G->E, E->H, H->I' if those were the ones they remembered). In both scenarios, the number of events is kept constant. The results are equivalent, as can be seen in Appendix A Figure 3. NB: this only holds under the assumption that the classifiers are equally good at decoding each class.”

      […]

      “TDLM is insensitive towards which transitions are replayed and only sensitive to how many transitions are detected in total. Here we simulate transitions either sampled from the full graph (light orange/green) or participant-specific transitions of trials that participants correctly remembered (dark orange/green). Shaded areas denote the standard error across participants.”
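      The two sampling scenarios can be sketched as follows. The transition lists are taken from the passage above; the event count (120), the seeds and the helper name `sample_replay` are hypothetical. The sketch only shows how replay events could be drawn with the count held constant across scenarios; it does not reproduce TDLM or the equivalence of results, which rests on the simulation reported in Appendix A Figure 3.

```python
import random
from collections import Counter

# All valid graph transitions (full-sequence scenario), as listed in the quoted passage.
GRAPH = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"),
         ("G", "E"), ("E", "H"), ("H", "I"), ("I", "B"), ("B", "J"), ("J", "A")]
# Example subset one participant retrieved correctly (memorized-transitions scenario).
MEMORIZED = [("A", "B"), ("B", "C"), ("G", "E"), ("E", "H"), ("H", "I")]


def sample_replay(n_events, allowed, seed=0):
    """Draw n_events replay transitions uniformly, with replacement, from `allowed`."""
    rng = random.Random(seed)
    return [rng.choice(allowed) for _ in range(n_events)]


n_events = 120  # identical in both scenarios (hypothetical value)
full = sample_replay(n_events, GRAPH, seed=1)
memorized = sample_replay(n_events, MEMORIZED, seed=2)

# Per-transition counts: the same total is spread over 12 versus 5 transitions.
full_counts = Counter(full)
memorized_counts = Counter(memorized)
print(sum(full_counts.values()), sum(memorized_counts.values()))  # 120 120
```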

      On the other hand, I was also wondering whether it is actually necessary to use the real memory performance for each participant in these simulations - couldn't similar goals (with a better/more full sampling of the space of performance) be achieved with simulated memory performance as well, taking only the MEG data from the participant?

      The decision to use real memory performance is indeed arbitrary; we could also have used randomly sampled values. However, as we wanted to understand our null results better, we opted to use real performance to adhere as closely as possible to the findings we previously reported. Uniformly sampled memory performance would be less informative with respect to the actual resting-state results reported in the first study of the manuscript (Study I).

      Nevertheless, our current implementation already samples the entire performance range in the sub-analysis focusing on the correlation with behaviour. In the “best-case” scenario section, replay scale factors span from 1 to 0 (i.e., a participant with 100% performance receives a replay scale factor of 0, and hence no simulated replay, whereas the worst-performing participant, at 50% performance, has the replay rate multiplied by 1). We scale the amount of replay by this factor. As a correlation is invariant to linear scaling, this is statistically equivalent to stretching the performance distribution from 0 to 100%. We have added a sentence to the Methods to emphasize this point:

      “To assess how performance might affect replay in our specific dataset, we chose to use the original participants’ performance values instead of uniformly sampling the performance space (which ranged from 50 to 100%). However, for the correlation analysis, we additionally added a “best-case” scenario, in which we scale replay from 0 to 1, an approach that is statistically equivalent to scaling values to the full space of possible performance (0 to 100%) (see Results Study II: Simulation).”
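      The mapping from performance to replay scale factor, and the reason linear rescaling leaves the correlation unchanged, can be illustrated with a small sketch. It assumes a linear map with chance at 50% (so 50% maps to 1 and 100% to 0); the performance and sequenceness values are hypothetical, and `replay_scale` and `pearson` are illustrative helpers. Because the factor is a decreasing linear function of performance, its correlation with any other variable equals the negative of the performance correlation, and a positive rescaling (such as stretching performance to 0-100%) leaves the coefficient unchanged.

```python
import math


def replay_scale(performance, chance=0.5):
    """Linear map from accuracy in [chance, 1] to a replay scale factor in [1, 0]."""
    return (1.0 - performance) / (1.0 - chance)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant accuracies and baseline-corrected sequenceness values.
performance = [0.55, 0.65, 0.75, 0.85, 0.95, 1.0]
sequenceness = [0.04, 0.01, 0.03, -0.01, 0.0, -0.02]
factors = [replay_scale(p) for p in performance]

r_performance = pearson(performance, sequenceness)
r_factor = pearson(factors, sequenceness)                            # equals -r_performance
r_stretched = pearson([100 * p for p in performance], sequenceness)  # equals r_performance
```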

      Finally, Figure 7D shows that 70ms was used on the y-axis. Why was this the case, or is this a typo?

      Thank you; this is indeed a typo, and we have fixed it.

      Because this is a re-analysis of a previous dataset combined with a new simulation study on that data aimed at making recommendations about how to best employ TDLM, I think the usefulness of the paper to the field could be improved in a few places. Specifically, in the discussion/recommendation section, the authors state that "yet unknown confounders" (line 295) lead to non-random fluctuations in the simulated correlations between replay detection and performance at different time lags. Because it is a particularly strong claim that there is the potential to detect sequenceness in the baseline condition where there are no ground-truth sequences, the manuscript could benefit from a more thorough exploration of the cause(s) of this bias in addition to the speculation provided in the current version.

      We are currently working on a theoretical basis to explain these spurious sequenceness confounders in the baseline condition. Indeed, in our preliminary work, in certain contexts we can induce significant sequenceness during baseline in the absence of any replay signal. However, this work is at an early stage, and we still have some conceptual problems to solve before we are confident in these data. We believe it would be premature to add them to the current manuscript. Nevertheless, we now mention these spurious sequenceness confounders to raise awareness in the field, and we have added greater context to the discussion, highlighting one of the issues we consider important:

      “[…] For example, if two classifiers’ probabilities oscillate at 10 Hz but at a different phase, a spurious time lag can be found reflecting this phase shift. We speculate that more complex interactions between classifiers oscillating at different phases are also conceivable.”
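      The phase-shift mechanism speculated about above can be illustrated with a toy example that does not involve TDLM itself. Two hypothetical classifier time courses oscillate at 10 Hz with a fixed phase offset corresponding to 30 ms; their lagged correlation peaks at that 30 ms lag even though no sequence was ever inserted. The sampling rate (100 Hz), the offset and the pure-sinusoid signals are assumptions chosen for clarity.

```python
import math


def lagged_corr(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag), with lag in samples."""
    xs, ys = x[: len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


fs = 100.0        # sampling rate in Hz: one sample = 10 ms (assumption)
freq = 10.0       # both classifiers oscillate in the alpha band
shift_s = 0.030   # classifier B trails classifier A by 30 ms of phase (hypothetical)
times = [i / fs for i in range(200)]  # 2 s of data
prob_a = [math.sin(2 * math.pi * freq * t) for t in times]
prob_b = [math.sin(2 * math.pi * freq * (t - shift_s)) for t in times]

# Lagged correlation of A(t) with B(t + lag) for lags of 0-90 ms.
corr_by_lag_ms = {round(lag * 1000 / fs): lagged_corr(prob_a, prob_b, lag) for lag in range(10)}
peak_ms = max(corr_by_lag_ms, key=corr_by_lag_ms.get)
print(peak_ms)  # 30: an apparent "A then B" lag produced purely by the phase offset
```

      Because the signals are periodic, the same peak would reappear every 100 ms (one alpha cycle), which is one reason lags near the alpha period and its harmonics are treated with caution.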

      In addition, to really provide that a realistic simulation is necessary (one of the primary conclusions of the paper), it would be useful to provide a comparison to a fully synthetic simulation performed on this exact task and transition structure (in addition to the recreation of the original simulation code from the TDLM methods paper).

      Thank you for this suggestion! We have now added a synthetic simulation, trying to keep as close as possible to the original simulation code in Liu et al. (2021), while also incorporating our current means of simulating the data (i.e. scaling by performance). We think this synthetic simulation greatly improves the paper and gives weight to our suggestion about the superiority of a hybrid approach. Additionally, it prompted us to look closer at patterns that are inserted in the synthetic simulation and perform a comparative analysis. We have now added the simulation to the main text, together with a methodological explanation of how we simulated the data in the methods section. We also added a discussion on the results and why we think a hybrid approach is currently superior to synthetic approach. The whole new section is too long to paste here – it is found after the main simulation section in the manuscript. We have also added another sentence to the abstract referring to this new inclusion.

      Finally, I think the authors could do further work to determine whether some of their recommendations for improving the sensitivity of TDLM pan out in the current data - for example, they could report focusing not just on the peak decoding timepoint but incorporating other moments into classifier training.

      While we understand the desire to test further refinements of TDLM directly on the data, we intentionally do not include such analyses in the current paper. Our experience tells us that there is an enormous branching factor of parameters when applying TDLM, with implications for the significance of results in one direction or the other. Currently, there are only limited ways to know whether a parameter change actually improves sensitivity to replay or instead exacerbates underlying confounders that induce spurious sequenceness (e.g., with some parameter changes we can obtain significant replay in the control condition). To exclude such false-positive findings, we opted for relatively strict adherence to previously published approaches. Thus, in the current paper, we limit ourselves to assessing the reliability and robustness of previous approaches.

      Furthermore, while training on a later timepoint might increase a classifier’s sensitivity when transferring between modalities (e.g., from a visual to a memory representation), this approach does not transfer well to our simulations, as the inserted patterns come from the same modality. We consider that other, more bespoke studies are better suited to improving classifier training. NB: see also our recently started Kaggle challenge tackling this problem: https://www.kaggle.com/competitions/the-imagine-decoding-challenge

      However, we have added a note about this dilemma to the improvement section. The section now includes:

      “Nevertheless, as the considerable branching factor poses a threat of increased false-positive findings, we opt to focus the current simulations on previously published pipelines and parameters. Future studies should systematically evaluate parameter choices on TDLM under different conditions, something that is beyond the remit of the current study.”

      Lastly, I would like the authors to address a point that was raised in a separate public forum by an author of the TDLM method, which is that when replays "happen during rest, they are not uniform or close." Because the simulations in this work assume regularly occurring replay events, I agree that this is an important limitation that should be incorporated into alternative simulations to ensure the lack of findings is not because of this assumption.

      The temporal distribution of replay throughout the resting state should not matter, as TDLM is invariant with respect to how replay events are distributed within the analysis window. Specifically, it does not matter whether replay events occur in bursts or are uniformly distributed. Only the number of transitions is relevant to the numerical results; where events occur, or whether they are close to each other, is not (as long as the refractory window is respected, since distances that are too short lead to interactions between events and reduce sensitivity). To emphasize this point, we have added another simulation, shown in Appendix A.1 and Appendix A Figure 1. We have referenced it in the text and added the following paragraph to the Methods section:

      “Additionally, the timepoints at which replay is inserted within the resting state are sampled from a uniform distribution. Even though TDLM tracks reactivation events over time, at a macro scale the algorithm is invariant to their temporal distribution. At each time step, the GLM regresses onto a future time step up to the maximum time lag of interest, yielding one predictor per lag. However, these predictors within the GLM are assessed independently; hence, outside of the time-lag window, TDLM is relatively invariant to the temporal distribution of replay. To demonstrate this claim, we simulated uniform replay versus ‘bursty’ replay that only occurs in some parts of the resting state; both yield equivalent sequenceness results (see Appendix A.1).”
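      A noise-free toy version of the uniform-versus-bursty comparison shows why placement does not matter for a lagged regression. It is a simplification, not the TDLM implementation: a single predictor (state A at time t) and a single outcome (state B at t + lag) replace the full multi-state GLM, and binary reactivations, 100 Hz sampling and an 80 ms lag are all assumptions. The same 59 A→B events are placed either across the whole minute or packed into its first ~12 s, with event spacing kept longer than the lag as a simple stand-in for the refractory window. The helper names are hypothetical.

```python
def make_series(n_samples, onsets, lag):
    """Binary reactivation time courses: state A at each onset, state B `lag` samples later."""
    a = [0.0] * n_samples
    b = [0.0] * n_samples
    for t in onsets:
        a[t] = 1.0
        b[t + lag] = 1.0
    return a, b


def lagged_beta(x, y, lag):
    """OLS slope (with intercept) of y(t + lag) regressed on x(t)."""
    xs, ys = x[: len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) * (a - mx) for a in xs)
    return cov / var


n_samples, lag = 6000, 8                     # 60 s at 100 Hz; 8 samples = 80 ms (assumption)
uniform = list(range(50, 5950, 100))        # 59 events spread over the whole rest period
bursty = list(range(50, 50 + 59 * 20, 20))  # the same 59 events packed into ~12 s

a_uniform, b_uniform = make_series(n_samples, uniform, lag)
a_bursty, b_bursty = make_series(n_samples, bursty, lag)
beta_uniform = lagged_beta(a_uniform, b_uniform, lag)
beta_bursty = lagged_beta(a_bursty, b_bursty, lag)
print(beta_uniform, beta_bursty)  # 1.0 1.0
```

      Both slopes are identical because the regression only registers which A samples are followed by B samples at the given lag, not when in the session this happens. With noise added, we would expect the two estimates to differ only by noise.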

      Reviewer #3 (Public review):

      (1) I am still left wondering why other studies were able to detect replay using this method. My takeaway from this paper is that large time windows lead to high significance thresholds/required replay density, making it extremely challenging to detect replay at physiological levels during resting periods. While it is true that some previous studies applying TDLM used smaller time windows (e.g., Kern's previous paper detected replay in 1500ms windows), others, including Liu et al. (2019), successfully detected replay during a 5-minute resting period. Why do the authors believe others have nevertheless been able to detect replay during multi-minute time windows?

      (Due to similarity, we combined our responses with the first question of Reviewer 1)

      We are reluctant to make sweeping judgments about the previous literature, as we wanted to prioritize advancing methodology. The previous TDLM literature uses a diverse set of tasks and cognitive processes. As we state ourselves, it is possible that replay bursts in short time windows are readily detectable by TDLM. We were intentionally cautious about directly critiquing previous studies without a detailed re-analysis of their work, and wanted to leave such conclusions to the reader. However, we realize that such a “thought-starter” might be warranted and would improve the paper. Therefore, we have added the following paragraph to the discussion on “improving TDLM’s sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. But sensitivity depends critically on the relation between the length of the analysis window and the number of replay events. A systematic evaluation of these conditions across prior studies is beyond the scope of this paper, so we refrain from adjudicating earlier findings and leave this assessment to the reader. Rather, we delineate the boundary conditions and urge future work to conduct power analyses where possible and to include simulations that approximate realistic experimental conditions.”

      For example, some studies using TDLM report evidence of sequenceness as a contrast between evidence of forwards (f) versus backwards (b) sequenceness; sequenceness was defined as Z_f(Δt) − Z_b(Δt) (where Z refers to the sequence alignment coefficient for a transition matrix at a specific time lag). This use case is not discussed in the present paper, despite its prevalence in the literature. If the same logic were applied to the data in this study, would significant sequenceness have been uncovered? Whether it would or not, I believe this point is important for understanding methodological differences between this paper and others.

      This approach was first introduced as part of a TDLM predecessor that used cross-correlations (Kurth-Nelson et al., 2016), where this step is necessary to extract any sequenceness signal at all, by subtracting signals present in both directions (akin to an EEG reference). However, its validity is less clear when forward and backward sequenceness are estimated separately, as in the GLM case. The rationale for subtracting here is the same as for autocorrelations: oscillatory confounds in the data introduce spurious sequenceness in both directions alike (i.e., at the same time lag), which can simply be removed by subtraction. However, this assumption only holds if the sole confounder is autocorrelation caused by a global signal that oscillates at all sensors with the same phase. In our own experience, and as mentioned in the discussion, we do not think this assumption holds. Arguably, more complex interactions are at play that cannot be removed by such a subtraction; for instance, false positives increase if confounders point in opposite directions at a specific time lag. This assumption violation can be seen in our baseline condition, where spurious sequenceness diverges in opposite directions at some time lags (e.g., at ~90 ms, where forward sequenceness is negative and backward sequenceness is positive). We reasoned that oscillatory confounds are more stable when comparing pre versus post for the same direction than when comparing forward minus backward within a session.
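      The arithmetic behind this argument can be made concrete with a toy additive model in which observed sequenceness equals a true replay effect plus a confound. The confound magnitudes (±0.02) are hypothetical and there is no true replay in any condition; the sign pattern of the second case mirrors the ~90 ms example above (forward negative, backward positive).

```python
def observed(true_effect, confound):
    """Toy model: observed sequenceness = true replay effect + additive confound."""
    return true_effect + confound


# No true replay anywhere; hypothetical confound values at a single time lag.
# Case 1: confound in phase, same sign in both directions -> fwd - bkw cancels it.
in_phase = observed(0.0, 0.02) - observed(0.0, 0.02)
# Case 2: confound diverges (forward negative, backward positive) -> fwd - bkw doubles it.
opposite = observed(0.0, -0.02) - observed(0.0, 0.02)
# Post minus pre within the forward direction, assuming the confound is stable across sessions.
post_minus_pre = observed(0.0, -0.02) - observed(0.0, -0.02)

print(in_phase, opposite, post_minus_pre)  # 0.0 -0.04 0.0
```

      Subtraction removes the confound only when it has the same sign and size in both terms; the post-minus-pre contrast within one direction relies instead on the confound being stable across sessions, which is itself an assumption.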

      Finally, we note issues introduced by the various ways in which sequenceness has been analysed in previous papers: normalization of sequenceness (z-scoring across time lags, across participants, or not at all), normalization of probabilities (taking raw decision scores, z-scoring, softmax, dividing by the mean, subtracting the mean), taking a windowed approach and summing sequenceness scores, not to mention the various classifier choices that can be made; all of this can be applied before or after subtracting conditions from each other. In our experience, there is insufficient control for multiple comparisons when running all these analyses, which risks selective reporting.

      Nevertheless, subtracting backward from forward sequenceness is probably as valid as post minus pre. Therefore, we have added fwd-bkw plots to the supplement and explained in the figure legend some of our reasoning for not reporting them in the main text. The figure legend and reference now read:

      “Finally, we report forward minus backward sequenceness and our motivation for using an across-session post-pre comparison instead of a within-session forward-minus-backward comparison in Supplement Figure 10.”

      […]

      “Forward minus backward sequenceness within each resting state session. Previous papers often report the subtraction of backward from forward sequenceness (fwd-bkw) as a means to remove oscillatory confounds that affect both sequenceness directions in synchrony. While required in early cross-correlation approaches (Kurth-Nelson et al., 2016), its validity in GLM-based frameworks depends on the assumption that confounds are global and in phase across sensors. We observed that this assumption is violated in our baseline data, where spurious sequenceness occasionally diverges in opposite directions at specific time lags (e.g., ~90 ms). In such instances, subtraction would increase the false-positive rate rather than suppress noise. In Figure 3B, we prioritized the comparison of pre-task versus post-task sequenceness within the same direction, as oscillatory confounds appeared more stable across time within a single direction than across directions within a single session. However, we consider both approaches valid. We now provide the fwd-bkw plots for completeness and for comparison with previous literature. A) Forward minus backward sequenceness for Control (left) and Post-Learning resting state (right). B) T-value distribution of the sign-flip permutation test for Control (left) and Post-Learning resting state (right).”
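      Panel B refers to a sign-flip permutation test. A minimal sketch of such a test across participants follows, assuming one summary value per participant (e.g. fwd-bkw sequenceness at a single lag) and a two-sided one-sample t statistic. The function name, permutation count and simulated inputs are hypothetical, and this is not necessarily the exact procedure used for the figure.

```python
import numpy as np

def sign_flip_test(values, n_perm=10000, seed=0):
    """Two-sided one-sample sign-flip permutation test on per-participant values."""
    x = np.asarray(values, dtype=float)
    n = x.size

    def t_stat(v):
        # one-sample t statistic along the last axis
        return v.mean(axis=-1) / (v.std(axis=-1, ddof=1) / np.sqrt(n))

    t_obs = t_stat(x)
    rng = np.random.default_rng(seed)
    # Under H0 (no effect) each participant's sign is exchangeable.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n))
    t_null = t_stat(signs * x)  # null T-value distribution
    # +1 in numerator and denominator counts the observed labelling itself
    p = (np.sum(np.abs(t_null) >= abs(t_obs)) + 1) / (n_perm + 1)
    return t_obs, t_null, p

rng = np.random.default_rng(1)
fwd_minus_bkw = rng.normal(0.02, 0.05, size=30)  # hypothetical per-participant values
t_obs, t_null, p = sign_flip_test(fwd_minus_bkw)
```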

      (2) Relatedly, while the authors note that smaller time windows are necessary for TDLM to succeed, a more precise description of the appropriate window size would greatly improve the utility of this paper. As it stands, the discussion feels incomplete without this information, as providing explicit guidance on optimal window sizes would help future researchers apply TDLM effectively. Under what window size range can physiological levels of replay actually be detected using TDLM? Or, is there some scaling factor that should be considered, in terms of window size and significance threshold/replay density? If the authors are unable to provide a concrete recommendation, they could add information about time windows used in previous studies (perhaps, is 1500ms as used in their previous paper a good recommendation?).

      We currently do not have an empirical estimate of which window sizes are appropriate. While we used 1500ms in our previous paper, this was solely given by the experiment design which had a 1.5s wait period before the next stimulus. Our recommendation for best guidance on this matter would be to investigate related intracranial literature for SWR rate increases under similar experimental conditions. We have added the following paragraph to the discussion:

      “At this stage we cannot offer a general recommendation for window sizes as they are likely to depend on details of the research paradigm. However, intracranial recordings can be used as a proxy to estimate the duration of replay bursts, for example as reported in (Norman et al., 2019), where increased SWRs were seen up to 1500 ms after retrieval cue onset.”

      (3) In their simulation, the authors define a replay event as a single transition from one item to another (example: A to B). However, in rodents, replay often traverses more than a single transition (example: A to B to C, even to D and E). Observing multistep sequences increases confidence that true replay is present. How does sequence length impact the authors' conclusions? Similarly, can the authors comment on how the length of the inserted events impacts TDLM sensitivity, if at all?

      Good point! So far, most papers do not seem to include multi-step TDLM, and in our experience rightly so, as it is conceptually difficult to define clear significance thresholds given that shorter sub-sequences are contained within a longer sequence (e.g. ABC contains both AB and BC and a longer dependency of AC), which makes it difficult to define the correct way to create a null distribution for the permutation test. Therefore, we tried to stay as close as possible to previous approaches and only looked for single-step transitions. Nevertheless, we have added an analysis to the supplement comparing how TDLM behaves if we simulate A->B->C or separate A->B and B->C events. It shows that TDLM is only sensitive to the number of transitions present in the data, and it does not matter whether they are chained or chunked. The segment reads:

      “We intentionally designed our study to encourage replay of triplets. However, this raises the question of whether it matters if triplets or individual chunks of a sequence are replayed at different time points. Here, we simulated two scenarios. In the first, we inserted replay of single transitions alone, separated by a refractory period, e.g. separate A->B and B->C transitions. In the second, we simulated replay of chained triplets, e.g. A->B->C, with a distance of 80 milliseconds between steps. Importantly, we kept the number of transitions constant (i.e., A->B, … B->C and A->B->C both contain 2 transitions). As a result, a four-minute resting state would have ~100 inserted A->B->C events and ~200 inserted A->B or B->C events, such that both cases yield the same number of single-step transitions. We found both to be equivalent: TDLM is agnostic to the length of sequence trains, i.e., it does not matter whether replay is chunked or chained, under the assumption that the number of transitions remains fixed, as can be seen in Appendix A Figure 2.”

      And the reference Figure description reads:

      “TDLM is invariant to the length of sequence replay trains under the assumption that the number of target transitions (e.g. single steps) is fixed. We simulated replay either as two temporally separate A->B and B->C events (light orange/green) or as a single A->B->C event (dark orange/green), both yielding equivalent sequenceness. Shaded areas denote the standard error across participants.”
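      The chained-versus-chunked comparison can be sketched with a simplified two-level TDLM. All of the following are assumptions for illustration: decoded reactivation time courses are replaced by Gaussian noise sampled at 100 Hz (so 80 ms = 8 samples), replay is inserted as additive bumps of a hypothetical amplitude of 5 at evenly spaced onsets (which stands in for the refractory period), and the second level regresses the empirical transition matrix on forward, backward, self-transition and constant terms. This is not the simulation code behind Appendix A Figure 2.

```python
import numpy as np

def tdlm_sequenceness(X, T, lag):
    """Simplified two-level TDLM at one lag (in samples); returns (forward, backward)."""
    n_states = X.shape[1]
    # First level: regress each state at time t on all states at t - lag (plus intercept).
    predictors = np.column_stack([X[:-lag], np.ones(len(X) - lag)])
    beta, *_ = np.linalg.lstsq(predictors, X[lag:], rcond=None)
    B = beta[:n_states]  # B[i, j]: state i at t - lag predicts state j at t
    # Second level: project the empirical transitions onto forward (T) and backward (T.T)
    # hypotheses, with self-transition and constant nuisance regressors.
    design = np.column_stack([T.ravel(), T.T.ravel(),
                              np.eye(n_states).ravel(), np.ones(n_states ** 2)])
    coef, *_ = np.linalg.lstsq(design, B.ravel(), rcond=None)
    return coef[0], coef[1]

def simulate(mode, n_samples=24_000, lag=8, amp=5.0, seed=0):
    """Hypothetical reactivation time courses for states A=0, B=1, C=2 (4 min at 100 Hz)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, 3))
    if mode == "chained":
        # ~100 A->B->C events, i.e. 200 single-step transitions
        for t in np.linspace(100, n_samples - 100, 100).astype(int):
            X[t, 0] += amp
            X[t + lag, 1] += amp
            X[t + 2 * lag, 2] += amp
    elif mode == "chunked":
        # ~200 separate A->B or B->C events, i.e. also 200 transitions
        onsets = np.linspace(100, n_samples - 100, 200).astype(int)
        for k, t in enumerate(onsets):
            src, dst = (0, 1) if k % 2 == 0 else (1, 2)
            X[t, src] += amp
            X[t + lag, dst] += amp
    else:
        raise ValueError(f"unknown mode: {mode}")
    return X

T = np.zeros((3, 3))
T[0, 1] = T[1, 2] = 1.0  # hypothesised transitions A->B and B->C
fwd_chain, bkw_chain = tdlm_sequenceness(simulate("chained"), T, lag=8)
fwd_chunk, bkw_chunk = tdlm_sequenceness(simulate("chunked"), T, lag=8)
print(f"chained fwd={fwd_chain:.3f}  chunked fwd={fwd_chunk:.3f}")
```

      Both forward estimates should be of similar size. They need not be identical: in the chunked case state B is reactivated twice as often, which inflates its variance and slightly shrinks the B->C coefficient in this toy setting.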

      For example, regarding sequence length, is it possible that TDLM would detect multiple parts of a longer sequence independently, meaning that the high density needed to detect replay is actually not quite so dense? (example: if 20 four-step sequences (A to B to C to D to E) were sampled by TDLM such that it recorded each transition separately, that would lead to a density of 80 events/min).

      Indeed, this is an interesting proposal. We intentionally kept our simulation close to the way previous simulations were set up (i.e. Liu & Dolan et al 2021, Liu & Mattar 2021) by simulating one-step transitions, and simulated them such that there is no overlap between separate events (e.g. by defining a refractory period). If the duration of replay is increased, then we would also need to increase the length of the refractory period, resulting in a reduced upper limit on how much replay can occur in a 1-minute time window. This in turn would yield roughly the same number of transitions inserted into the resting state and, as detailed above, the same results. Nevertheless, as we chose to use replay density rather than transition density as a marker, the replay density would be reduced even if the number of transitions stays the same. We have added an analysis using multi-step replay to the supplement and discuss its implications and caveats. In the main discussion we have added the following segment:

      “Similarly, in our simulation, for simplicity and for consistency with previous simulations, we restricted replay events to span two reactivation events. While the characteristics of replay as measured by TDLM are unknown, it is conceivable that several steps can be replayed within one replay event. We show that the vanilla version of TDLM is fundamentally sensitive to the number of single-step transitions alone, and disregards whether these are replayed chained or chunked (Appendix A.2 and Appendix A Figure 2). Nevertheless, if the number of reactivation events chained within a replay event increases, TDLM's sensitivity increases relative to the replay density and thresholds are reached earlier (see Appendix A Figure 4). See Appendix A.4 for a simulation of multi-step replay events and our discussion of the caveats.”
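      The distinction between replay density and transition density is simple arithmetic. The helper functions below are hypothetical and assume that each replay event of k steps contributes k single-step transitions, as in the reviewer's example and in the four-minute chained/chunked simulation described above.

```python
def events_per_min(n_events: int, duration_min: float) -> float:
    """Replay density: replay events per minute."""
    return n_events / duration_min

def transition_density(events_per_minute: float, steps_per_event: int) -> float:
    """Single-step transitions per minute implied by a replay density."""
    return events_per_minute * steps_per_event

# Reviewer's example: 20 four-step events (A->B->C->D->E) in one minute
print(transition_density(20, 4))  # 80
# Four-minute rest: ~100 chained triplets (2 steps each) vs ~200 single transitions
chained = transition_density(events_per_min(100, 4), 2)  # 25 events/min
chunked = transition_density(events_per_min(200, 4), 1)  # 50 events/min
print(chained, chunked)  # 50.0 50.0
```

      Equal transition densities can therefore correspond to replay densities that differ by a factor of two, which is why thresholds expressed in replay density are reached earlier when events are chained.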

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please label the various significance thresholds in the legend of Figure 3.

      We have labelled all the thresholds in the figure legends.

      Reviewer #2 (Recommendations for the authors):

      I think that clarity is somewhat hampered by a bit too much reliance on explanations from the previous paper using this task. For example, Figure 1 is not particularly useful for understanding the study in its current form; I found myself relying almost exclusively on Supplementary Figure 1 (which is from the previous paper). I'd recommend presenting some version of SF1 in the main text instead. Another example of this overreliance on the previous paper is that, as far as I can tell, the present paper never explicitly states which transitions are being tested in TDLM. In the prior work, it states "all allowable graph transitions", and so I assumed this was the same here, but the paper should stand alone without having to go back to the other study. I'd recommend that the authors revise the paper in these and other places where the previous paper is mentioned.

      Thanks for raising this point! We were uncertain ourselves how to deal with the overlap in content and did not want to bloat the paper or plagiarize ourselves too much. On the advice of the referee, we have implemented the following changes to improve the manuscript and reduce its reliance on the previous paper:

      Supplement Figure 1 is indeed crucial to understanding the experiment. We have moved it to the methods section under Methods: Procedure.

      We have added a more detailed stimulus description to the Methods: Localizer section.

      We have included more details about the localizer and graph learning that were missing before.

      We have added the note about which transitions we were looking for in the Methods section. Additionally, we have added this information to the Results section of Study 1.

      There are also a few typos I noticed:

      (1) Line 73: "during in the context of."

      (2) Line 287: " to exploring the."

      We fixed the typos.

      Reviewer #3 (Recommendations for the authors):

      (1) Why did the authors choose an 80ms state-to-state time lag for their simulation? I believe they should make the reason for this decision clear in the main text.

      Indeed, this point was also raised by the other reviewer. We have added a sentence to the main text about the rationale behind this decision:

      “We chose this time lag (80 millisecond state-to-state lag) as its sequenceness value was close to zero in the baseline condition and it was distant from the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean sequenceness value of the baseline at 80 millisecond lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we have added some further explanation to the Methods section.

      “This time lag (80 ms) was chosen in order to isolate precisely the effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean baseline sequenceness value at 80 ms from each participant's sequenceness at that lag, such that simulation effects would be initialized at zero sequenceness on average, enabling any effects to be attributed purely to inserted replay. Finally, we excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz) or lags that coincide with a harmonic of this rhythm.”
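      The lag-selection and baseline-centering steps can be sketched as follows. How "too close" and "harmonic" are operationalized here (a ±5 ms tolerance around 0.5, 1 and 2 times the ~91-111 ms period of a 9-11 Hz alpha rhythm) is an assumption for illustration, as are the function names; the centering subtracts the group-mean baseline sequenceness at the chosen lag from each participant's value.

```python
import numpy as np

def alpha_related_lags(lags_ms, alpha_hz=(9.0, 11.0), tol_ms=5.0, multiples=(0.5, 1.0, 2.0)):
    """Flag lags within tol_ms of multiples of the alpha period range (hypothetical rule).

    multiple 0.5 -> period of the 2nd harmonic (2f); 1.0 -> alpha period; 2.0 -> two cycles.
    """
    lags_ms = np.asarray(lags_ms, dtype=float)
    periods = 1000.0 / np.asarray(alpha_hz)  # 9 Hz -> ~111 ms, 11 Hz -> ~91 ms
    p_min, p_max = periods.min(), periods.max()
    flagged = np.zeros(lags_ms.shape, dtype=bool)
    for m in multiples:
        flagged |= (lags_ms >= m * p_min - tol_ms) & (lags_ms <= m * p_max + tol_ms)
    return flagged

def center_on_baseline(post, baseline):
    """Subtract the group-mean baseline sequenceness at the chosen lag from each participant."""
    return np.asarray(post, dtype=float) - np.mean(baseline)

lags = np.arange(10, 310, 10)
candidate_lags = lags[~alpha_related_lags(lags)]  # 80 ms survives; 50, 60, 90-110 ms do not
```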

      (2) Line 168: Can the authors define what these conservative and liberal criteria are in the text?

      We have added definitions of the criteria in the text. The text now reads:

      “[..] significance thresholds (conservative, i.e. the maximum sequenceness across all permutations and timepoints, or liberal, i.e. the 95th percentile of the aforementioned sequenceness).”
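      A sketch of the two thresholds, assuming the common TDLM convention of first taking, for each permuted transition matrix, the peak sequenceness across lags; the conservative threshold is then the maximum of these peaks and the liberal one their 95th percentile. The array layout and the use of absolute values (a two-sided test) are assumptions.

```python
import numpy as np

def permutation_thresholds(perm_seq):
    """perm_seq: (n_permutations, n_lags) sequenceness from shuffled transition matrices."""
    peak_per_perm = np.max(np.abs(perm_seq), axis=1)  # peak across lags, per permutation
    conservative = peak_per_perm.max()                # max across permutations and lags
    liberal = np.percentile(peak_per_perm, 95)        # 95th percentile of the peaks
    return conservative, liberal

rng = np.random.default_rng(0)
perm_seq = rng.normal(0.0, 0.02, size=(24, 25))  # hypothetical: 24 permutations x 25 lags
conservative, liberal = permutation_thresholds(perm_seq)
```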

      (3) Line 478: "calculate" instead of "calculated".

      (4) Figure 7 D: y-axis is labeled "70 ms" I believe it should be labeled 80 ms.

      Thanks, we fixed the two typos.

      (5) With replay defined as sequential reactivation at a compressed temporal timescale, many of the iEEG citations (lines 54-55) do not demonstrate replay (they show stimulus reinstatement or ripple activity, but not sequential replay). Replay studies in humans using intracranial methods have been mostly limited to those measuring single-unit activity, a good example being Vaz et al., 2020 (https://www.science.org/doi/10.1126/science.aba0672).

      We agree that, under a strict definition articulated by Genzel et al. that defines replay as sequential reactivation, many prior human iEEG studies are better described as stimulus reinstatement or ripple-related activity rather than true sequence replay. We have revised the text accordingly and now highlight the few intracranial microelectrode studies that demonstrate replay of firing sequences at the cellular/ensemble level in humans (Eichenlaub et al., 2020; Vaz et al., 2020), distinguishing these from macro-scale iEEG work providing indirect evidence alone.

      The revised paragraph now reads:

      “Replay has been shown using cellular recordings across a variety of mammalian model organisms (Hoffman & McNaughton, 2002; Lee & Wilson, 2002; Pavlides & Winson, 1989). Replay studies in humans using intracranial recordings are few, but include work demonstrating compressed replay of firing-pattern sequences in motor cortex during rest (Eichenlaub et al., 2020) as well as single-unit replay of trial-specific cortical spiking sequences during episodic retrieval (Vaz et al., 2020). By contrast, most iEEG studies report stimulus-specific reinstatement or ripple-locked activity changes without explicit demonstration of temporally compressed sequential replay (Axmacher et al., 2008; Staresina et al., 2015). These methods are only applied under restricted clinical circumstances, such as during pre-operative neurosurgical assessments, which limits opportunities to investigate human replay. This gives urgency to efforts aimed at developing novel methods to investigate human replay non-invasively.”

      (6) The expectations about replay frequency are grounded in literature on hippocampal replay sequences. However, MEG captures signals from across the entire brain, and the hippocampal contribution is likely relatively weak compared to all other signals. This raises an important question: is TDLM genuinely unable to detect replay at physiological (i.e., hippocampal) levels, or is it instead detecting a different form of sequential reactivation - possibly involving cortex or other regions - that may occur more frequently? More broadly, when we have evidence of replay from TDLM, do we believe it is the same thing as replay of CA1 place cell spiking sequences, as detected in rodents? Commenting on this distinction would help further develop theories of replay and what TDLM is measuring.

      This is indeed an important point that has garnered relatively little attention. While there is some evidence of a relation to hippocampal replay in the form of a high-frequency power increase in the hippocampus, ultimately it is not possible to know without intracranial recordings, as signal strength from those regions is rather poor in MEG.

      We have added the following segment to the manuscript that discusses these issues:

      “However, while we are using indices of SWRs as a proxy for replay density estimation, the relationship between hippocampal replay and replay detected by TDLM remains uncertain. While current decoding approaches measure replay-like phenomena at cortical sites, previous papers have reported a power increase in hippocampal areas coinciding with replay episodes as detected by TDLM. Nevertheless, it is conceivable that cortical replay found by TDLM could occur independently of hippocampal replay and SWRs and be generated by different mechanisms. Some TDLM studies find a replay state-to-state time lag above 100 ms, much slower than, e.g., previously reported place cell replay. Future studies should employ simultaneous intracranial and cortical surface recordings to establish the relationship between hippocampal replay and replay found by TDLM.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that happen within 6 hours of inhibiting LDH that causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We thank the reviewer for the careful reading of our manuscript, the accurate summary of the prevailing model, and the positive assessment of the rigor of our measurements. We agree that much prior literature reports increased oxygen consumption following LDH inhibition, and we recognize that our finding of coordinated suppression of glycolysis, the TCA cycle, and OXPHOS differs from this prevailing interpretation. We address below the reviewer’s main concern regarding the 6-hour time point and clarify the conceptual scope of our study.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer raises an important point that many metabolic perturbations can trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. We agree that very early time points would be required if the primary goal were to isolate the most immediate, proximal consequence of LDH inhibition before downstream propagation. However, the objective of our study is different: we aim to characterize the metabolic steady state re-established after sustained inhibition of LDH activity, because this adapted steady state is more relevant for understanding long-term metabolic consequences and therapeutic outcomes of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, we observed that when LDH activity was reduced from 100% to ~9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged, indicating maintenance of a steady-state glycolytic flux despite substantial LDH inhibition. Only when LDH activity was further reduced below this threshold did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure (Figure 8A & B).
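      One way to see why a large LDH capacity produces a threshold rather than a proportional response is a deliberately crude bottleneck model, in which steady-state flux is capped by whichever step has the lower capacity. The capacity ratio of 12 (maximal LDH capacity relative to the upstream rate-limiting steps) is hypothetical, chosen only so that the threshold (1/12, about 8%) falls near the ~9% residual activity reported above. The model ignores NADH/NAD+ feedback and thermodynamics, so it illustrates the nonlinear shape only and is not a fit to Figure 8.

```python
import numpy as np

def glycolytic_flux(ldh_fraction, v_upstream=1.0, capacity_ratio=12.0):
    """Toy bottleneck model: flux = min(upstream capacity, residual LDH capacity).

    capacity_ratio is a HYPOTHETICAL ratio of maximal LDH capacity to upstream capacity.
    """
    ldh_capacity = np.asarray(ldh_fraction) * capacity_ratio * v_upstream
    return np.minimum(v_upstream, ldh_capacity)

fractions = np.array([1.0, 0.5, 0.2, 0.09, 0.05, 0.02])  # residual LDH activity
flux = glycolytic_flux(fractions)  # flat at 1.0 down to ~8%, then declines linearly
```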

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [¹³C₆]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle (Figure 8C). Similarly, [¹³C₅]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues (Figure 8D). These graded, quantitative transitions, rather than an abrupt global failure, support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.
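      Isotopologue readouts such as the m2 fraction of citrate are usually expressed as a mass isotopologue distribution (MID). The sketch below normalizes hypothetical peak areas for citrate m0-m6 and computes the mean fractional ¹³C enrichment. Natural-abundance correction, which real tracing data require, is deliberately omitted, and the input numbers are made up.

```python
import numpy as np

def mid(peak_areas):
    """Normalize m0..mn peak areas to fractions summing to 1."""
    x = np.asarray(peak_areas, dtype=float)
    return x / x.sum()

def fractional_enrichment(fractions):
    """Mean fraction of labelled carbons: sum(i * f_i) / n, with n = number of carbons."""
    n_carbons = len(fractions) - 1
    return float(np.dot(np.arange(len(fractions)), fractions) / n_carbons)

citrate_areas = [50, 5, 30, 5, 5, 3, 2]  # hypothetical m0..m6 (citrate has 6 carbons)
citrate_mid = mid(citrate_areas)
m2_fraction = citrate_mid[2]             # direct entry of glucose-derived acetyl-CoA
higher_order = citrate_mid[3:].sum()     # m3..m6, e.g. from additional turns or anaplerosis
fe = fractional_enrichment(citrate_mid)
```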

      (4) Reconciling discrepancies with prior studies

      We agree that multiple prior studies have reported increased oxygen consumption or enhanced oxidative metabolism following LDH inhibition in cancer cells. However, we note that this prevailing notion often persists because LDH inhibition is frequently discussed by analogy to the classical Pasteur and Crabtree effects, in which cells toggle between fermentation and respiration depending on oxygen and glucose availability. We believe this analogy can be misleading.

      In the Pasteur effect, the metabolic shift is primarily driven by oxygen limitation, i.e., restriction of the terminal electron acceptor for the mitochondrial electron transport chain, which enforces reliance on fermentation. In the Crabtree effect, high glucose availability suppresses respiration through regulatory mechanisms while glycolysis is strongly activated. Both phenomena are fundamentally controlled by oxygen availability and respiratory capacity, rather than by inhibition of a specific cytosolic enzyme.

      By contrast, LDH inhibition is mechanistically distinct: it directly perturbs cytosolic redox recycling by limiting NADH-to-NAD+ regeneration and can therefore constrain upstream glycolytic flux (particularly at GAPDH) and reshape pathway thermodynamics. Under conditions where LDH inhibition sufficiently limits effective NAD+ availability and reduces glycolytic flux into pyruvate, the downstream consequence is reduced carbon input into the TCA cycle and suppressed OXPHOS, consistent with our experimental measurements. We therefore suggest that divergent outcomes reported across studies likely reflect differences in residual LDH activity, cell-type–specific metabolic wiring, and the extent to which glycolytic flux remains sustained versus becoming redox-limited upstream, rather than a universal Pasteur/Crabtree-like “switch” from fermentation to respiration. Accordingly, interpreting LDH inhibition as a Pasteur/Crabtree-like toggle may oversimplify the biochemical consequences of disrupting cytosolic NAD+ regeneration.

      We have revised the Discussion to clarify this conceptual distinction and to avoid relying on comparisons that are not mechanistically equivalent to LDH inhibition.
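      The thermodynamic side of the redox argument can be made concrete with ΔG = ΔG°' + RT ln Q. For the GAPDH reaction (GA3P + NAD+ + Pi -> 1,3-BPG + NADH), NADH sits in the numerator of Q and NAD+ in the denominator, so a k-fold rise in NADH/NAD+ with all other concentrations fixed raises ΔG by RT ln k. The sketch assumes 37 °C, folds H+ into ΔG°', and uses hypothetical function names; it does not reproduce the study's ΔG calculations.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)

def delta_g(delta_g0_prime, mass_action_ratio, temp_k=310.15):
    """ΔG = ΔG°' + RT ln(Q), in kJ/mol (concentrations relative to a 1 M standard state)."""
    return delta_g0_prime + R_KJ * temp_k * math.log(mass_action_ratio)

def gapdh_q(ga3p, nad, pi, bpg13, nadh):
    """Mass-action ratio for GA3P + NAD+ + Pi -> 1,3-BPG + NADH (H+ folded into ΔG°')."""
    return (bpg13 * nadh) / (ga3p * nad * pi)

def redox_shift(fold_increase_nadh_nad, temp_k=310.15):
    """Change in ΔG(GAPDH) when NADH/NAD+ rises by the given fold, all else fixed."""
    return R_KJ * temp_k * math.log(fold_increase_nadh_nad)

print(f"{redox_shift(10):.2f} kJ/mol")
```

      At 310.15 K a tenfold rise in NADH/NAD+ shifts ΔG of the GAPDH step by about +5.9 kJ/mol, pushing a near-equilibrium reaction toward or past zero.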

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH, explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and at norm- and hypoxia. From this, they conclude that the inhibition of LDH inhibits glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction sets a significant thermodynamic barrier. Even with treatment using 30 μM GNE-140, ∆G_PFK1 (the Gibbs free energy change of the PFK1-catalyzed reaction) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium, thereby preventing the accumulation of F6P and G6P.
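      How far -9.455 kJ/mol lies from equilibrium can be expressed through ΔG = RT ln(Q/K). A minimal sketch, assuming 37 °C (310.15 K) and a hypothetical helper name: the mass-action ratio Q comes out at roughly 0.026 of the equilibrium constant K, i.e. about 39-fold displaced toward substrates.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)

def q_over_k(delta_g_kj, temp_k=310.15):
    """Invert ΔG = RT ln(Q/K) to get the mass-action ratio relative to equilibrium."""
    return math.exp(delta_g_kj / (R_KJ * temp_k))

ratio = q_over_k(-9.455)  # ΔG of PFK1 under 30 μM GNE-140, from Figure 3D
fold_from_equilibrium = 1 / ratio
print(f"Q/K = {ratio:.3f}; ~{fold_from_equilibrium:.0f}-fold below equilibrium")
```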

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how the inhibition of LDH affects pyruvate incorporation into the TCA cycle. As this mechanism was not investigated, we have titled the study:

      "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM under all conditions, including the normoxic, inhibitor-free condition. This would mean that, under the conditions assayed, glycolysis could never flow completely forward. How any lactate or pyruvate is produced from glucose is then unexplained.

      This issue also concerned me during the study. However, given the high reproducibility of the data, we consider the result genuine, although it requires explanation. The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, the three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic and provide the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium. They primarily serve to equilibrate glycolytic intermediates rather than to control the overall direction of glycolysis, as we described previously (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning they present for this (a decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      Because GNE-140 treatment increased the labeling of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose but decreased the OXPHOS rate, we regarded these conflicting results as an 'anomaly' that warranted further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests three concurrent changes after LDH inhibition: a reduced flux of glucose-derived acetyl-CoA into the TCA cycle, a decreased flux of glutamine-derived α-KG, and a reduced efflux of intermediates from the cycle. These results align with theoretical expectations. Under any steady-state condition, the reactions that distribute TCA cycle intermediates to other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced. This implies that the entry of glucose-derived carbon (as acetyl-CoA) must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 increased the labeling percentage of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into the TCA cycle. Rather, it indicates a reduction in both the flux of glucose carbon into the TCA cycle and the flux of intermediates leaving the TCA cycle. Interpreting the data requires considering several factors together: the carbon-13 labeling pattern of the intermediates (m1, m2, m3, …) (Figure 5G-K), the replenishment of intermediates by glutamine (Figure 5M-V), and the mitochondrial oxygen consumption rate (Figure 5W).
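
      Writing out the quantities shows why a higher labeled percentage need not mean a higher flux. The sketch below normalizes raw peak areas to a mass isotopologue distribution (MID). It then reports three values: the total labeled fraction, the m+2 share, and the summed share of the other labeled isotopologues. It assumes the abundances are already corrected for natural <sup>13</sup>C abundance. The citrate numbers are hypothetical, not data from Figure 5, and the function names are introduced here for illustration.

```python
def isotopologue_fractions(abundances):
    """Normalize m+0..m+n abundances so they sum to 1 (a MID)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return [a / total for a in abundances]


def summarize_mid(fractions):
    """Return the labeled fraction, the m+2 fraction, and the other labeled isotopologues."""
    labeled = 1.0 - fractions[0]                      # everything except m+0
    m2 = fractions[2] if len(fractions) > 2 else 0.0
    other_labeled = labeled - m2                      # m+1 plus m+3 and higher
    return {"labeled": labeled, "m+2": m2, "other_labeled": other_labeled}


# Hypothetical citrate peak areas for m+0 ... m+6 (arbitrary units)
citrate_areas = [40.0, 5.0, 35.0, 8.0, 7.0, 3.0, 2.0]
citrate_mid = isotopologue_fractions(citrate_areas)
summary = summarize_mid(citrate_mid)
print(summary)
```

      Each fraction in a MID is relative to the total pool. It therefore reports the relative contribution of labeled and unlabeled carbon sources, not the absolute flux. For example, a fall in unlabeled (e.g., glutamine-derived) input can raise the labeled fraction even if labeled influx also falls.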

      Reviewer #3 (Public Review):

      Hu et al. in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript.

      First, numerous previous studies have established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDK inhibitors) leads to upregulation of TCA cycle activity and OXPHOS, activation of glutaminolysis, etc. In this work, by contrast, the authors claim that lowered glycolysis leads to lower TCA activity/OXPHOS. The authors completely ignore recent studies suggesting that lactate itself is an important signaling metabolite that can modulate metabolism; mechanistic insights were recently presented by at least two groups, the Thompson and Chouchani labs. In addition, extensive effort has been dedicated to understanding the crosstalk between glycolysis, the TCA cycle, and OXPHOS using metabolic models (Titov and Rabinowitz labs).

      I have several comments on how the experiments were performed. The Methods section states that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum. Under these conditions, pyruvate is certainly present in the medium, which can easily complicate or invalidate some findings presented in this manuscript. For the LDH enzymatic assays with cell homogenates, controls were not explained or presented (many enzymes in the homogenate can react with NADH!).

      One of my major issues is that glycolytic intermediates were measured with multiple enzyme-coupled assays. This might seem a good approach for obtaining quantitative values for each metabolite. However, cell homogenates, which potentially still contain traces of activity from multiple glycolytic enzymes, were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as part of the enzyme-based cycling reaction. I would prefer to see values from the enzyme-based assays compared with values from GC-MS/LC-MS experiments (using calibration curves for the respective metabolites, of course). Correct measurement of these metabolites is crucial, especially when thermodynamic parameters for the respective reactions are calculated. Concentrations in multiple graphs (Figure 1g, etc.) are given in "mM"; I do not think this is correct.

      We thank the reviewer for these comments. Below, we clarify the conceptual framework, the quantitative methodology, and the experimental basis supporting our conclusions.

      (1) “It is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle… leads to upregulation of TCA/OXPHOS… (authors claim lowered glycolysis leads to lower TCA/OXPHOS)”

      This framing is not accurate in the context of our study. PDK inhibition and LDH inhibition are fundamentally different perturbations. PDK inhibition directly promotes mitochondrial pyruvate oxidation by enabling PDH flux, whereas LDH inhibition primarily perturbs cytosolic redox balance (free NADH/NAD<sup>+</sup>) and thereby constrains upstream glycolytic reactions, particularly the GAPDH step. Therefore, the metabolic outcomes of these interventions are not expected to be identical and should not be treated as interchangeable.

      Importantly, we do not “ignore” prior studies proposing increased OXPHOS after LDH inhibition; we explicitly cite and summarize this prevailing interpretation in the Introduction. Our study was motivated precisely because this interpretation does not resolve key quantitative inconsistencies, including (i) the large mismatch between glycolytic flux and mitochondrial oxidative capacity, and (ii) the exceptionally high catalytic capacity of LDH relative to upstream rate-limiting glycolytic enzymes. These constraints raise a mechanistic question: how does LDH inhibition actually suppress glycolytic flux in intact cancer cells, and what are the consequences for TCA cycle and OXPHOS?

      Our central contribution is the identification of a biochemical mechanism supported by integrated measurements of fluxes, metabolite concentrations, redox state, and reaction thermodynamics: LDH inhibition increases free NADH/NAD<sup>+</sup>, decreases free NAD<sup>+</sup> availability, inhibits GAPDH, drives accumulation/depletion patterns in glycolytic intermediates, shifts Gibbs free energies of near-equilibrium reactions (PFK1–PGAM segment), suppresses pyruvate production, and consequently reduces carbon input into TCA cycle and OXPHOS. These analyses are not provided by most prior work and directly address the mechanistic gap.
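
      Free (unbound) cytosolic NADH/NAD<sup>+</sup> cannot be read directly from total nucleotide measurements, because a large fraction of NADH is protein-bound. One classical estimate assumes that the LDH reaction is near equilibrium. It then derives the free ratio from the lactate/pyruvate ratio and pH. The sketch below implements that relation under stated assumptions. The equilibrium constant of about 1.11 × 10<sup>-11</sup> M is the widely used literature value, and the inputs are hypothetical. This is not necessarily the method used to determine the ratio in this study.

```python
# Assumed LDH equilibrium constant (M):
# K = [pyruvate][NADH][H+] / ([lactate][NAD+])
K_LDH = 1.11e-11


def free_nadh_nad_ratio(lactate, pyruvate, ph=7.0, k_eq=K_LDH):
    """Estimate free cytosolic NADH/NAD+ from lactate/pyruvate, assuming LDH is near equilibrium.

    Lactate and pyruvate may be given in any common unit, because only their ratio is used.
    """
    if pyruvate <= 0:
        raise ValueError("pyruvate must be positive")
    h_plus = 10.0 ** (-ph)
    return k_eq * (lactate / pyruvate) / h_plus


# Hypothetical inputs: lactate/pyruvate = 10 at pH 7.0
ratio = free_nadh_nad_ratio(lactate=10.0, pyruvate=1.0)
print(f"free NADH/NAD+ ≈ {ratio:.2e} (NAD+/NADH ≈ {1 / ratio:.0f})")
```

      The estimate scales in direct proportion to the lactate/pyruvate ratio. The near-equilibrium assumption may weaken when LDH is strongly inhibited, so it should be applied with care across inhibitor concentrations.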

      (2) Lactate signaling (Thompson/Chouchani) and metabolic modeling (Titov/Rabinowitz)

      These research directions are valuable, but they address questions that are different from the one investigated here. Our manuscript focuses on steady-state biochemical control of metabolic flux by LDH inhibition through redox-linked kinetics and pathway thermodynamics.

      (3) Pyruvate in RPMI

      Pyruvate in standard medium does not invalidate our conclusions. All experimental comparisons were performed under identical conditions across groups. The major conclusions rely on orthogonal measurements: glycolytic flux (glucose consumption/lactate production), OCR profiling, and isotope tracing with [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine, which directly quantifies carbon entry into lactate and TCA cycle intermediates. Unlabeled extracellular pyruvate does not confound these tracer-based results in a way that would reverse the mechanistic conclusions.

      (4) LDH activity assay in homogenates and “many enzymes can react with NADH”

      This concern is overstated. In the LDH assay, substrates are pyruvate + NADH, and the measured signal reflects NADH oxidation coupled to pyruvate reduction. In cell lysates, LDH is uniquely abundant and catalytically efficient for this reaction pair, and the inhibitor-response behavior matches the known LDHA/LDHB selectivity of GNE-140 and the cellular phenotypes. Thus, the assay is mechanistically specific in this context.

      (5) Enzyme-coupled metabolite assays and request for LC–MS validation

      The reviewer’s implication that enzyme-coupled assays are intrinsically unreliable is incorrect. Enzymatic cycling assays are a widely used quantitative approach when performed with proper specificity and calibration, and they are particularly useful for labile glycolytic intermediates that are challenging to quantify reproducibly by MS without specialized quenching, derivatization, and isotope dilution standards.

      We agree that MS-based quantification is valuable, and we have developed LC–MS methods for selected metabolites. However, absolute quantification of these intermediates remains technically difficult because of the inherent limitations of the method. In our hands, it did not perform robustly for all the intermediates required for thermodynamic analysis.

      (6) Units (“mM”)

      The metabolite concentration units are correct.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the goal is to investigate the direct impact of LDH inhibition, then in my opinion, most of these experiments need to be repeated at a very early time point immediately after or a few minutes after LDH inhibition. I understand that this is a tremendous amount of work that the authors might not want to pursue. I do want to highlight that the quality of the experiments performed in this work is impressive. I hope the authors continue investigating this subject and look forward to reading their future manuscripts on this topic.

      We thank the reviewer for this thoughtful and constructive comment and for the positive assessment of the experimental quality of our work.

      We fully agree that measurements at very early time points after LDH inhibition would be required if the goal were to isolate an immediate, proximal molecular event occurring before downstream propagation. However, the primary objective of our study is not to dissect a single instantaneous biochemical consequence of LDH inhibition, but rather to characterize the metabolic steady state that is re-established after sustained suppression of LDH activity, which we believe is more relevant for understanding the long-term metabolic and therapeutic consequences of LDH inhibition in cancer cells.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer rightly notes that many metabolic perturbations trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. Very early time points would indeed be required to isolate the most proximal consequence of LDH inhibition before it propagates downstream. Our objective, however, is to characterize the adapted steady state re-established after sustained inhibition. This state is more relevant to the long-term metabolic consequences and therapeutic outcomes of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, when LDH activity was reduced from 100% to approximately 9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged. This indicates that a steady-state glycolytic flux was maintained despite substantial LDH inhibition. Only when LDH activity was reduced further, below this threshold, did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure.

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [<sup>13</sup>C<sub>6</sub>]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle. Similarly, [<sup>13</sup>C<sub>5</sub>]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues. These graded, quantitative transitions—rather than an abrupt global failure—support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.

      Reviewer #2 (Recommendations For The Authors):

      All in all, the authors would benefit from collaboration with a group more well-versed in quantitative aspects of metabolism (such as Metabolic Control Analysis) and modelling methods (such as flux analysis) to boost the interpretation and impact of their really nice data set.

      We sincerely thank the reviewer for this insightful and constructive suggestion. We fully agree that collaboration with groups specializing in quantitative metabolic analysis, such as Metabolic Control Analysis and flux modeling, would further expand the interpretative depth and broader impact of this work.

      The primary objective of the present work, however, was not to construct a global mathematical model, but to experimentally dissect the biochemical mechanism by which LDH inhibition coordinately suppresses glycolysis, the TCA cycle, and OXPHOS, integrating enzyme kinetics with thermodynamic constraints at steady state. Within this scope, we focused on experimentally demonstrable relationships between LDH activity, redox balance, GAPDH perturbation, thermodynamic shifts in near-equilibrium reactions, and emergent flux suppression.

      We fully recognize the power of MCA and related modeling approaches in formalizing control coefficients and system-level sensitivities, and we view our dataset as particularly well suited to support such future analyses. We therefore see this work as providing a robust experimental platform upon which more comprehensive quantitative modeling can be built, either in future studies or through collaboration with specialists in metabolic modeling.

      Reviewer #3 (Recommendations For The Authors):

      We sincerely thank the reviewer for the important suggestions.

      (1) I strongly disagree that "regulation of glycolytic flux" … "remained largely unexplored."

      Our original wording was meant to emphasize not the absence of prior work on glycolytic flux regulation, but rather that the specific biochemical mechanism by which LDH regulates glycolytic flux—particularly through the integrated effects of enzyme kinetics, redox balance, and thermodynamic constraints within the pathway—has not been fully elucidated.

      To avoid any ambiguity or overstatement, we have revised the relevant text to more precisely reflect this intent. The revised wording now reads:

      “This study elucidates a biochemical mechanism by which lactate dehydrogenase influences glycolytic flux in cancer cells, revealing a kinetic–thermodynamic interplay that contributes to metabolic regulation.”

      We believe this revised phrasing more accurately acknowledges prior work while clearly defining the specific mechanistic contribution of the present study.

      (2) Very confusing in the Introduction section: "If LDH is inhibited at the LDH step…"

      We sincerely thank the reviewer for pointing out the potential confusion caused by the phrase “If LDH is inhibited at the LDH step” in the Introduction.

      Our intention was to contrast two conceptual models of LDH inhibition. The first is the conventional view, in which the effect of LDH inhibition is assumed to be confined to the LDH-catalyzed reaction itself, leading primarily to local accumulation of pyruvate and its redirection toward mitochondrial metabolism. The second, which is supported by our data, is that LDH inhibition initiates a system-wide biochemical response, perturbing redox balance, upstream enzyme kinetics, and the thermodynamic state of the glycolytic pathway, ultimately resulting in coordinated suppression of glycolysis, the TCA cycle, and OXPHOS.

      We agree that the original phrasing was ambiguous and potentially misleading. To improve clarity, we have revised the text as follows:

      “If the effect of LDH inhibition were confined solely to its catalytic step…”

      (3) The entire introduction part when the authors attempt to explain how decreased glycolysis will lead to decreased mitochondrial respiration is confusing.

      We would like to clarify that the Introduction does not attempt to explain how decreased glycolysis leads to decreased mitochondrial respiration. Rather, the final paragraph of the Introduction is intended to highlight an unresolved conceptual inconsistency in the existing literature and to motivate the central question addressed in this study.

      Specifically, we summarize the prevailing view that LDH inhibition redirects pyruvate toward mitochondrial metabolism and enhances oxidative phosphorylation, and then point out that this interpretation is difficult to reconcile with quantitative considerations, such as the large disparity between glycolytic and mitochondrial flux capacities and the excess catalytic activity of LDH relative to upstream glycolytic enzymes. These observations are presented to emphasize that the biochemical mechanism linking LDH inhibition to changes in glycolysis and mitochondrial respiration has not been fully resolved.

      Importantly, the Introduction does not propose a mechanistic explanation for the observed suppression of mitochondrial respiration; rather, it poses this as an open question, which is then systematically addressed through experimental analysis in the Results section.

      (4) Line 144: "which is 81(HeLa-LDHAKO) -297(HeLa-Ctrl) times"- here and in many other places wording is confusing to the reader.

      Our intention was to emphasize the significant redundancy of LDH activity relative to hexokinase (HK), the first rate-limiting enzyme in the glycolysis pathway, in cancer cells.

      Specifically, we meant that the total LDH activity is 297 times the HK activity in HeLa-Ctrl cells. In HeLa-LDHAKO cells, total LDH activity decreased but was still 81 times the HK activity. These data come from Supplementary Table 1. They provide quantitative evidence for why knocking out LDHA or LDHB alone is insufficient to significantly affect glycolytic flux: the remaining LDH activity still far exceeds the HK activity at the pathway entrance and is sufficient to maintain flux.

      Following your suggestion, we have rewritten this statement more specifically in the revised manuscript: "...the total LDH activity in HeLa cells is very high: 297-fold the activity of HK, the first rate-limiting enzyme, in HeLa-Ctrl cells, and 81-fold in HeLa-LDHAKO cells."

      (5) Line 153: "in the following four aspects:"- but what are these aspects, the text below has no corresponding subtitles, etc.

      Our intention was to indicate that, because LDHA or LDHB knockout alone failed to affect the glycolytic rate, we explored its potential impact on the glycolytic pathway from four further perspectives:
      (i) the flow of glucose carbon to pyruvate and lactate;
      (ii) the flow of glucose carbon into subsidiary branches of glycolysis;
      (iii) the concentrations of glycolytic intermediates and the thermodynamic state of the pathway;
      (iv) the redox state of cytosolic free NADH/NAD<sup>+</sup>.

      Following your valuable suggestion, we have now added the aforementioned clear subtitles to these four aspects in the revised manuscript.

      (6) Lines 193, another example of the very confusing statement: "The results suggested that the loss of total LDH concentration was compensated.."

      The actual catalytic rate of LDH depends on both the enzyme concentration and the concentrations of its substrates (pyruvate and NADH). Gene knockout reduces the total LDH protein concentration in the cell. If substrate concentrations stayed unchanged, the reaction rate would therefore fall. To maintain enough lactate production flux to support a high glycolytic rate, the cell compensates by increasing the concentration of one substrate, free NADH (Figure 1I). The higher substrate concentration offsets the reduced amount of enzyme and thereby partially maintains the overall reaction rate.

      We have revised the original statement to describe this kinetic compensation more accurately: "The decrease in total LDH concentration was counterbalanced by a concomitant increase in the concentration of its substrate, free NADH, thereby maintaining the reaction velocity."
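
      A simplified Michaelis-Menten model makes the compensation argument concrete. The sketch below treats LDH as a single-substrate enzyme with respect to NADH, with pyruvate assumed saturating. This is a deliberate simplification of LDH's two-substrate mechanism. Vmax is taken to be proportional to enzyme amount, and all parameter values are hypothetical. The code solves for the NADH concentration needed to keep the rate constant when the amount of enzyme is halved.

```python
def mm_rate(vmax, km, s):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def substrate_for_rate(v_target, vmax, km):
    """Solve v = Vmax * S / (Km + S) for S; a solution exists only if v_target < Vmax."""
    if v_target >= vmax:
        raise ValueError("target rate is unreachable: v_target must be below Vmax")
    return v_target * km / (vmax - v_target)


# Hypothetical parameters (arbitrary units); Vmax scales with the amount of enzyme.
km_nadh = 10.0          # assumed Km for NADH (µM)
nadh_control = 5.0      # assumed free NADH in control cells (µM)
vmax_control = 100.0    # full LDH complement
vmax_knockout = 50.0    # half the enzyme after knockout

v_control = mm_rate(vmax_control, km_nadh, nadh_control)
nadh_needed = substrate_for_rate(v_control, vmax_knockout, km_nadh)
print(f"control rate {v_control:.2f}; NADH needed with half the enzyme: {nadh_needed:.1f} µM")
```

      A solution exists only while the required rate stays below the reduced Vmax. Substrate compensation of this kind therefore has a limit as the amount of enzyme keeps falling.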

      (7) Line 222-223: "did not or marginally significantly affect....”

      Our intention is to reflect the complexity of the data in Figure 1. Specifically: Regarding "did not affect": This means that there were no statistically significant differences in most key parameters, such as glycolytic flux (glucose consumption rate, lactate production rate). Regarding "or marginally significantly affected": This means that in a few indicators, although statistical calculations showed p-values less than 0.05, the absolute value of the difference was very small, with limited biological significance.

      To clarify this, we have rewritten it as: "...did not significantly affect the entry of glucose-derived pyruvate into the TCA cycle or mitochondrial respiration, although statistically significant but minimal changes were observed in a few specific parameters (e.g., m3-pyruvate% in medium)."

      (8) It is very confusing to use the same colors for three GNE-140 drug concentrations (Figure 2a-b) and for 3 different cell lines right next to each other (Figure 2c-d).

      The figures have been revised accordingly.

      (9) Lines 263-273: nothing is new here, as oxidized NAD+ is required to run glycolysis and LDH inhibition/KO leads to a high NADH/NAD+ ratio. Also, further below, it is well known that reductive stress blocks serine biosynthesis.

      It is well established that oxidized NAD<sup>+</sup> is required for glycolysis, that LDH inhibition or knockout increases the NADH/NAD<sup>+</sup> ratio, and that reductive stress can suppress serine biosynthesis. We did not intend to present these observations as novel.

      The key point of this section is not the qualitative requirement of NAD<sup>+</sup> for GAPDH, but rather the mechanistic alignment between LDH inhibition, changes in free NAD<sup>+</sup> availability, and the emergence of GAPDH as a flux-controlling step within the glycolytic pathway under steady-state conditions. Previous studies have largely treated the increase in NADH/NAD<sup>+</sup> following LDH inhibition as a correlative or downstream effect, without directly demonstrating how this redox shift quantitatively propagates upstream to reorganize glycolytic flux distribution and thermodynamic driving forces.

      In our study, we explicitly link LDH inhibition to (i) an increase in free NADH/NAD<sup>+</sup> ratio, (ii) inhibition of GAPDH activity in intact cells, (iii) accumulation of upstream glycolytic intermediates, (iv) suppression of serine biosynthesis from 3-phosphoglycerate, and critically, (v) coordinated shifts in the Gibbs free energies of reactions between PFK1 and PGAM. This integrated kinetic–thermodynamic framework goes beyond the established qualitative understanding of NAD<sup>+</sup> dependence and provides a pathway-level mechanism by which LDH activity controls glycolytic flux.

      (10) Lines 368-370: "... we reached an alternative interpretation of the data.."- does not provide much confidence.

      Our intention was to emphasize, with appropriate caution, that we propose a new interpretation, grounded in detailed data, that differs from conventional views. This interpretation rests on consistent evidence from dual isotope tracing experiments using [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine.

      [<sup>13</sup>C<sub>6</sub>]Glucose tracing: in the labeling pattern of citrate, the first intermediate of the TCA cycle, m+2 % decreased significantly. This directly reflects a reduced flux of newly generated glucose-derived acetyl-CoA into the TCA cycle. At the same time, the summed percentage of the other isotopologues (m+1/m+3/m+4/m+5/m+6) increased. This indicates longer retention of labeled carbon in the cycle and implies a concurrent decrease in the efflux of cycle intermediates for biosynthesis.

      [<sup>13</sup>C<sub>5</sub>]Glutamine tracing: in the labeling pattern of α-ketoglutarate, m+5 % decreased, indicating a reduction in glutamine replenishment flux. The change in the summed percentage of the other isotopologues (m+1/m+2/m+3/m+4) also supports the conclusion that the efflux of intermediates was reduced.

      These two sets of data corroborate each other and point to a unified conclusion. LDH inhibition reduces not only carbon inflow into the TCA cycle but also the efflux of intermediates, which decreases overall cycle activity. Our "alternative interpretation" is therefore a well-supported and more consistent explanation of the overall experimental results. We have revised the original wording to: "Integrated analysis of dual isotope tracing data demonstrates that LDH inhibition reduces both influx and efflux of the TCA cycle..."

      (11) Lines 418-421: This entire discussion on how TCA cycle activity is decreased upon LDH inhibition is very confusing. I also would like to see these tracer studies when ETC is inhibited with different inhibitors.

      We would like to clarify that the mitochondrial respiration rate data presented in Figure 5W are based on studies using different ETC inhibitors, and the cell treatment conditions (including culture time, etc.) for these oxygen consumption measurements are consistent with the conditions for the [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine isotope tracing experiments (Figure 5A-V). Therefore, the changes in TCA cycle flux revealed by the tracing data and the inhibition of OXPHOS rate shown by the respiration measurements are mutually corroborating evidence from the same experimental conditions.

      (12) Figure 6F, G - very limited representation of growth curves, why not perform these experiments with all corresponding cell lines and over multiple days. Especially since proliferation arrest vs cell death was implicated.

      We have provided the growth curves of the HeLa-Ctrl and HeLa-LDHAKO cell lines under the corresponding treatments in Figure 6—figure supplement 1, as a supplement to Figure 6F, G (HeLa-LDHBKO cells). The choice of 48 hours as the cutoff observation point is based on clear biological evidence: under the stress of hypoxia (1% O<sub>2</sub>) combined with GNE-140 treatment, HeLa-LDHBKO cells experienced substantial death within 24 to 48 hours, at which point the differences in the growth curves were already very significant.

      (13) Move most of the Supplementary tables into an Excel file so that values can be easily accessed.

      We have compiled the tables into an Excel file and submitted it along with the revised manuscript as supplementary material.

      (14) Consider changing to more appealing colors; the bright blue, red, and black combination on many bar graphs is especially jarring.

      We have adjusted the color scheme of the figures (especially the bar graphs) in the paper, and have submitted them with the revised manuscript.

      (15) Double-check the y-axis on multiple graphs where it says "mM".

      We have checked the y-axes; the unit (mM) is correct.

      (16) Use "the TCA cycle" instead of "TCA cycle".

      In the revised manuscript, "the TCA cycle" is used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Frangos et al. used a transcriptomic and proteomic approach to characterise changes in HER2-driven mammary tumours compared to healthy mammary tissue in mice. They observed that mitochondrial genes, including OXPHOS regulators, were among the most down-regulated genes and proteins in their datasets. Surprisingly, these were associated with higher mitochondrial respiration, in response to a variety of carbon sources. In addition, there seems to be a reduction in mitochondrial fusion and an increase in fission in tumours compared to healthy tissues.

      Strengths:

      The data are clearly presented and described.

      The author reported very similar trends in proteomic and transcriptomic data. Such approaches are essential to have a better understanding of the changes in cancer cell metabolism associated with tumourigenesis.

      Weaknesses:

      (1) This study, despite being a useful resource (assuming all the data will be publicly available and not only upon request) is mainly descriptive and correlative and lacks mechanistic links.

      We appreciate this point. While the primary goal of our study was to assess mitochondrial adaptations with HER2-driven tumorigenesis, we agree strengthening the mechanistic interpretation would improve the impact of the data. To address this, we have provided experiments demonstrating that HER2 inhibition in NF639 cells with lapatinib suppresses respiratory capacity, directly supporting the interpretation that HER2 activity regulates respiratory function (Figure 10). We have expanded the discussion appropriately (lines 378-394). Both raw RNA-seq and proteomic data were deposited through GEO and the PRIDE repositories (accession numbers included in Data Availability Statement).

      (2) It would be important to determine the cellular composition of the tumour and healthy tissue used. Do the changes described here apply to cancer cells only or do other cell types contribute to this?

      We thank the reviewer for this suggestion; we have added experiments that have directly addressed this concern.

      Cell type composition analysis by immunofluorescence was added (Figure 6) where we quantified epithelial, mesenchymal, endothelial, immune and stromal populations in our benign mammary tissue and tumor samples. We found no major shift in the dominant cell types that would confound transcriptomic data in whole tissues.

      We integrated immunofluorescence data with a publicly available scRNA-seq dataset from human breast tumors, which allowed us to estimate cell-type-specific expression of OXPHOS genes in our own samples. Despite the possibility of species differences, this is the only dataset of its kind, and we used it to generate an estimate of cell-type-weighted OXPHOS mRNA expression (Figure 6). This revealed that epithelial cells are likely the dominant contributors to OXPHOS gene expression for CI–IV. All calculations are delineated in the Methods section.
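      For readers unfamiliar with this kind of estimate, the sketch below shows one common way a cell-type-weighted expression value can be formed: each cell type's tissue fraction is multiplied by its mean expression in a reference scRNA-seq dataset, and the products are normalised to give each type's share of the bulk signal. This is an assumption for illustration only; the authors' exact calculation is described in their Methods and may differ. The function name and all numbers are hypothetical placeholders, not data from the study.

```python
# Hypothetical sketch of a cell-type-weighted expression estimate.
# ASSUMPTION: contribution = (tissue fraction of a cell type) x
# (mean expression of a gene set in that cell type from a reference scRNA-seq set).
# All values below are illustrative placeholders, not data from the study.


def weighted_contributions(fractions, expression):
    """Return each cell type's normalised share of the weighted bulk expression.

    fractions  -- dict: cell type -> fraction of cells in the tissue
    expression -- dict: cell type -> mean expression of the gene set in that type
    """
    missing = set(fractions) - set(expression)
    if missing:
        raise KeyError(f"no expression value for: {sorted(missing)}")
    # Weight each cell type's mean expression by how common it is in the tissue.
    weighted = {ct: fractions[ct] * expression[ct] for ct in fractions}
    total = sum(weighted.values())
    # Normalise so that the shares sum to 1.
    return {ct: w / total for ct, w in weighted.items()}


if __name__ == "__main__":
    # Hypothetical composition (e.g. from immunofluorescence cell counts).
    fractions = {"epithelial": 0.5, "mesenchymal": 0.1, "stromal": 0.2,
                 "immune": 0.1, "endothelial": 0.1}
    # Hypothetical mean OXPHOS expression per cell type (arbitrary units).
    expression = {"epithelial": 4.0, "mesenchymal": 2.0, "stromal": 1.5,
                  "immune": 2.5, "endothelial": 1.0}
    shares = weighted_contributions(fractions, expression)
    for ct, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{ct:12s} {share:.2f}")
    print("dominant contributor:", max(shares, key=shares.get))
```

      A cell type can dominate the estimate either by being abundant or by expressing the gene set highly, which is why composition data and per-type expression are both needed.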

      (3) Are the changes in metabolic gene expression a consequence of HER2 signalling activation? Ex-vivo experiments could be performed to perturb this pathway and determine cause-effects.

      Thank you for this suggestion – we have included an experiment directly testing this concept. We assessed mitochondrial respiration in NF639 HER2-driven mammary tumor epithelial cells in the presence or absence of the well-described dual tyrosine kinase inhibitor lapatinib. Lapatinib reduced basal, CI-linked and CI+II linked respiration without compromising mitochondrial integrity or coupling, demonstrating that HER2 activation regulates respiration in our model. This data is presented in Figure 10, and a new section has been added to the discussion describing the implications of this finding in the context of the current literature (lines 378-394).

      (4) The data of fission/fusion seem quite preliminary and the gene/protein expression changes are not so clear cut to be a convincing explanation that this is the main reason for the increased mitochondria respiration in tumours.

      We agree mitochondrial morphology and dynamics alone cannot fully account for the observed respiratory phenotype – this was emphasized in the discussion but has since been further clarified (lines 365-377). We retained the TEM and dynamics gene/protein data because they do support morphological differences consistent with enhanced fission. However, we have revised the tone of our interpretation to more explicitly acknowledge that these findings are correlative, and the updated discussion now emphasizes that the increased respiratory capacity in tumors is likely driven by multiple converging mechanisms.

      Reviewer #2 (Public review):

      Frangos et al. present a set of studies aiming to determine mechanisms underlying tumour initiation and progression. Overall, this work provides some useful insights into the involvement of mitochondrial dysfunction during the cellular transformation process. This body of work could be improved in several possible directions to establish more mechanistic connections.

      (5) The interesting point of the paper: the contrast between suppressed ETC components and activated OXPHOS function is perplexing and should be resolved. It is still unclear if activated mitochondrial function triggers gene down-regulation vs compensatory functional changes (as the title suggests). Have the authors considered reversing the HER2-derived signals e.g. with PI3K-AKT-MTOR or ERK inhibitors to potentially separate the expression vs. functional phenotypes? The root of the OXPHOS component down-regulation should also be traced further, e.g. by probing into levels of core mitochondrial biogenesis factors. Are transcript levels of factors encoded by mtDNA also decreased?

      We appreciate this insight and agree that the discordance between mitochondrial content and function is fascinating and have addressed the concerns above in the following manner:

      - We have altered the title – we agree we cannot definitively say that the enhanced respiratory capacity observed is compensatory.

      - We have added experiments in NF639 cells in the presence of lapatinib, a tyrosine kinase inhibitor to interrogate whether HER2 is necessary for our functional outcome of interest – the enhanced respiratory capacity in the tumors. Lapatinib significantly suppressed respiration (Figure 10) demonstrating HER2 signaling directly regulates mitochondrial respiration.

      - We have expanded the discussion to provide further comment on potential explanations for increased respiratory function and low mitochondrial content.

      (6) The second interesting aspect of this study is the implication of mitochondrial activation in tumours, despite the downregulation of expression signatures, suggestive of a positive role for mitochondria in this tumour model. To address if this is correlative or causal, have the authors considered testing an OXPHOS inhibitor for suppression of tumorigenesis?

      Previous studies have eloquently highlighted that directly or indirectly inhibiting mitochondria can suppress growth in HER2-driven breast cancer (PMID: 31690671) or, alternatively, that amplification of mt-HER2 enhances tumorigenesis (PMID: 38291340). In many solid tumors, this concept underlies preclinical and clinical studies using IACS-010759 or similar OXPHOS inhibitors, which do suppress growth but have significant off-target effects in healthy tissues (PMID: 36658425, 3580228). We have expanded the discussion to ensure the reader is aware of these previous contributions and have highlighted the importance of future work delineating the role of enhanced respiratory function in HER2-driven mammary cancer (lines 378-394).

      (7) A number of issues concerning animal/tumour variability and further pathway dissection could be explored with in vitro approaches. Have the authors considered deriving tumour-derived cell cultures, which could enable further confirmations, mechanistic drug studies and additional imaging approaches? Culture systems would allow alternative assessment of mitochondrial function, such as Seahorse or flow cytometry (mitochondrial potential and ROS levels).

      We thank the reviewer for this suggestion – we have addressed this in part by using the NF639 HER2-driven tumor epithelial line, which demonstrated that HER2 regulates our observed respiratory response. Unfortunately, the addition of tumor-derived cell cultures was not feasible or within the scope of our study. Animal and tumor variability has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (8) The study could be greatly improved with further confirmatory studies, eg immunoblotting for mitochondrial components with parallel blots for phospho-signalling in the same samples. It would be interesting if trends could be maintained in tumour-derived cell cultures. It is notable that OXPHOS protein/transcript changes are more consistent (Figure 5, Supplementary Figure 4) than mitochondrial dynamics /mitophagy factors (Figure 8). Core regulatory factors in these pathways should be confirmed by conventional immunoblotting.

      We thank the reviewer for this thoughtful comment. While we agree that additional confirmatory studies can be valuable, due to tissue quantity constraints and the number of assays required for our multi-omics analysis, extensive additional blots were not feasible. However, we had sufficient protein to provide select OXPHOS proteins to verify the proteomic data (now provided in S-Fig.4H). Furthermore, we have plotted the fold change of genes and proteins detected in both datasets and added this to Figure 4 (4A, B), further highlighting the consistency between our transcriptomic and proteomic findings. We believe that the highly consistent and concordant nature of our datasets collectively provides strong support for our central objective - determining whether mitochondrial content and respiratory function correlate in HER2-driven mammary tumors. The reproducibility of OXPHOS-related changes reinforces the robustness of our observations. We also appreciate the reviewer’s insight that OXPHOS alterations appear particularly consistent. In response, we have edited the discussion to further emphasize this point, especially in relation to the distinctive pattern observed for Complex V, which showed greater preservation relative to Complexes I–IV across several methods (lines 348-364). We comment on how this stoichiometric shift may contribute to intrinsic respiratory activation despite reduced mitochondrial content.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Further Minor points.

      (9) It would be helpful to know further details regarding the source of the tumour samples, particularly for the proteomics (N=5) and transcriptomics (N=6) datasets, since the exact timepoint of tissue harvest and number of tumours/mouse varied, according to the methods section. Were all samples for the omics studies from different mice (i.e. 11 mice)? B4 and B6 seem like outliers in mitochondrial transcriptomes. Are these directly paired, e.g. with T4 and T6? Are the side-by-side pairs of Ben and Tum samples for blots in Figure 1 and Supplementary Figure 1 from the same mouse?

      This has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (10) Further references and details are needed to support the methodology of the mitochondrial function tests (e.g. nutrients vs. pairing with complexes). What was the time point of nutrient supplementation? It would seem that the lipid substrates should take longer to activate OXPHOS than pyruvate/malate or succinate. Is this the case? Is there speculation as to why succinate supplementation is much more active than pyruvate+malate? What is +MD in Figure 6? The rationale for pooling data for Figure 7A is unclear, since the categories appear to overlap: (pyruvate, malate, ADP) vs. (palmitoyl-carnitine, malate, ADP).

      Thank you for this comment. We have expanded the methods (lines 515-531) to provide additional detail on the mitochondrial respiration protocol. Briefly, permeabilized tissues were exposed to substrates delivered at supraphysiological concentrations in a sequential protocol lasting ~30–60 minutes. Under these conditions, mitochondrial respiration reflects the maximal capacity to utilize each substrate rather than the physiological time course of substrate mobilization or uptake that would occur in vivo with the influence of blood flow and transport/substrate availability limitations.

      (11) Many of the figures were blurry (Figure 1F, 2B) or had labels that were too small to be effective (Figures 1G, H, 2D-G, 3E-G, 5E-I, 7C, 8B).

      The font size of figure labels has been increased where possible and all figures have been exported to maximize resolution.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34<sup>+</sup>Sca-1<sup>+</sup> dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns or comments.

      We sincerely thank the reviewer for the positive evaluation of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The Authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) of different colors into the portal vein, the left ventricle of the heart, the inferior vena cava, and the extrahepatic bile duct, respectively. The method involves trans-cardiac perfusion with 4% PFA, the injection of MCNPs of different colors, clearing with the modified CUBIC method, cutting 200-micrometer-thick slices with a vibratome, and then microscopic imaging. The Authors also perform various immunostainings (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the Authors report dense and very fine vascular branches along the portal vein. The authors name them the 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLCs proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has some concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results, and suggests that the conclusions of the paper should be viewed critically. Namely, at this point, it is still not fully clear whether the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit, or whether these are fine portal branches that connect the larger portal veins to the adjacent sinusoids. Also, in my opinion, to identify the molecular characteristics of such small and spatially highly organized structures as those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics (instead of data mining in existing liver single-cell databases and performing Venn diagram intersection analysis in hepatic endothelial subpopulations). Yet, the existence of such structures with a distinct molecular profile cannot be excluded. Further research with advanced imaging and omics techniques (such as high-resolution volume imaging and spatial transcriptomics/proteomics) is needed to reproduce these initial findings.

      We thank the reviewer for the thoughtful and constructive comments. In response to the reviewer’s concerns regarding the anatomical and molecular definition of the periportal lamellar complex (PLC), we have further clarified the scope and methodological boundaries of the present study in the revised manuscript.

      Regarding the key question raised by the reviewer—namely, whether the PLC represents an independent anatomical or functional unit, or merely small portal venous branches connecting larger portal veins to adjacent sinusoids—we provide below a more detailed explanation of the criteria used to define the PLC in this study. The identification of the PLC is primarily based on periportal structures that can be reproducibly recognized by three-dimensional imaging across multiple mice, exhibiting a relatively consistent spatial distribution within the periportal region. The PLC could be stably observed across different MCNP dye color assignments and independent experimental batches. In addition, three-dimensional CD31 immunofluorescence consistently revealed vascular-associated signal distributions in the same periportal region, indirectly supporting its spatial association with the periportal vascular system.

      At the morphological level, the PLC appears as a periportal vasculature-associated structure distributed around the main portal vein trunk and maintains a relatively consistent spatial proximity to portal veins, bile ducts, and neural components in three-dimensional space. This highly conserved spatial organization across multiple tissue systems supports the anatomical positioning of the PLC as a relatively distinct structural tissue unit within the periportal region.

      The present study primarily focuses on a descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC based on volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed endothelial cell populations potentially associated with the PLC using existing liver single-cell transcriptomic datasets. This analysis was intended to provide molecular-level information consistent with the structural observations and to offer preliminary clues to its potential biological functions, rather than to independently define the PLC at the spatial level or to functionally validate it.

      We fully acknowledge the value of spatial transcriptomic and spatial proteomic technologies in revealing molecular heterogeneity within tissue architecture. However, under current technical conditions, these approaches are largely dependent on thin tissue sections and are limited by spatial resolution and signal mixing effects, which still pose challenges for resolving periportal structures with pronounced three-dimensional continuity, such as the PLC. In the future, further integration of high-resolution volumetric imaging with spatial omics technologies may enable a more refined understanding of the molecular features and potential functions of the PLC at higher spatial resolution.

      Reviewer #3 (Public review):

      Summary:

      In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and PLC identification as a novel morphological feature of the adult liver portal veins. The text is now also much clearer and has better flow.

      The additional assessment of the Smart-seq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for liver homeostasis and injury response. While it may suffer from similar biases as other scRNA-seq datasets (multiple cell-fate signatures arising from mRNA contamination from proximal cells during dissociation), it is unlikely that such bias would yield such similar results.

      Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.

      The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of the Elife readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture at unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the careful and constructive comments regarding the functional validation of cell populations associated with the PLC. The central aim of this study is to establish and validate a novel volumetric imaging and vascular labeling strategy and to apply it to the periportal region of the liver, thereby revealing previously underappreciated structural organizational patterns at the three-dimensional level, rather than to perform a systematic functional validation of specific cellular subpopulations.

      We agree that the precise roles of the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell subpopulation in the formation and function of the periportal lamellar complex (PLC) have not been directly addressed through functional intervention experiments in the present study. Our conclusions are primarily based on three-dimensional imaging and spatial distribution analyses, which reveal a stable and consistent spatial association between this cell population and the PLC structure, but are not intended to independently support causal or functional inferences. The underlying functional mechanisms remain to be elucidated in future studies using genetic or functional perturbation approaches.

      In light of these considerations, we have further refined the relevant statements in the revised manuscript to more clearly define the functional scope and limitations of the current study in the Discussion section, and to avoid functional interpretations that extend beyond the direct support of the data. At the same time, we consider functional validation of the PLC to be an important and promising direction for future investigation.

      It should be emphasized that the present study is not primarily designed to provide direct functional validation, but rather to systematically characterize the three-dimensional structural features of the periportal lamellar complex (PLC) and its cellular associations using volumetric imaging and vascular labeling approaches. At this stage, we mainly provide spatial and histological evidence for the organizational relationship between the PLC structure and the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell population, while their specific roles in PLC formation and functional regulation await further investigation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I highly appreciate the Authors' endeavors to improve the manuscript. I am listing those points (from my original review) where I still have further comments.

      (2) I would suggest this sentence:

      "...the liver has evolved a highly complex and densely organized ductal vascular-neuronal network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7]."

      We thank the reviewer for the valuable suggestion. We have revised the relevant sentence accordingly, and the revised wording is as follows:

      “The liver has evolved a highly complex and densely organized vascular–biliary–neural network, primarily composed of the portal venous system, central venous system, hepatic arterial system, biliary system, and the intrahepatic autonomic neural network.”

      (3) I suggest renaming 'clearing efficiency' to 'clearing time', and revise the last sentence like:

      '...The results showed that the average transmittance increased by 20.12% in 1mm-thick cleared tissue slices.'

      We thank the reviewer for this helpful suggestion. Accordingly, we have replaced the term “clearing efficiency” with “clearing time” and revised the final sentence to reflect this change. The revised wording is as follows:

      “The results showed that the average transmittance increased by 20.12% in cleared tissue slices with a thickness of 1 mm.”

      (4) While the dye perfusion was indeed on full lobe, FigS1F also seems to be rather a thick section instead of a full 3d reconstruction. This is OK, but please, be clear and specific about this in the respective part of the ms.

      We thank the reviewer for the careful review and detailed comments. We would like to clarify that Fig. S1F shows whole-lobe imaging of the mouse left liver lobe obtained after dye perfusion at the whole-liver scale, rather than an image derived from a thick tissue section. Although this image does not represent a three-dimensional reconstruction, it does reflect imaging of the entire left liver lobe at the macroscopic level.

      In addition, for the reviewer’s reference, we have provided in this response a representative image of a 200 μm-thick liver tissue section to directly illustrate the morphological differences between thick-section imaging and whole-lobe imaging. We note that the third and fourth panels in Fig. 1G of the main text already show local imaging results from 200 μm-thick sections; in contrast, the comparative image provided here presents a larger field of view and overall morphology. To avoid redundancy, this additional image is included solely for clarification in the present response and has not been incorporated into the revised manuscript or the supplementary materials.

      (11) Regarding the 'transmission quantification':

      'Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous or organic solvent based clearing techniques can achieve relatively uniform transparency in 1 mm thick tissue sections, so differences at this thickness are limited.'

      So, based on all this, I think measuring/comparing clearing efficacy in its present form is rather pointless; one may consider omitting this part.

      We thank the reviewer for the valuable comments. The purpose of the transmittance quantification in this study was not to provide a comprehensive comparison among different tissue-clearing methods, but rather to serve as a quantitative reference supporting the optimization of the Liver-CUBIC protocol. Accordingly, we have narrowed and clarified the relevant statements in the revised manuscript to define their scope and avoid overinterpretation.

      The revised text now reads as follows:

      “Importantly, Liver-CUBIC treatment did not induce significant tissue expansion (Figure 1B–D). In addition, quantitative transmittance measurements in 1-mm-thick cleared tissue slices showed an average increase of 20.12% (P < 0.0001; 95% CI: 19.14–21.09; Figure 1E).”

      Author response image 1.

      (16) It is OK, but please, indicate this clearly in the Methods/Results because in its present form it may be confusing for the reader: which color means what.

      We thank the reviewer for this helpful request for clarification. We agree that the previous wording may have caused confusion regarding the meaning of different MCNP colors. Accordingly, we have revised the Methods section and the relevant figure legends to clearly state that the color assignment of MCNP dyes is not fixed across different experiments or figures. The use of different colors serves solely for visualization and presentation purposes, facilitating the distinction of anatomical structures in multichannel and three-dimensional imaging, and does not indicate any fixed or intrinsic correspondence between a specific color and a particular vascular or ductal system. We believe that this clarification will help prevent misinterpretation and improve the overall clarity of the manuscript.

      (17) I still think the hepatic artery is extremely shrunk, while the portal vein is extremely dilated. Please note that in the referenced figure (from Adori et al.), the hepatic artery and portal vein are ca. 50 micrometers and 250 micrometers in diameter, respectively. In your figure, as far as I can see, they are ca. 9-10 micrometers and 125 micrometers, respectively. This means a 5x (Adori) vs. 13-14x (you) difference. I would not say that this is necessarily problematic, but it may reflect some perfusion issues that may be worth considering.

      We thank the reviewer for the careful comparison and acknowledge the quantitative differences pointed out. Compared with the study by Adori et al., the diameter ratio between the hepatic artery and the portal vein in our images does indeed differ to some extent. We believe that this discrepancy primarily arises from methodological differences in imaging and analysis strategies between the two studies.

      In the work by Adori et al., periportal vasculature identification and three-dimensional segmentation were mainly based on 488 nm autofluorescence signals acquired from inverted tissues. This signal predominantly reflects the overall outline of periportal tissue regions rather than direct imaging of the vascular lumen itself. Consequently, the measured “vessel diameter” largely represents a spatial domain delineated by surrounding periportal structures, and does not necessarily correspond to the actual or functional luminal diameter of the vessel.

      In contrast, the present study employed fluorescent MCNP dye perfusion under low perfusion pressure, combined with tissue clearing and three-dimensional optical imaging. Under these experimental conditions, the measured vessel diameters more closely reflect the perfusable luminal space of vessels in a fixed state, rather than their maximally dilated diameter, and are not defined by the morphology of surrounding tissues. This distinction is particularly relevant for the hepatic artery: as a high-resistance, smooth muscle–rich vessel, its diameter is highly sensitive to perfusion pressure and post-excision changes in vascular tone. In comparison, the portal vein exhibits greater compliance and is relatively less affected by these factors.

      Based on these methodological differences, the observation of relatively smaller apparent hepatic arterial diameters, and consequently a higher portal vein-to-hepatic artery diameter ratio, under dye perfusion-based optical imaging conditions is an expected outcome. Importantly, the primary focus of the present study is the identification and characterization of the periportal lamellar complex (PLC) as a three-dimensional lamellar tissue structure that can be stably and reproducibly recognized across different samples and imaging conditions, rather than absolute comparisons of vascular diameters.

      (21) After the presented documentation, I still have some concerns as to whether the 'periportal lamellar complex (PLC)' that the authors describe is really a distinct anatomical or functional unit. The confocal panel in Fig. 4F is nice and of high quality. However, as far as I can see, it shows that CD34+/Sca-1+ immunostaining is not specific to the presumptive PLCs in the periportal region. Instead, Sca-1 immunoreactivity is also highly abundant in the midzone, to which the supposed PLCs do not extend according to the cartoon shown in panel D of the same figure. Notably, this also calls into question the specificity of the single-cell analysis.

      We thank the reviewer for this detailed and important comment regarding the specificity of CD34<sup>+</sup>/Sca-1<sup>+</sup> markers and the definition of the periportal lamellar complex (PLC).

      It should be emphasized that the PLC is not defined on the basis of any single molecular marker, but rather by a reproducible periportal lamellar anatomical structure consistently revealed by three-dimensional imaging across multiple samples. The co-expression of CD34 and Sca-1 is interpreted within this clearly defined anatomical context and is used to characterize the molecular features of endothelial cells associated with the PLC structure.

      As shown in Fig. 4F, the co-expression of CD34 and Sca-1 delineates a continuous, lamellar endothelial structure surrounding the portal vein. In contrast, outside the periportal region, including the midlobular areas, Sca-1 or CD34 expression can also be detected, but these signals appear scattered and discontinuous, lacking an organized lamellar topology.

      In the single-cell transcriptomic analysis, we treated CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cells as an operational population to explore molecular features that may be enriched in the microenvironment of the periportal lamellar complex (PLC). Importantly, this analysis was intended to provide molecular clues associated with the PLC, rather than to precisely assign spatial locations or identities to individual cells.

      Occasional isolated Sca-1<sup>+</sup> signals detected outside the periportal region do not affect the anatomical definition of the PLC, nor do they alter the interpretation of the single-cell analysis. These analyses serve to provide supportive and exploratory molecular information for the structural identification of the PLC, rather than constituting decisive spatial evidence.

      (23) '....In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.'

      I cannot find these statements in either the Discussion or the Results. I must reiterate my opinion that the methodological approach applied in the single-cell transcriptomics part has severe limitations, and readers must be made aware of this.

      We thank the reviewer for this further comment. We understand and acknowledge the reviewer’s concerns regarding the methodological limitations of single-cell transcriptomic analyses, and we agree that these limitations should be clearly communicated to readers in the main text.

      We acknowledge that in the previous version of the manuscript, the exploratory nature of the single-cell transcriptomic analysis and its methodological boundaries were discussed only in the response to reviewers and were not explicitly stated in the manuscript itself. We thank the reviewer for pointing out this omission. In the revised manuscript, we have now added explicit clarifications in the main text to prevent potential overinterpretation of these results.

      In the present study, our primary effort is focused on the descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC using volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed existing liver single-cell transcriptomic datasets to examine endothelial cell populations exhibiting PLC-associated features, and performed differential gene expression and Gene Ontology enrichment analyses. Importantly, these results are intended to provide molecular-level support for the structural identification of the PLC and to offer preliminary insights into its potential biological functions. Accordingly, we have narrowed the presentation and interpretation of the single-cell analysis in both the Results and Discussion sections of the revised manuscript.

      In addition, we have expanded the Discussion to address the limitations of current spatial transcriptomic approaches in validating a continuous three-dimensional structure such as the PLC. Most existing spatial transcriptomic methods rely on two-dimensional tissue sections of 8–10 μm thickness, whereas identification of the PLC depends on three-dimensional imaging of tissue volumes with thicknesses of ≥200 μm, making reliable reconstruction of its spatial continuity from single sections challenging. Furthermore, because each spatial transcriptomic capture spot often encompasses multiple adjacent cells, signal mixing effects further limit precise resolution of specific periportal microstructures.

      Overall, we agree with the reviewer’s central point that the limitations of single-cell transcriptomic analyses should be clearly understood by readers. By explicitly clarifying the methodological boundaries and refining the related statements in the main text, we believe this concern has now been adequately addressed in the revised manuscript. We thank the reviewer for identifying this omission, which has helped to improve the rigor and clarity of the study.

      Reviewer #3 (Recommendations for the authors):

      (1) While these are interesting observations suitable for discussion, the following sections are speculative, given that no functional characterization of the importance of the PLC has been performed yet. This is most apparent in the comments on a role in hematopoiesis, which transiently takes place in the liver during embryogenesis (Khan et al 2016) but ceases after ligation of the umbilical inlet. Adult liver hematopoiesis remains controversial, and more solid evidence would need to be presented to support its existence in PLC regions.

      265 - These findings suggest that the Periportal Lamellar Complex (PLC) is not only a morphologically and spatially distinct, low-permeability vascular unit surrounding the portal vein, but also likely serves as a critical nexus connecting the portal vein, hepatic artery, and liver sinusoids. Thus, the PLC constitutes a key node within the interactive vascular network of the mouse liver.

      We thank the reviewer for the comments and suggestions regarding the potential functional interpretation of the periportal lamellar complex (PLC), particularly its possible association with hematopoietic function. We would like to clarify that the statement at line 265 was intended solely to describe the structural characteristics and spatial organization of the PLC within the periportal vascular network. Specifically, the original wording aimed to summarize the morphological features of the PLC and its spatial relationships with the portal vein, hepatic artery, and hepatic sinusoids.

      Nevertheless, to minimize potential misunderstanding, we have revised this section to avoid unnecessary functional implications. The revised text now reads:

      “These results suggest that the periportal lamellar complex (PLC) is a morphologically and spatially distinct vascular structure that surrounds the portal vein and may serve as a key organizational node coordinating the spatial relationships among the portal vein, hepatic artery, and hepatic sinusoids. Accordingly, the PLC represents an important structural element within the interactive vascular network of the mouse liver.”

      This revision preserves the structural significance of the PLC while avoiding overinterpretation of its functional roles.

      (2) The same is true for this section, following Figure 3: no functional experiment tested this. For example, diphtheria toxin could be expressed in the CD34+Sca1+ population, or, at least, the developing liver could be carefully mapped to indicate whether the PLC precedes or follows bile duct development.

      356 as a spatial positional cue guiding bile duct growth and branching but also as a regulatory node involved in coordinating bile drainage from the hepatic lobule into the biliary network.

      To avoid potential misunderstanding, we have further refined and revised the statements in the manuscript regarding the functional interpretation of the periportal lamellar complex (PLC) and its relationship to bile duct development. We agree that cell ablation strategies are of great importance for functional validation studies. However, it should be noted that CD34 and Sca-1 are relatively broadly expressed markers during liver development, labeling multiple endothelial, mesenchymal, and progenitor cell populations, and their expression is not restricted to the PLC. Owing to this broad expression pattern, ablation of CD34<sup>+</sup>Sca-1<sup>+</sup> cell populations would likely exert widespread effects on vascular and stromal structures, thereby complicating the distinction between direct PLC-specific effects and secondary developmental alterations. As such, this strategy may present technical limitations for specifically dissecting the role of the PLC in bile duct development. At the same time, given that the primary objective of this study is the systematic characterization of the three-dimensional anatomical features and spatial organization of the PLC, we have correspondingly revised the manuscript to restrict statements regarding the relationship between the PLC and bile ducts to spatial associations supported by the current data. Specifically, our results show that primary bile ducts run along the main portal vein trunk, secondary bile ducts exhibit directed branching toward the PLC region, and terminal bile duct branches tend to spatially cluster in the vicinity of the PLC, thereby forming a reproducible periportal spatial arrangement. Based on these observations, the PLC delineates a relatively conserved anatomical microenvironment within the portal region, whose spatial position is closely associated with the organization and terminal distribution of the intrahepatic bile duct network.

      We believe that these revisions more accurately reflect the experimental evidence and the defined scope of the present study.

      (3) The following statement ought to be rephrased or removed, considering that CD34 and Sca1 (Ly6a) are markers of periportal endothelial cells (Pietilä et al., 2025, Gómez-Salinero et al., 2022), as shown by the authors in their own Fig. 6D. In this context, and in the context of the CCl<sub>4</sub> experiments, a "simple" proliferative progenitor portal vein endothelial cell phenotype, also suggested by the presence of DLL4 (Fig5A) and JAG1 (Pietilä et al., 2025) (Benedito et al., 2009), ought to be considered.

      409 Notably, CD34 and Sca-1 (Ly6a) were co-expressed exclusively within PLC structures surrounding the portal vein, but absent from central vein ECs and midzonal LSECs (Figure 4F).

      We thank the reviewer for pointing out the potential imprecision in this wording. We agree that both CD34 and Sca-1 (Ly6a) are well-established markers of periportal endothelial cells, as previously reported (Pietilä et al., 2025; Gómez-Salinero et al., 2022), and as also illustrated in Fig. 4F of our study.

      Accordingly, the original statement suggesting that CD34 and Sca-1 are co-expressed exclusively within the PLC structure may indeed represent an overinterpretation. Following the reviewer’s suggestion, we have revised the relevant text at line 409 by removing the exclusive phrasing (“exclusively”) and by emphasizing instead that CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells are enriched in periportal regions associated with the PLC, rather than being specific to or confined within the PLC.

      In addition, in the context of the CCl<sub>4</sub>-induced liver fibrosis model, we agree with the reviewer that the observed expression of DLL4 and JAG1 under fibrotic conditions is more appropriately interpreted as reflecting an activated or proliferative periportal endothelial progenitor–like phenotype, rather than defining a novel endothelial lineage. The corresponding statements in the revised manuscript have been adjusted accordingly.

      (4) Again, these concluding sentences are based on correlative evidence of mRNA expression and literature but not experimental evidence.

      436 These findings suggest that this unique endothelial cell subset in the periportal region may possess dual regulatory functions in both metabolic and hematopoietic modulation

      441 results suggest that PLC endothelial cells may not only regulate periportal microcirculatory blood flow but also help establish a specialized microenvironment that potentially supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair.

      We thank the reviewer for this thoughtful comment. We agree that these statements are primarily based on transcriptomic correlation analyses and support from previous literature, rather than direct functional experimental evidence.

      Accordingly, in the revised manuscript, we have appropriately toned down and adjusted the relevant concluding statements to more accurately reflect their inferential nature. The revised wording emphasizes associations and potential involvement, rather than definitive functional roles. These changes preserve the overall scientific interpretation while aligning the level of inference more closely with the available evidence.

      The revised text now reads:

      “Finally, we found that the main trunk of the PLC is primarily composed of CD34<sup>+</sup>Sca-1<sup>+</sup>CD31<sup>+</sup> endothelial cells (Fig. 4J). These CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive cells are mainly distributed in the basal region of the PLC structure and exhibit molecular features associated with hematopoiesis. Taken together, these results suggest that PLC endothelial cells may contribute to the establishment of a local microenvironment related to periportal hematopoietic regulation and may play potential roles in stem cell recruitment and maintenance of vascular homeostasis.”

      (5) The following part is speculative and based on a re-analysis of a dataset gathered after 6 more weeks of CCl<sub>4</sub> treatment (12 weeks; Su et al., 2021) than in the linked experiments of the manuscript. It should be moved to the Discussion or removed.

      504 Moreover, single-cell transcriptomic re-analysis revealed significant upregulation of bile duct-related genes in the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelium of PLC in fibrotic liver, with notably high expression of Lgals1 (Galectin-1) and Hgf (Figure 5G). Previous studies have shown that Galectin-1 is absent in normal liver parenchyma but highly expressed in intrahepatic cholangiocarcinoma (ICC), correlating with tumor dedifferentiation and invasion (Bacigalupo, Manzi, Rabinovich, & Troncoso, 2013; Shimonishi et al., 2001). Additionally, hepatocyte growth factor (HGF), particularly in combination with epidermal growth factor (EGF) in 3D cultures, promotes hepatic progenitor cells to form bile duct-polarized cystic structures (N. Tanimizu, Miyajima, & Mostov, 2007). Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver fibrosis.

      Collectively, our results demonstrate that the PLC, situated between the portal vein and periportal sinusoidal endothelium, constitutes a critical vascular microenvironmental unit. It may not only colocalize with bile duct branches under normal physiological conditions, but also through its basal CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive endothelial cells, potentially orchestrate bile duct epithelial proliferation, branching morphogenesis, and bile acid transport homeostasis via multiple signaling pathways. Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension, serving as a spatial scaffold facilitating terminal bile duct migration and expansion into the hepatic parenchyma (Figure 5H). These findings highlight the PLC endothelial cell population and the vascular-bile duct interface as key regulatory hubs in bile duct regeneration, tissue repair, and pathological remodeling, providing novel cellular and molecular insights for understanding bile duct-related diseases such as ductular reaction, cholangiocarcinoma, and cholestatic disorders, and offering potential targets for therapeutic intervention.

      We thank the reviewer for this careful and thought-provoking comment. We understand and agree with the reviewer’s assessment that this section involves a degree of inference, as the analysis is based on a re-analysis of a previously published single-cell transcriptomic dataset from a CCl<sub>4</sub>-induced liver fibrosis model (Su et al., 2021), rather than on experimental data directly generated in the present study.

      In response to the reviewer’s suggestion, we have carefully re-examined and revised the relevant paragraphs. Without altering the overall structure of the manuscript, we have appropriately moderated the wording to clarify that these results primarily describe the transcriptional features of PLC-associated CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells under fibrotic conditions, and their associations with bile duct–related gene expression, rather than providing direct functional evidence for their roles in bile duct branching or microenvironmental remodeling.

      In addition, we have explicitly clarified in the main text the data source and methodological limitations of the single-cell transcriptomic analysis, and emphasized that these findings should be interpreted in conjunction with the spatial information revealed by three-dimensional imaging. Through these revisions, we aim to retain the value of this analysis in providing complementary molecular insight into PLC characteristics, while avoiding potential over-interpretation of its functional implications.

      Formal suggestions:

      (6) The following sentence would benefit from being more clearly written.

      263 - The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis.

      We thank the reviewer for this helpful suggestion. The sentence has been revised to improve clarity by correcting the parallel structure and refining the wording.

      The formation of PLC structures in the adventitial layer may participate in local blood flow regulation and the maintenance of microenvironmental homeostasis.

      (7) The following sentence is misleading as it implies cell sorting, and "subsetted" rather than "sorted" should be used.

      414 Based on this, we sorted CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial populations from the total liver EC pool (Figure 4G).

      Thank you for your comment.

      We have replaced “sorted” with “subsetted” as suggested. This avoids the misleading implication of physical cell sorting, as our operation was an analytical subsetting of the target subpopulation.

      We appreciate your careful review.

      (8) Correct typos, especially in the Results section related to Fig. 6, and formatting issues in the Discussion.

      730 Morphologically, the PLC shares features with previously described telocytes (TCs)- a recently identified class of interstitial cells in the liver observed via transmission electron

      We thank the reviewer for pointing out this textual error. In the submitted version, the sentence describing the morphological similarity between the PLC and previously reported telocytes was inadvertently interrupted due to a punctuation issue. This has now been corrected to ensure sentence integrity and consistent formatting.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Xu et al. focuses on the impact of clathrin-independent endocytosis in cancer cells on T cell activation. In particular, by using a combination of biochemical approaches and imaging, the authors identify ICAM1, the ligand for T cell-expressed integrin LFA-1, as a novel cargo for EndoA3-mediated endocytosis. Subsequently, the authors aim to identify functional implications for T cell activation, using a combination of cytokine assays and imaging experiments.

      They find that the absence of EndoA3 leads to a reduction in T cell-produced cytokine levels. Additionally, they observe slightly reduced levels of ICAM1 at the immunological synapse and an enlarged contact area between T cells and cancer cells. Taken together, the authors propose a mechanism where EndoA3-mediated endocytosis of ICAM1, followed by retrograde transport, supplies the immunological synapse with ICAM1. In the absence of EndoA3, T cells attempt to compensate for suboptimal ICAM1 levels at the synapse by enlarging their contact area, which proves insufficient and leads to lower levels of T cell activation.

      Strengths:

      The authors utilize a rigorous and innovative experimental approach that convincingly identifies ICAM1 as a novel cargo for EndoA3-mediated endocytosis.

      Weaknesses:

      The characterization of the effects of EndoA3 absence on T cell activation appears incomplete. Key aspects, such as surface marker upregulation, T cell proliferation, integrin signalling and, most importantly, the killing of cancer cells, are not comprehensively investigated.

      We agree with the reviewer that the effects of EndoA3 depletion on T cell activation were not sufficiently characterized. In new data presented in Fig. S4G-J, we explored additional activation markers and proliferation parameters. We did not observe any difference in the surface markers PD-1, CD137 and Tim-3 on T cells exposed to LB33-MEL EndoA3+ cells treated with control or EndoA3 siRNAs. Regarding proliferation (Fig. S4J), although the proliferation index appeared slightly lower upon EndoA3 depletion, the difference was not significant. Degranulation was also monitored (Fig. S4K), with no significant differences. In the new Fig. 3F, however, we performed chromium release assays to assess the killing of cancer cells. Interestingly, we observed ~15% higher lysis of LB33-MEL EndoA3+ cells after EndoA3 depletion compared with the control condition at a 3:1 T cell:target cell ratio (where the maximal effect is observed). These data are further discussed in the Discussion section (new §6-9).
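      Chromium release data are conventionally expressed as percent specific lysis, normalized to spontaneous and maximum release. The Python sketch below shows that standard normalization; the counts-per-minute values are hypothetical placeholders, not data from this study, and the response does not state exactly how the ~15% figure was derived.

```python
def percent_specific_lysis(experimental_cpm: float, spontaneous_cpm: float,
                           maximum_cpm: float) -> float:
    """Conventional 51Cr-release normalization:
    100 * (experimental - spontaneous) / (maximum - spontaneous)."""
    if maximum_cpm <= spontaneous_cpm:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental_cpm - spontaneous_cpm) / (maximum_cpm - spontaneous_cpm)


# Hypothetical counts per minute (cpm); NOT data from this study.
SPONTANEOUS = 200.0   # target cells alone (spontaneous release)
MAXIMUM = 2000.0      # target cells lysed with detergent (maximum release)
control = percent_specific_lysis(1100.0, SPONTANEOUS, MAXIMUM)   # control siRNA
depleted = percent_specific_lysis(1370.0, SPONTANEOUS, MAXIMUM)  # EndoA3 siRNA

# "15% higher lysis" can mean an absolute difference in percentage points
# or a relative increase; the two differ, so both are computed here.
point_difference = depleted - control
relative_increase = 100.0 * point_difference / control

print(f"control: {control:.1f}%, EndoA3-depleted: {depleted:.1f}%")
print(f"difference: {point_difference:.1f} points ({relative_increase:.1f}% relative)")
```

      In this made-up example, a 15-point gain in specific lysis corresponds to a 30% relative increase, which is why stating which of the two is meant avoids ambiguity.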

      As endo- and exocytosis are intricately linked with the biophysical properties of the cellular membrane (e.g., membrane tension), which can significantly impact T cell activation and cytotoxicity, the authors should address this possibility and ideally test it experimentally to some degree.

      Evaluating changes in the biophysical properties of the cancer cell plasma membrane upon EndoA3 depletion is not trivial. An indirect way to address this question is to examine the area and shape of cells after siRNA treatment. In new data (Fig. S4B-D), we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight reduction in cell area upon EndoA3 depletion, no significant changes were observed in aspect ratio or roundness. Hence, we think that the biophysical properties of cancer cells are not drastically modified by EndoA3 depletion.
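      The response does not state which software produced the aspect ratio and roundness values. Assuming the ImageJ/Fiji "Shape descriptors" conventions (aspect ratio = major/minor axis of the fitted ellipse; roundness = 4 × area / (π × major axis²)), a minimal Python sketch of both descriptors on a hypothetical elliptical cell outline looks like this:

```python
import math


def aspect_ratio(major_axis: float, minor_axis: float) -> float:
    """Major / minor axis of the fitted ellipse (ImageJ 'AR' convention, assumed)."""
    return major_axis / minor_axis


def roundness(area: float, major_axis: float) -> float:
    """4 * area / (pi * major_axis**2) (ImageJ 'Round' convention, assumed)."""
    return 4.0 * area / (math.pi * major_axis ** 2)


# Hypothetical cell outline: an ellipse with 60 um and 30 um axes.
major, minor = 60.0, 30.0
area = math.pi * major * minor / 4.0  # area of that ellipse

ar = aspect_ratio(major, minor)
rnd = roundness(area, major)
print(f"AR = {ar:.2f}, roundness = {rnd:.2f}")
```

      Under these conventions, when the fitted ellipse has the same area as the outline, roundness equals the reciprocal of the aspect ratio, so the two descriptors carry closely related information.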

      Crucially, key literature relevant to this research, addressing the role of ICAM1 endocytosis in antigen-presenting cells, has not been taken into consideration.

      We thank the reviewer for this important point. We have now considered and cited the relevant literature (Discussion, page 9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xu et al. studies the relevance of endophilin A3-dependent endocytosis and retrograde transport of immune synapse components in the activation of cytotoxic CD8 T cells. First, the authors show that ICAM1 and ALCAM, known components of immune synapses, are endocytosed via endoA3-dependent endocytosis and retrogradely transported to the Golgi. The authors then show that blocking internalization or retrograde trafficking reduces the activation of CD8 T cells. Moreover, this diminished CD8 T cell activation was accompanied by the formation of an enlarged immune synapse with reduced ICAM1 recruitment.

      Strengths:

      The authors show a novel EndoA3-dependent endocytic cargo and provide strong evidence linking EndoA3 endocytosis to the retrograde transport of ALCAM and ICAM1.

      Weaknesses:

      The role of EndoA3 in the process of T cell activation is shown in a cell that requires exogenous expression of this gene. Moreover, the authors claim that their findings are important for polarized redistribution of cargoes, but failed to show convincingly that the cargoes they are studying are polarized in their experimental system. The statistics of the manuscript also require some refinement.

      We fully acknowledge that the requirement for exogenous expression of EndoA3 in our immunological model represents a limitation of our study. Unfortunately, it remains challenging to identify cancer cell lines for which autologous CD8 T cells are available and that endogenously express all molecular players investigated (in particular EndoA3). At this stage, we do not have access to any other cancer cell line/autologous CD8⁺ T cell pairs that are sufficiently well characterized. In future studies, it would be valuable to investigate tumor types with high endogenous EndoA3 expression (such as glioblastomas, gliomas, and head and neck cancers) for which autologous CD8 T cells could be obtained, but this remains technically challenging.

      To address the reviewer’s second point regarding polarized redistribution of cargoes, we have added new data in the new Figure 4 and Movies S8-9. Using high-speed spinning-disk live-cell confocal microscopy, we captured the movement of ICAM1-positive tubulovesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with vesicle fusion events occurring at the developing immune synapse. AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.

      Reviewer #3 (Public review):

      Summary:

      Shiqiang Xu and colleagues have examined the importance of ICAM-1 and ALCAM internalization and retrograde transport in cancer cells on the formation of a polarized immunological synapse with cytotoxic CD8+ T cells. They find that internalization is mediated by Endophilin A3 (EndoA3) while retrograde transport to the Golgi apparatus is mediated by the retromer complex. The paper builds on previous findings from corresponding author Henri-François Renard showing that ALCAM is an EndoA3-dependent cargo in clathrin-independent endocytosis.

      Strengths:

      The work is interesting as it describes a novel mechanism by which cancer cells might influence CD8+ T cell activation and immunological synapse formation, and the authors have used a variety of cell biology and immunology methods to study this. However, there are some aspects of the paper that should be addressed more thoroughly to substantiate the conclusions made by the authors.

      Weaknesses:

      In Figure 2A-B, the authors show micrographs from live TIRF movies of HeLa and LB33-MEL cells stably expressing EndoA3-GFP and transiently expressing ICAM-1-mScarlet. The ICAM-1 signal appears diffuse across the plasma membrane, while the EndoA3 signal is partially punctate and partially lining the edge of membrane patches. Previous studies of EndoA3-mediated endocytosis have indicated that this can be observed as transient cargo-enriched puncta on the cell surface. In the present study, there is only one example of such an ICAM-1- and EndoA3-positive punctate event. Other examples of overlapping signals between ICAM-1 and EndoA3 are shown, but these either show retracting ICAM-1-positive membrane protrusions or large membrane patches encircled by EndoA3. While these might represent different modes of EndoA3-mediated ICAM-1 internalization, any conclusion on this would require further investigation.

      We agree with the reviewer that the pattern of cargoes during endocytosis (puncta vs. large patches) as observed by live-cell TIRF microscopy may be confusing. In fact, a punctate pattern has been observed almost systematically when we monitored the uptake of endogenous cargoes via antibody uptake assays, regardless of the imaging approach (TIRF, spinning-disk, classical confocal or lattice light-sheet microscopy). For example:

      - ALCAM: Fig.1e-h, Supplementary Figure 5 and Supplementary Movies 1-3 and 6 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y; Fig.1D and Movie 2 in Tyckaert et al. 2022, https://doi.org/10.1242/jcs.259623.

      - L1CAM: Fig.2 and 3D, Movies S1-4 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      In rare examples, bigger clusters of antibodies were observed, where EndoA3 was observed to surround them, delineate them in a “lasso-like” pattern, and the clusters were progressively taken up:

      - ALCAM: Supplementary Movie 4 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y.

      However, bigger patches of cargoes were more often observed when uptake was monitored using transient expression of GFP- or mCherry-tagged versions of the cargoes. In these cases, EndoA3 predominantly delineated the cargo patches in a “lasso-like” pattern, progressively trimming those patches and leading to endocytosis. For example:

      - L1CAM: Fig.3E, Movie S5-7 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      - We also observed this pattern with CD166-GFP (unpublished).

      The fact that we observed patches rather than punctate patterns upon transient expression of fluorescently tagged cargo constructs is likely due to the elevated expression level of the cargoes.

      Therefore, the patchy pattern observed for ICAM1 and ALCAM, transiently expressed in fusion with fluorescent proteins, and surrounded by EndoA3 in Fig.2A-B and old Movies S1-3, is not surprising. Of note, upon anti-ALCAM antibody uptake, we observed a more punctate pattern (Fig.2C), as previously described. Unfortunately, the low quality of the commercial anti-ICAM1 antibodies did not allow us to perform uptake assays as we did for ALCAM.

      Regarding Fig.S2 and old Movies S4-5, we agree with the reviewer that these data may be misleading, as they represent phenomena happening at protrusions and contact zones between two adjacent cells. We have now replaced these images with other examples where we avoid contact zones (Fig.S2 and new Movies S5-7).

      These different patterns (patches vs dots) are still unexplained at the current stage, and may indeed represent different modes of endocytosis. We think these various patterns may depend on the abundance/expression level of cargoes and their degree of clustering. This will be investigated in future studies. Still, whatever the pattern, these data demonstrate and confirm the association between EndoA3 and cargoes (such as ICAM1 or ALCAM), even in the absence of antibodies.

      Moreover, in Figure 2C-E, uptake of the previously established EndoA3 endocytic cargo ALCAM is analyzed by quantifying total internal fluorescence in LB33-MEL cells of antibody-labelled ALCAM following both overexpression and siRNA-mediated knockdown of EndoA3, showing increased and decreased uptake respectively. Why has the same quantification not been done for the proposed novel EndoA3 endocytic cargo ICAM-1? Furthermore, if endocytosis of ICAM-1 and ALCAM is diminished following EndoA3 knockdown, the expression level on the cell surface would presumably increase accordingly. This has been shown for ALCAM previously and should also be quantified for ICAM-1.

      As correctly pointed out by the reviewer, anti-ICAM1 antibody uptake assays would have been ideal, and we attempted them many times. Unfortunately, none of the commercial antibodies we tested yielded satisfactory results in uptake experiments: either the labeling was too weak/non-specific, or the antibody was not effectively stripped from the cell surface by acid washes, i.e. the acid-wash conditions required for efficient stripping were too harsh for the cells to tolerate. We also tried other approaches using the same commercial antibody that do not require acid washes (loss-of-surface assays by FACS, or uptake assays using surface protein biotinylation), as well as an approach based on insertion of an Alfa-tag in the extracellular part of ICAM1 by CRISPR-Cas9 and detection of ICAM1 with an anti-Alfa-tag nanobody (unpublished approach; collaboration with the lab of Prof. Leonardo Almeida-Souza, University of Helsinki, who developed the approach), but without success. However, we were more successful with the SNAP-tag-based approach to follow retrograde transport, for which the commercial anti-ICAM1 antibody worked properly. In Fig. 1F, we show that retrograde transport of ICAM1 (and thus most likely its endocytosis step) was significantly decreased upon EndoA3 depletion in HeLa cells, indirectly demonstrating that ICAM1 is indeed an EndoA3-dependent cargo.

      Regarding the expectation that the surface level of ICAM1 should increase upon perturbation of EndoA3-mediated endocytosis, we agree with the reviewer that this could be an expected result. However, this is not necessarily systematic, as the surface level of a protein cargo is always the result of a balance between its endocytosis, recycling to the plasma membrane, and lysosomal degradation. We also have to take into account the flux of newly synthesized protein. One must also consider that multiple endocytic mechanisms exist in parallel, and that the perturbation of one mechanism (EndoA3-mediated CIE, here) may be partially compensated by others, as cargoes can often be taken up via multiple endocytic doors. Hence, an increased abundance at the cell surface is not always guaranteed upon endocytosis perturbation. Nevertheless, we measured the cell surface level of both ICAM1 and ALCAM in LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs (Fig. S4E-F). Only minor differences were observed.

      In Figure 4A the authors show micrographs from a live-cell Airyscan movie (Movie S6) of a CD8+ T cell incubated with HeLa cells stably expressing HLA-A*68012 and transiently expressing ICAM1-EGFP. From the movie, it seems that some ICAM-1 positive vesicles in one of the HeLa cells are moving towards the T cell. However, it does not appear like the T cell has formed a stable immunological synapse but rather perhaps a motile kinapse. Furthermore, to conclude that the ICAM-1 positive vesicles are transported toward the T cell in a polarized manner, vesicles from multiple cells should be tracked and their overall directionality should be analyzed. It would also strengthen the paper if the authors could show additional evidence for polarization of the cancer cells in response to T-cell interaction.

      A similar point was raised by reviewer #2. We have revised this section accordingly. In the new Fig. 4 and Movies S8-9, we replaced the live-cell Airyscan confocal data with high-speed spinning-disk confocal imaging data, enabling a more accurate analysis of cargo polarized redistribution and at a higher time resolution.

      Using this approach, we captured the movement of ICAM1-positive tubulo-vesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with fusion events of carriers occurring at the developing immune synapse.

      AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.
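      The directionality criterion described above can be illustrated with a minimal sketch. It is not the authors' AI-based segmentation and tracking pipeline: it assumes tracks are already available as time-ordered (x, y) positions, represents the contact site with the T cell as a single point, and uses a hypothetical 45° cone to decide whether a carrier moves toward it. The function name and example coordinates are illustrative only.

```python
import math


def fraction_toward_target(tracks, target, max_angle_deg=45.0):
    """Fraction of tracks whose net displacement points toward `target`.

    tracks: list of trajectories, each a time-ordered list of (x, y) positions.
    target: (x, y) point standing in for the contact site (hypothetical).
    A track counts as oriented toward the target if the angle between its net
    displacement and the vector from its start point to the target is at most
    max_angle_deg (an assumed threshold).
    """
    cos_limit = math.cos(math.radians(max_angle_deg))
    toward = 0
    counted = 0
    for track in tracks:
        if len(track) < 2:
            continue  # a single position has no direction
        (x0, y0), (x1, y1) = track[0], track[-1]
        dx, dy = x1 - x0, y1 - y0                  # net displacement
        tx, ty = target[0] - x0, target[1] - y0    # direction to the target
        norm = math.hypot(dx, dy) * math.hypot(tx, ty)
        if norm == 0:
            continue  # stationary track, or track starting on the target
        counted += 1
        if (dx * tx + dy * ty) / norm >= cos_limit:
            toward += 1
    return toward / counted if counted else 0.0


# Toy tracks: one moves toward the target, one sideways, one away
toy_tracks = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 1)], [(0, 0), (-1, 0)]]
share = fraction_toward_target(toy_tracks, (10, 0))
```

      With these toy tracks only the first one points toward the target, so one third of the trajectories are counted as oriented. The cone width is a free parameter; a stricter threshold would classify fewer trajectories as moving toward the synapse.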

      Finally, in Figures 4D-G, the authors show that the contact area between CD8+ T cells and LB33-MEL cells is increased in response to siRNA-mediated knockdown of EndoA3 and VPS26A. While this could be caused by reduced polarized delivery of ICAM-1 and ALCAM to the interface between the cells, it could also be caused by other factors such as increased cell surface expression of these proteins due to diminished endocytosis, and/or morphological changes in the cancer cells resulting from disrupted membrane traffic. More experimental evidence is needed to support the working model in Figure 4H.

      Regarding the cell surface expression of both ICAM1 and ALCAM, as already explained above, only minor differences were observed (Fig. S4E-F). Regarding morphological changes of cancer cells upon EndoA3 depletion (Fig. S4B-D), we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight cell area reduction upon EndoA3 depletion, no significant changes were observed regarding the aspect ratio and the roundness. Cancer cell morphology is thus not drastically modified by EndoA3 depletion. All these new data are now discussed in the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers discussed the paper and all agreed it was incomplete in supporting the conclusions. Additional data needed to support the conclusions were:

      (1) Better characterisation of EndoA3-expressing and knock-down cells such as morphology, ICAM-1, and ALCAM surface levels to name two parameters.

      As discussed above, we have now added new data addressing these points:

      - Morphology: Fig. S4B-D

      - ICAM1 and ALCAM surface levels: Fig. S4E-F

      These new data are discussed in the main text.

      (2) Better characterisation of the ICAM-1 polarisation process. Does this require interaction with LFA-1, or can ICAM-1 be delivered to the synapse without it?

      As discussed above, we have now added new data better addressing the characterization of ICAM1 polarized trafficking to the immune synapse, that can be found in the new Fig. 4 (high-speed spinning-disk confocal imaging of ICAM1 trafficking upon conjugate formation between CD8 T cell and cancer cell). The text has been modified accordingly. The dependency on LFA-1 has not been addressed directly, but we may suppose it is indeed important as (i) it has already been addressed in other cellular systems by previous studies (Jo et al. 2010), and (ii) we observed a denser flux of ICAM1-positive carriers in the cancer cell toward regions involved in immune synapses with CD8 T cells, than other regions. As we didn’t address this question more directly in our study, we briefly mentioned this point in the Discussion section.

      (3) Better characterisation of the T cell response: activation markers, cytotoxicity assays.

      As discussed above, we have now added new data addressing these points:

      - Cell surface activation markers: Fig. S4G-I

      - Proliferation: Fig. S4J

      - Degranulation: Fig. S4K

      - Cytotoxic activity: Fig. 3F

      These new data are discussed in the main text.

      (4) Citing relevant literature.

      The relevant literature (in particular the paper by Jo et al. 2010) is now cited and discussed.

      (5) Number of donors evaluated - is it true there was only one blood donor? For human studies better to have key results on >4 donors.

      Our immunological working model indeed originates from a single patient (Baurain et al., 2000), from whom both a cancer cell line (LB33-MEL) and autologous CD8 T cells were derived. These CD8 T cells specifically recognize an HLA molecule presenting a defined antigenic peptide (MUM-3) on the surface of the cancer cells. This provides us with a unique and fully natural experimental system that allows us to faithfully reconstitute cytotoxic T lymphocyte (CTL)-mediated killing of cancer cells in vitro.

      Using CD8 T cells from other donors would not be meaningful in this context, as they would not recognize the LB33-MEL cells. Conversely, testing the same CD8 T cells on other cancer cell lines requires engineering these lines to express the appropriate HLA molecule and to be exogenously pulsed with the correct antigenic peptide – which is precisely what we did with the HeLa cell line.

      Therefore, increasing the number of donors would require obtaining both cancer cell lines and CD8 T cells from each donor, ideally with evidence that the donor’s T cells recognize their own tumor cells. This is technically challenging and not trivial, although it would indeed be highly valuable to diversify immunological models in future studies.

      Importantly, the high specificity of our autologous co-culture system, where cancer cells interact with their naturally matched CD8 T cells, offers clear advantages over commonly used in vitro models such as Jurkat (T) and Raji (B) cell lines, which rely on artificial stimulation with a superantigen to enforce immunological synapse formation and T cell activation.

      (6) How does the binding of antibodies to ICAM-1 and ALCAM impact their trafficking?

      As IgG antibodies are bivalent and can bind two target antigens, they may induce clustering, which could in turn affect endocytosis. To address this concern, we performed an uptake assay based on surface protein biotinylation using a cleavable biotin reagent (with a reducible linker). Briefly, after allowing endocytosis for different time intervals, cell surface–exposed biotins were removed by treatment with the cell-impermeable reducing agent MESNA, while internalized (endocytosed) biotinylated proteins remained protected. These internalized proteins were then recovered by affinity purification on streptavidin resin and analyzed by Western blot to detect the protein of interest.

      Importantly, this uptake assay can be performed in the absence or presence of an anti-cargo antibody, allowing assessment of its potential influence on endocytosis. Author response image 1 shows the results for ALCAM uptake in HeLa cells, with and without anti-ALCAM antibody:

      Author response image 1.

      Antibody binding to an extracellular epitope of ALCAM increases its endocytosis. HeLa cell-surface proteins were biotinylated on ice using EZ-Link Sulfo-NHS-SS-Biotin (Pierce) and then incubated at 37 °C for the indicated times to allow endocytosis. Internalization was assessed in the absence or presence of an anti-ALCAM antibody (Ab) added to the extracellular medium. Endocytosis was stopped by returning the cells to ice, and surface-exposed biotin was removed by treatment with the cell-impermeable reducing agent MESNA. Internalized, MESNA-resistant biotinylated proteins were affinity-purified on streptavidin resin and analyzed by Western blot to detect ALCAM. The “unstripped” condition shows the total amount of ALCAM at the cell surface at the beginning of the experiment (signal at ~95 kDa). Quantification of the time course (normalized to the no-antibody condition) shows increased ALCAM endocytosis in the presence of antibody at 15 and 30 min. Blot is representative of two independent experiments; quantifications include data from both experiments.

      We observed that the anti-ALCAM antibody slightly enhanced ALCAM uptake. A similar experiment was attempted for ICAM1, but we were unable to detect the protein by Western blot using the available commercial antibody.
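      The normalization described in the legend of Author response image 1 can be written as simple arithmetic. This is a minimal sketch, assuming (the legend does not state this explicitly) that the MESNA-resistant signal at each time point is expressed as a fraction of the unstripped surface pool, and that the antibody condition is then divided by the no-antibody condition at the same time point. The function names and band intensities are hypothetical.

```python
def internalized_fraction(protected, unstripped):
    """MESNA-resistant (internalized) signal as a fraction of the initial surface pool."""
    if unstripped <= 0:
        raise ValueError("unstripped signal must be positive")
    return protected / unstripped


def relative_uptake(protected_ab, protected_no_ab, unstripped):
    """Uptake with antibody divided by uptake without antibody, per time point.

    protected_ab, protected_no_ab: dicts mapping time (min) to band intensity.
    A ratio of 1.0 means the antibody did not change uptake at that time point.
    """
    result = {}
    for t, no_ab in protected_no_ab.items():
        baseline = internalized_fraction(no_ab, unstripped)
        if baseline == 0:
            continue  # e.g. time 0: no uptake yet, ratio undefined
        result[t] = internalized_fraction(protected_ab[t], unstripped) / baseline
    return result


# Hypothetical band intensities (arbitrary units), not data from the experiment
ratios = relative_uptake({15: 30.0, 30: 45.0}, {15: 20.0, 30: 30.0}, unstripped=100.0)
```

      With these illustrative numbers both ratios are about 1.5, i.e. more uptake in the presence of antibody. When a single unstripped lane serves both conditions, the surface term cancels in the ratio; keeping it explicit still lets each time course be read as a fraction of the surface pool.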

      Although this outcome was expected, it highlights a potential caveat in using antibodies to monitor endocytosis. Alternative tools such as nanobodies, while monovalent and theoretically less perturbing, are not yet available for many cargo proteins and may still influence cargo conformation or dynamics. Therefore, antibodies remain the current gold standard in endocytosis studies. Nevertheless, data obtained with antibodies should always be validated by complementary approaches that do not rely on antibody binding, as we have done in this study (e.g. live-cell imaging of fluorescently tagged proteins).

      The work is of interest and we look forward to your response/revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for submitting your manuscript which I had the pleasure to review. While I enjoyed your work, I feel that it would strongly benefit by addressing the following points:

      (1) In-depth characterization of T cell responses upon EndoA3 depletion: The characterization should be expanded to include surface marker upregulation, T cell proliferation, and, most importantly, tumor cell cytotoxicity. I was wondering if the incomplete characterization of T-cell responses is due to limited supplies of antigen-specific T-cells? My understanding is that these cells have been derived from a single patient. This also raises concerns in terms of reproducibility as all data are practically from a single biological replicate. My suggestion would be to use an additional system of specific cell-cell contacts to complement the current findings. For instance, HeLa cells could be transfected to express CD19 or EpCAM, for both of which bispecific T cell engagers (Invivogen) exist that would allow specific contact formation, thereby allowing the study of the effect of EndoA3 depletion across T cells from different donors and through a more complete set of assays.

      We refer the reviewer to our responses above, where these points have been addressed in detail. We sincerely thank the reviewer for the excellent suggestion of transfecting HeLa cells with CD19 or EpCAM and using bispecific T-cell engagers. However, after careful consideration, we concluded that this approach falls outside the scope of the present study, which was specifically designed to investigate the most natural system, cancer cells and their autologous CD8 T cells. We nevertheless appreciate this insightful suggestion and will certainly consider it for future studies.

      (2) Alterations in membrane tension as an alternative explanation: Endo- and exocytosis have been found to influence the biophysical properties of cells, such as membrane tension (e.g., Djakbaravo et al., 2021, PMID: 33788963), which in turn influences their susceptibility to cytotoxic T cells, with lower tension corresponding to reduced cytotoxicity (e.g., Basu & Whitlock, 2016, PMID: 26924577). Thus, interference with endocytic pathways could arguably lead to changes in membrane tension that could contribute to the observed effects. These possible effects should be discussed and addressed experimentally to a degree. While measuring membrane tension directly requires specialized expertise (e.g., tether pulling experiments) and is not within the scope of this study, membrane tension affects cell spreading and actin organization. Thus, I would suggest conducting a thorough comparative phenotypical and morphological characterization of the EndoA3+ and EndoA3- cancer cells to estimate the possible effect of changes in membrane tension (if any) on the results.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (3) Citation and consideration of earlier work: Jo & Kwon et al., 2010 (PMID: 20681010) have previously shown that ICAM1 undergoes clathrin-independent recycling and repolarization to the immunological synapse in APCs. Furthermore, they provided evidence that actin-based transport, but not lateral diffusion, together with recycling is crucial for the repolarization of ICAM1 to the immunological synapse. This important earlier work has to be cited. Actin-based transport on the cell surface has not been considered in the current manuscript. In light of these earlier findings, it is unclear in Figure 4A if ICAM1 is delivered to the T cell from within- or from the surface of the cancer cell. I would suggest changing the imaging modalities in this experiment to be able to differentiate cell surface from internal ICAM1, e.g., by detaching the cancer cells from the surface as has been done in Fig. 4B, E, and F.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      (1) The authors should be more careful with their claims about the importance of their results for cell polarity as their evidence for this is scarce (i.e. The live-cell imaging in Figure 4A is not quantified and the ICAM1 polarization effect shown in figure 4B-C is, albeit significant, small and not very convincing).

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The absence (or very low expression) of EndoA3 on the LB33-MEL cell suggests that EndoA3-mediated recycling of immune synaptic components is not required for T-cell activation. The fact that EndoA3 exogenous expression in LB33-MEL cells leads to increased cytokine production in T cells is, however, interesting.

      We fully agree with the reviewer’s observation. Although EndoA3 is not expressed in some cellular contexts, its cargoes may still be present. It is therefore reasonable to assume that alternative endocytic mechanisms can compensate for its absence. It is now widely accepted that many cargoes can be internalized through multiple endocytic routes, and that the relative contribution of each pathway depends strongly on the cellular and physiological context.

      For example, we have shown that ALCAM and L1CAM, although primarily internalized via clathrin-independent pathways, present a minor fraction (< 25%) undergoing clathrin-mediated endocytosis (Renard et al., 2020; Lemaigre et al., 2023). Moreover, we observed that inhibition of macropinocytosis enhances EndoA3-mediated endocytosis of ALCAM, indicating a crosstalk between specific EndoA3-mediated clathrin-independent endocytosis (CIE) and non-specific macropinocytosis (Tyckaert et al., 2022).

      Thus, even in the absence of EndoA3, its cargoes are likely internalized through alternative endocytic routes. Nonetheless, our data clearly demonstrate that EndoA3 expression markedly enhances the endocytosis and intracellular trafficking of its cargoes, ultimately leading to modified CD8 T cell responses.

      (3) For the statistics in bar graphs (graphs 1C, D, E &F; 3E, 3F, S1C-I, and S3C), one cannot have all values for controls simply normalized to 1. This procedure hides the variance for the controls between each replicate and makes any statistics meaningless.

      We thank the reviewer for this important remark. Figures 1C–F, S1C–I, and S3C correspond to quantifications of Western blots, for which it is standard practice to normalize the quantification to a control condition set to 1 (or 100%). Absolute signal intensities cannot be directly compared across different blots due to the variability inherent to this semi-quantitative technique. For this reason, we chose to keep the data presented in normalized form. However, we agree that this type of data requires a carefully chosen statistical test. Here, we chose one-sample t-tests, which test the hypothesis that each siRNA condition differs from 100% (the normalized value of the siCtrl condition). We adapted the statistical analysis accordingly in the different figures mentioned.
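      The logic of testing normalized Western blot values against a fixed 100% can be sketched in a few lines. Because the siCtrl condition is set to exactly 100% in every experiment, only the treated conditions carry variance, so each is compared to the constant 100 rather than to a second sample. The sketch below uses only Python's standard library and hypothetical replicate values (not data from the manuscript); it returns the t statistic and degrees of freedom, and a statistics package (for example SciPy's `scipy.stats.ttest_1samp` with `popmean=100`) would supply the corresponding p-value.

```python
import math
import statistics


def one_sample_t(values, reference=100.0):
    """One-sample t statistic for H0: mean(values) == reference.

    values: per-experiment measurements already normalized so that the
    control condition equals 100 (%). Returns (t, degrees_of_freedom);
    the p-value follows from the Student t distribution with n - 1 df.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("replicates have zero variance; t is undefined")
    se = sd / math.sqrt(n)  # standard error of the mean
    return (mean - reference) / se, n - 1
```

      For three hypothetical replicates of 80%, 85% and 90%, the mean is 85% and the sample standard deviation is 5%, giving t ≈ -5.2 with 2 degrees of freedom.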

      Regarding old Figures 3E–F (now Fig. 3E and 3G), which correspond to IFNγ secretion assays, we agree that representing IFNγ secretion as a fold change relative to a control condition may obscure inter-experimental variability. However, this format was intentionally chosen to facilitate data interpretation, as IFNγ secretion was quantified by ELISA and also displayed inter-experimental variability. For completeness, we now provide below the corresponding graphs showing absolute IFNγ concentrations, which retain the information on inter-experimental variability (Author response image 2). As you can see, the overall conclusions remain unchanged.

      Author response image 2.

      IFNγ secretion data corresponding to Fig. 3E and 3G, expressed in absolute values (pg/mL)

      Minor comments:

      (1) What happens to surface and total levels of ICAM1 and ALCAM in the retromer or EndoA3 knockdown/overexpression conditions? This information would put the effects described into context.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The authors should clearly indicate that BFA means bafilomycin A in the figure legend or methods.

      BFA corresponds to Brefeldin A. We have now clarified this information in legends and methods.

      (3) In the sentence: "These data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires the full secretory capacity of the TGN." What do the authors mean by full secretory capacity?

      We have modified the sentence: “Together, these data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires efficient secretion from the TGN (as evidenced by the involvement of Rab6).”

      (4) The method used for retrograde transport seems to be a variation of the original protocol (reference 43). The manuscript would benefit from a thorough explanation of this assay, rather than citing the original protocol.

      We did not modify the original SNAP-tag–based protocol used to monitor retrograde transport. A comprehensive methodological paper has been published (ref. 44), and we have followed it strictly. Additionally, we briefly summarized the rationale of the approach in Figure 1A and in the first paragraph of the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Richter and colleagues comprehensively investigate the cell wall recycling pathway in the model alphaproteobacterium Caulobacter crescentus using biochemical, imaging, and genetic approaches. They clearly demonstrate that this organism encodes a functional peptidoglycan recycling pathway and demonstrate the activities of many enzymes and transporters within this pathway. They leverage imaging and growth assays to demonstrate that mutants in peptidoglycan recycling have varying degrees of beta-lactam sensitivity as well as morphological and cell division defects. They propose that, rather than impacting the levels or activity of the major beta-lactamase, BlaA, defects in PG recycling lead to beta-lactam sensitivity by limiting the availability of new cell wall precursors. The findings will be of interest to those in the field of bacterial cell wall biochemistry, antibiotics and antibiotic resistance, and bacterial morphogenesis.

      Strengths:

      Overall, the manuscript is laid out logically, and the data are comprehensive, quantitative, and rigorous. The mutants and their phenotypes will be a valuable resource for Caulobacter researchers.

      Thank you for this positive evaluation. Previous work has mostly focused on the role of PG recycling in the regulation of ampC expression. However, our study and recent work in A. tumefaciens (Gilmore & Cava, 2022) and C. crescentus (Modi et al., 2025) demonstrate that β-lactam resistance is heavily influenced by PG recycling and the metabolic state of the cell, even in the presence of high levels of β-lactamase activity. It is likely that these effects are not limited to the two alphaproteobacterial species investigated to date but may be more widely applicable. Therefore, we believe that our results are relevant beyond the Caulobacter field and may help to stimulate similar analyses in other, medically more relevant species.

      Weaknesses:

      The only major missing piece is the complementation of mutants to demonstrate that loss of the targeted gene is responsible for the observed phenotypes.

      In our initial manuscript, we showed that the replacement of the native AmiR and NagZ genes with mutant alleles encoding catalytically inactive variants of the two proteins gave rise to the same phenotypes as gene deletions. This finding indicates that the defects observed were due to the loss of AmiR or NagZ activity, respectively. To rule out artifacts from polar effects, we have now also conducted the requested complementation analysis for the ΔampG, ΔamiR and ΔnagZ mutants. The results obtained show that deletion mutants carrying an ectopically expressed wild-type gene copy behave essentially like the wild-type strain, thereby verifying the validity of our conclusions (new Figure 4-figure supplement 1).

      Reviewer #2 (Public review):

      Summary:

      Pia Richter et al. investigated the peptidoglycan (PG) recycling metabolism in the alpha-proteobacterium Caulobacter crescentus. The authors first identified a functional recycling pathway in this organism, which is similar to the Pseudomonas route, and they characterized two key enzymes (NagZ, AmiR) of this pathway, showing that AmiR differs in specificity from the AmpD counterpart of E. coli. Further, they studied the effects of deletions within the PG recycling pathway (ampG, amiR, nagZ, sdpA, blaA, nagA1, nagA2, amgK, nagK mutants), showing filamentation and cell widening, thereby revealing a link between PG recycling and cell division. Finally, they provide a link between PG recycling and beta-lactam sensitivity in C. crescentus that is not caused by activation of a beta-lactamase, but rather is a result of reduced supply of PG building blocks increasing the sensitivity of penicillin-binding proteins.

      Strengths:

      This work adds to the understanding of the role of PG recycling in alpha-proteobacteria, which significantly differ in their mode of cell wall growth from the better studied gamma-proteobacteria.

      Thank you for pointing out the relevance of our work. As mentioned above, we believe that our work goes beyond understanding the PG recycling pathway in alphaproteobacteria. Importantly, together with previous work, our results demonstrate a so-far largely neglected critical role of PG recycling in β-lactam resistance that goes beyond the mere regulation of β-lactamase gene expression. It will be interesting to determine the conservation of this phenomenon among other bacteria and to see whether blocking PG recycling could represent a potential strategy to combat β-lactam resistant pathogens.

      Weaknesses:

      The findings are not entirely novel as recent studies by Modi et al. 2025 mBio (studying C. crescentus) and Gilmore & Cava 2022 Nat. Commun. (studying Agrobacterium tumefaciens) came to similar conclusions.

      Gilmore & Cava have made the seminal finding that blocking anhydro-muropeptide import affects cell wall integrity in a manner that is partly independent of its effect on ampC expression. We now extend this finding by investigating various critical steps in the PG recycling pathway of C. crescentus, a species lacking an AmpC homolog. Interestingly, by characterizing a variety of different mutants, we show that the morphological and ampicillin resistance defects they exhibit are not strictly connected and vary substantially between strains, suggesting that different steps in PG recycling differ in their importance for cellular fitness and cell wall integrity. This finding suggests that the phenotypes observed are not simply determined by the efficiency of PG recycling but likely result from a combination of factors. Based on the results obtained, we propose a model that highlights the different factors that may be at play and suggests a mechanism explaining their effects on β-lactam resistance and cell division. Our findings partly overlap with the recent study by Modi et al., but there are various points in which we disagree with their findings and conclusions. The need to rigorously validate our differing results led to a significant delay in the submission of our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Major Comment

      Genetic complementation is lacking for deletion mutants throughout. Could you please provide complemented strains for mutants in key figures where deletion phenotypes are central to the conclusions (e.g., Figure 4 and related supplements)?

      As explained above, we have now performed the requested complementation experiments and included the data as Figure 4-figure supplement 1.

      Other minor comments:

      (1) Figure 1

      (a) This is a busy schematic; please consider visually separating PG biosynthesis vs. recycling (e.g., a faint divider line or shaded boxes).

      We have now simplified the schematic and visually separated the PG recycling and de novo biosynthesis pathways.

      (b) Please label "Fructose-6-phosphate" and "Glucosamine-6-phosphate (GlcN-6-P)" on the figure, since they are referenced in the caption (line 1410).

      The symbols for fructose, glucosamine and phosphate are given in the legend on the right. For consistency, we would therefore prefer not to additionally label these compounds in the figure.

      (c) Define all abbreviations in the caption: CM, GTase, TPase; and clarify the legend conventions (e.g., bold vs. regular font; red vs. black text).

      The structure of PG and the different lytic enzymes have now been removed from Figure 1. All remaining abbreviations have now been defined in the legend.

      (2) Figure 2 - Figure Supplement 2

      (a) Panel B: Please include the full chromatogram (it seems to be cropped at 10 min?). For AmiR in particular, it is important to show there are no nearby peaks at earlier retention times (e.g., GlcNAc).

      The region before 10 min is cropped in many published muropeptide profiles because the peaks contained in it are known to correspond to salts, i.e., borate from the reduction step and phosphate, which are poorly retained on the C18 column (Figure 2–figure supplement 2). As the reviewer stated, free GlcNAc would elute in this region and would not be recognized if it were produced by AmiR. However, AmiR cleaves free anhydro-muropeptides between anhMurNAc and the peptide, and the experiment in Figure 2–figure supplement 2 shows that it does not cleave the bond between MurNAc and peptides in intact peptidoglycan.

      (b) Caption line 1439: with AmiR OR the catalytically...

      Done.

      (3) Figure 3

      Panel A: Label the products as NagZ-treated.

      In this analysis, we quantify specific intermediates from the total cellular pool of PG recycling intermediates. Since the products were not specifically treated with NagZ, we would prefer to keep the figure as it is.

      (4) Figure 4 (and Fig. 4-Figure Supplement 1, 2)

      (a) Please add complemented strains for ΔampG, ΔamiR, and ΔnagZ under the same conditions.

      As described in more detail above, we have now performed the requested complementation analysis.

      (b) Figure 4 - Figure S1 - Please include images of all strains quantified in B (e.g. control WT).

      Done.

      (c) Figure 4 - Figure S2: A. Please include images of all strains quantified in B. Please include spotting dilutions on minimal medium to assess the importance of PG recycling under nutrient limitation, especially given apparent lysis in ΔamiR and ΔampG.

      The length distributions of cells grown in PYE medium are taken from Figure 3 and only shown for comparison (as mentioned in the figure legend). To avoid the duplication of images, we would prefer to keep panel A as it is.

      We have now performed the requested serial-dilution spot assay on minimal (M2G) medium. The results show that ampicillin resistance decreases even more dramatically for all strains in this condition. The new data are presented in Figure 4-figure supplement 3C.

      (d) Figure 4 - Figures S3: A and B. Please include WT control.

      We have now added images of the wild-type strain to panel B of this figure. The serial dilution spot assays shown in panel A were performed on the same plates as those depicted in Figure 4 (as mentioned in the figure legend). To avoid the duplication of images, we would prefer to keep this panel as it is.

      (5) Figure 5

      A, C - please include images of WT control.

      We have now added images of the wild-type strain to panel A of this figure. The serial dilution spot assays shown in panel C were performed on the same plates as those depicted in Figure 4 (as mentioned in the figure legend). To avoid the duplication of images, we would prefer to keep this panel as it is.

      (6) Figure 6:

      (a) A, C - please include images of WT control.

      We have now added images of the wild-type strain to panel A of this figure. The serial dilution spot assays shown in panel C were performed on the same plates as those depicted in Figure 4 (as mentioned in the figure legend). To avoid the duplication of images, we would prefer to keep this panel as it is.

      (b) It would be informative to test ΔamgK and ΔanmK on minimal medium (spotting and/or growth curves) to position these steps within the nutrient-dependent fitness landscape.

      We have now analyzed the ampicillin sensitivity of the ΔamgK, ΔnagK and ΔamgK ΔnagK strains on minimal medium (see Author response image 1). Consistent with the results obtained for other mutants in the PG recycling pathway, growth on minimal (M2G) medium plates leads to increased ampicillin sensitivity of the ΔamgK mutant. By contrast, ΔnagK and, to a lesser extent, ΔamgK ΔnagK cells show an increased tolerance to ampicillin under these conditions compared to growth on PYE plates.

      This phenomenon may be explained by the strong stimulatory effect of GlcNAc-6-P on NagB activity. In the absence of NagK, GlcNAc-6-P levels drop, leading to reduced activation of NagB1/2. This effect, combined with abundant glucose to support central carbon metabolism, may promote GlcN-6-P biosynthesis through GlmS, thereby increasing the flux of metabolites into the de novo PG biosynthesis pathway and thus boosting ampicillin tolerance. However, more research is required to fully understand the molecular basis of this effect. Given that the results are likely to reflect complex interactions between dysregulated enzyme activity and altered metabolite pools caused by increased glucose availability, they provide only limited insight into the role of PG recycling in ampicillin resistance. We therefore propose excluding this experiment from the present manuscript to avoid confusion.

      Author response image 1.

      Serial-dilution spot assay investigating the ampicillin resistance of the indicated mutant strains on minimal (M2G) medium plates.

      (c) Could Figures 6 and 7 be combined for better comparison and since there is no WT control? If so, could you also include the MurNAc cytoplasmic level quantification for the double mutant (Figure 7)?

      We would prefer to keep the two figures separated to avoid creating an overly large figure that contains a total of nine panels. However, we have now included an additional panel in Figure 7 showing the levels of MurNAc in the double mutant.

      (7) Figure 7. A, C

      Please include images of WT control.

      We have now added images of the wild-type strain (now panel B). The serial dilution spot assays (now panel D) were performed on the same plates as those depicted in Figure 4 (as mentioned in the figure legend). To avoid the duplication of images, we would prefer to keep this panel as it is.

      (8) Figure 8-S1D, F

      Please include images of WT control.

      Panel F of this figure already contains a wild-type control.

      (9) Figure 10 A, C

      Please include images of WT control and ∆amiR (A).

      Done.

      (10) Figure 11

      Consider adding or highlighting in this figure (in a simplified manner) the major PG recycling differences in Caulobacter? The current model doesn't really show any difference that is unknown.

      This figure presents a model of the mechanism underlying the increased β-lactam sensitivity of PG recycling-deficient cells. Since the PG recycling pathway of C. crescentus is already presented in detail in Figure 1, we would like to keep this figure simple and thus leave it as it is.

      (11) Comments by lines:

      (a) Line 192: Clarify that NagZ is also part of the rate-limiting step since there is no difference between AmiR or NagZ order of hydrolysis?

      We have now omitted the statement that AmiR catalyzes the rate-limiting step in the PG recycling process, because our data do not allow definitive conclusions on this point.

      (b) Line 201: Define "considerable fraction" since this is known, please and cite original reference(s).

      Done.

      (c) Line 203: Please also cite the primary papers where they have found that disruption of the PG recycling pathway in E. coli and P. aeruginosa doesn't result in morphological defects.

      Since there are a number of papers that report PG recycling-deficient mutants of E. coli and P. aeruginosa, we would like to keep citing reviews to support this statement. However, we have now additionally included a review by Park & Uehara (2008), which provides a detailed overview of PG recycling in bacteria.

      (d) Line 220-223: Though there are no obvious morphological defects, several mutants (e.g., ΔamiR, ΔampG) appear to be lysing or stressed under minimal conditions. Could you include spotting assays and/or growth curves on minimal medium (Figure 4, Figure S2) to quantify fitness under nutrient limitation?

      We have performed the requested serial dilution spot assays on minimal (M2G) medium plates and now present the data obtained in Figure 4-figure supplement 3C.

      (e) Line 224: PG recycling has been found to contribute to the regulation of B-lactam resistance in several organisms, not just those two. Perhaps add "including C. freundii and P. aeruginosa"

      Done.

      (12) Typographical errors:

      (a) Line 284: "caron" should be carbon.

      Done.

      (b) Line 323: "Figure C" needs a figure number.

      Done.

      (c) Line 33: "regulaton" should be regulation.

      Done.

      Reviewer #2 (Recommendations for the authors):

      (1) The study is well conducted and describes a number of experiments that significantly deepen previous findings. The conclusions of this paper are mostly well supported by data, but some experiments and data analysis may need to be clarified and extended.

      Thank you for this positive evaluation.

      (2) The data presented in Figures 2B and 2C show activities of AmiR and NagZ using LTase-cleaved cell wall preparations. Unfortunately, the preparations tested with the two enzymes should be identical, but apparently are not. Why aren't identical preparations used?

      We are sorry for the confusion. As stated in the Methods section (page 28, lines 757 and 773), the AmiR activity assays used LT products from PG sacculi isolated from E. coli D456, whereas the NagZ activity assays used LT products from PG sacculi isolated from E. coli CS703-1. Both strains have a higher pentapeptide content than wild-type E. coli. D456 lacks PBPs 4, 5 and 6 and has a moderate level of pentapeptides. CS703-1 lacks PBPs 1a, 4, 5, 6, 7 as well as AmpC and AmpH, and is known to have a higher pentapeptide content than D456. These differences are the reason for the distinct muropeptide profiles in panels B and C of Figure 2.

      (3) I am missing a control experiment where muropeptides treated with NagZ were further digested with AmiR? This would show whether AmiR is able or not to cleave MurNAc-peptides. This is not evident from the provided experiments.

      We have now tested the activity of AmiR towards anhMurNAc-tetrapeptide in vitro. The results show that AmiR efficiently cleaves this GlcNAc-free anhydro-muropeptide species, verifying that it can also act on turnover products that have been previously processed by NagZ. The new data are shown in Figure 2–figure supplement 5.

      (4) The claim that PG recycling is critical, particularly upon transition to the stationary phase and under nutrient limitation, is not justified. It conflicts with the obvious morphological effects also in the exponential phase and with the absence of morphological defects in minimal medium: pronounced defects in rich PYE medium (Figure 4A/B) disappear in minimal M2G medium (Figure 4–figure supplement 2). It seems that catabolite repression effects apply here. Is the morphological effect in rich PYE medium reversed by adding glucose?

      We agree that PG recycling is not considerably more important in stationary phase and have removed this statement. Interestingly, while PG recycling-deficient mutants show no obvious morphological defects in minimal (M2G) medium, their ampicillin sensitivity even increases under this condition (new Figure 4-figure supplement 3C), confirming that morphological and resistance defects are not strictly coupled. Preliminary data indicate that the morphological defects of the mutant cells are also abolished upon growth in PYE+glucose medium. High glucose availability may promote increased de novo synthesis of PG precursors, thereby partially restoring the PG precursor pool. We propose that the morphological and resistance phenotypes develop at different degrees of PG precursor depletion. However, future research is required to clarify the precise molecular basis of this phenomenon.

      (5) Figure 4: Why is the contribution of AmpG to ampicillin resistance much lower than for amiR or nagZ, despite ampG mutants showing the largest morphological defects? Does the accumulation of UDP-MurNAc or UDP-MurNAc-peptide correlate with ampicillin resistance, whereas the morphological effects correlate with the lack of precursors?

      The exact reason why the ΔampG mutant shows such a strong discrepancy in the severity of its morphological and resistance defects compared to the ΔamiR and ΔnagZ mutants remains unclear, because all of these deletions completely block the recycling of anhydro-muropeptides. The major difference in the ΔampG mutant is its inability to import anhydro-muropeptides, causing their accumulation in the periplasm. We propose that periplasmic anhydro-muropeptides, in particular the pentapeptide-containing species, can interact with the substrate-binding sites of PG metabolic enzymes, thereby interfering with proper PG biosynthesis. Conversely, by interacting with transpeptidases, they may reduce their accessibility to ampicillin and thus preserve their activity under β-lactam stress, particularly under conditions in which low PG precursor availability reduces binding site occupancy and thus facilitates antibiotic association.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important study introduces an advance in multi-animal tracking by reframing identity assignment as a self-supervised contrastive representation learning problem. It eliminates the need for segments of video where all animals are simultaneously visible and individually identifiable, and significantly improves tracking speed, accuracy, and robustness with respect to occlusion. This innovation has implications beyond animal tracking, potentially connecting with advances in behavioral analysis and computer vision. The strength of support for these advances is compelling overall, although there were some remaining minor methodological concerns.

      To address the “minor methodological concerns” mentioned in the eLife Assessment and by Reviewer 3, the new version of the manuscript includes the following changes:

      a) The new ms no longer uses the word “accuracy” but “IDF1 score” instead. See, for example, Lines 46, 161, 176, and 522 for our new wording as “IDF1 scores”.

      b) Instead of comparing software packages using mean accuracy over the benchmark, Reviewer 3 proposes to use medians or even boxplots. We now provide boxplot results with mean, median, percentiles and outliers (Figure 1- figure Supplement 2).

      Additionally, we also include in the text the other recommendations from Reviewer 3:

      a) We now more explicitly describe the problems of the original idtracker.ai v4 in the benchmark (lines 66-68). Around half of the videos had a high accuracy in the original idtracker.ai (v4), but the other half of the videos had lower accuracies (Figure 1a, blue). The new idtracker.ai has high accuracy values for all the videos (Figure 1a, magenta).

      Also, the videos with high accuracy in the old idtracker.ai had very long tracking times (Figure 1b, blue) and the new version does not (Figure 1b, magenta). So the benchmark allows us to distinguish the new idtracker.ai as having a better accuracy for all videos and lower tracking times, making it a much more practical system than previous ones. 

      b) We further clarified the occlusion experiment (lines 188-190 and 277-290).

      c) We explain why we measure accuracies with and without animal crossings (lines 49-62).

      d) We added a Discussion section (lines 223-244).

      We believe the new version has clarified the minor methodological concerns.

      Reviewer #3 (Public review):

      The authors have reorganized and rewritten a substantial portion of their manuscript, which has improved the overall clarity and structure to some extent. In particular, omitting the different protocols enhanced readability. However, all technical details are now in appendix which is now referred to more frequently in the manuscript, which was already the case in the initial submission. These frequent references to the appendix - and even to appendices from previous versions - make it difficult to read and fully understand the method and the evaluations in detail. A more self-contained description of the method within the main text would be highly appreciated.

      In the new ms, we have reduced the references to the appendix by having a more detailed explanation in one place, lines 49-62.

      Furthermore, the authors state that they changed their evaluation metric from accuracy to IDF1. However, throughout the manuscript they continue to refer to "accuracy" when evaluating and comparing results. It is unclear which accuracy metric was used or whether the authors are confusing the two metrics. This point needs clarification, as IDF1 is not an "accuracy" measure but rather an F1-score over identity assignments.

      We thank the reviewer for noticing this. Following this recommendation, we now refer to the accuracy measure as the “IDF1 score” throughout the ms. See, for example, lines 46, 161, 176, and 522.
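      For reference, the reviewer's distinction can be made concrete with the standard definition of IDF1: it is the harmonic mean of identification precision (IDP) and identification recall (IDR), computed from identity-level true positives (IDTP), false positives (IDFP) and false negatives (IDFN). The short Python sketch below is illustrative only; the function names are hypothetical and are not part of idtracker.ai, and it assumes the three counts have already been obtained from an optimal matching between tracked and ground-truth identities.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    Hypothetical helper: assumes identity-level counts were obtained
    beforehand by matching tracked trajectories to ground-truth ones.
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("no tracked or ground-truth positions to score")
    return 2 * idtp / denom


def idf1_from_precision_recall(idtp: int, idfp: int, idfn: int) -> float:
    """Same quantity written as the harmonic mean of IDP and IDR."""
    idp = idtp / (idtp + idfp)  # identification precision
    idr = idtp / (idtp + idfn)  # identification recall
    return 2 * idp * idr / (idp + idr)


# Example: 90 correctly identified positions, 10 false and 10 missed ones.
print(idf1(90, 10, 10))  # 0.9
```

      Because IDFP and IDFN enter the denominator separately, IDF1 penalizes both wrong and missing identity assignments, which is why it is an F1-type score rather than a simple fraction of correct frames.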

      The authors compare the speedups of the new version with those of the previous ones by taking the average. However, it appears that there are striking outliers in the tracking performance data (see Supplementary Table 1-4). Therefore, using the average may not be the most appropriate way to compare. The authors should consider using the median or providing more detailed statistics (e.g., boxplots) to better illustrate the distributions.

      We thank the reviewer for asking for more detailed statistics. We added the requested box plot in Figure 1- figure Supplement 2 to provide more complete statistics in the comparison.

      The authors did not provide any conclusion or discussion section. Including a concise conclusion that summarizes the main findings and their implications would help to convey the message of the manuscript.

      We added a Discussion section in lines 223-244.

      The authors report an improvement in the mean accuracy across all benchmarks from 99.49% to 99.82% (with crossings). While this represents a slight improvement, the datasets used for benchmarking seem relatively simple and already largely "solved". Therefore, the impact of this work on the field may be limited. It would be more informative to evaluate the method on more challenging datasets that include frequent occlusions, crossings, or animals with similar appearances.

      Around half of the videos also had a very high accuracy in the original idtracker.ai (v4), but the other half of the videos had lower accuracies (Figure 1a, blue). For example, we found IDF1 scores of 94.47% for a video of 100 zebrafish with thousands of crossings (z_100_1), 93.77% for a video of 4 mice (m_4_2) and 69.66% for a video of 100 flies (d_100_3). The new idtracker.ai has high accuracy values for all the videos (Figure 1a, magenta).

      Importantly, the tracking times for the majority of videos were very high in the original idtracker.ai (Figure 1b, blue), limiting the practical use of the tracking system. The new system achieves both high accuracy in all videos (Figure 1a, magenta) and much lower tracking times (Figure 1b, magenta), making it a much more practical system.

      We have added a sentence on the limitations of the original idtracker.ai as revealed by the benchmark (lines 66-68).

      The accuracy reported in the main text is "without crossings" - this seems like incomplete evaluation, especially that tracking objects that do not cross seems a straightforward task. Information is missing why crossings are a problem and are dealt with separately.

      We have now added an explanation of why we measure accuracy without crossings and why we report it separately from the accuracy for the whole trajectory (lines 49-62). The reason is that the identification algorithm presented in this ms only identifies animal images outside the crossings. This algorithm makes robust animal identifications throughout the video despite the thousands of animal crossings typically present in each of the videos used in the benchmark. A second algorithm (unchanged since the first idTracker in 2014) then assigns animal positions during crossings, once the first algorithm has identified the animals before and after each crossing.

      There are several videos with a much lower tracking accuracy, explaining what the challenges of these videos are and why the method fails in such cases would help to understand the method's usability and weak points.

      Some videos had low accuracy on previous versions (Figure 1a, blue), but the new idtracker.ai has high accuracy in all of them (Figure 1a, magenta).

      Reviewer #3 (Recommendations for the authors):

      (1) As described before, the authors claim to use IDF1 as their metric in the whole manuscript (lines 414-436) but only refer to accuracy when presenting the results. It is not clear, whether accuracy was used as a metric instead of IDF1 or the authors are confusing these metrics.

      Following this recommendation, we replaced “accuracy” with “IDF1 score”; see lines 46, 161, 176, and 522.

      (2) In the introduction, a brief explanation why crossings need to be dealt with separately would help to understand the logic of the method design.

      We added such an explanation in lines 49-62.

      (3) Figure 3: We asked about how the tracking accuracy is being assessed with occlusions. The authors responded with that only the GT points inside the ROI are taken into account when computing the accuracy. Does this mean, that the occluded blobs are still part of the CNN training and the clustering? This questions the purpose of this experiment, since the accuracy performance would therefore only change, if the errors, that their approach is doing either way, are outside the ROI and, therefore, not part of the metric evaluation.

      The occluded blobs are not part of any training because they are erased from the video, they do not exist. We made this more clear in lines 188-190 and 277-290.

      (4) Figure 1: The fact that datasets are connected with a line is misleading - there is no connection between the data along the x-axis. A line plot is not an appropriate way to present these results.

      The new ms clarifies that the lines are for ease of visualization, see last line in the caption of Figure 1.

      (5) Lines 38-39: It is not clear how the CNN can be pretrained for the entire video if there are no global segments or only short ones. Here, the distinction between "no segments", "only short segments" and "pretraining on the entire video" is not explained.

      This pretraining protocol is not used in the version of the software we present, so details of this are not as relevant.

      (6) Figure 2a: The authors are showing "individual fragments" and individual fragments in a global fragment." However, it seems there are a few blue borders missing. In the text (l. 73-79), they note, that they are displaying them as "examples" but the absence of correct blue borders is confusing.

      In the new ms, we have replaced the label “Individual fragments in a global fragment” with “Individual fragments in an example global fragment” in the legend of Figure 2.

      (7) Lines 61-63, 148-151, and 162-164: Could the authors clarify why they used the average instead of median when comparing the speedups of the new version and the old ones?

      We thank the reviewer for asking for more detailed statistics. We added the requested box plot in Figure 1- figure Supplement 2 to provide more complete statistics in the comparison of accuracies and tracking times for old and new systems.

      (8) Lines 140-144: The post-processing steps are not clear. The authors should rather state clearly which processes of the old versions they are using. Then the authors could shortly explain them.

      We removed this paragraph and explained in more detail in lines 49-62 which parts of the software are new and which ones are not.

      (9) Lines 239-251: Here, the authors are clarifying on a section 1-2 pages before. This information should be directly in that section instead.

      Following this recommendation, we clarified the occlusion experiment in the main text (lines 188-191) to make it more self-contained. Still, the flow of the main text is better with some details in Methods.

      (10) Line 38: It is not clear how the CNN can be pretrained for the entire video if there are no global segments or only short ones. Here, the distinction between "no segments", "only short segments" and "pretraining on the entire video" is a bit misleading/underexplained.

      See number 5.

      (11) Figure 2a: The authors are showing "individual fragments" and individual fragments in a global fragment." However, it seems there are a few blue borders missing. In the text (l. 73-79), they note, that they are displaying them as "examples" but the absence of correct blue borders is confusing.

      See number 6.

      (12) Figure 2c and line 115-118: "Batches" itself is not meaningful without any information of the batch size. The authors should rather depict the batch size and then the number of epochs. The Figure 2 contains the info 400 positive and 400 negative pairs of images per batch. However, there is no information about the total number of images.

      Furthermore, these metrics are inappropriate here, since training is carried out from scratch (or already pre-trained) for every new video, each video has different number of animals, different number of images.

      Following this recommendation, we clarified the number of images in each batch (Figure 1c caption and lines 134-138), why we do not work with epochs (lines 700-702), and the idea that the clusters in Figure 2 represent an example and the number of batches needed for the clusters to form depends on the video details.

      Appendix 1-figure 1: why do the methods fail? It looks that for certain videos the method is fairly unreliable. What is the reason for the methods to crash and how to avoid this?

      Those failures occur only for the old idtracker.ai and TRex, not for the method presented here. Our new contrastive algorithm does not fail in any of the videos in the benchmark.

      We thank the reviewer for the detailed suggestions. We believe we have incorporated all of them in the new version of the ms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper by Karimian et al proposes an oscillator model tuned to implement binding by synchrony (BBS*) principles in a visual task. The authors set out to show how well these BBS principles explain human behavior in figure-ground segregation tasks. The model is inspired by electrophysiological findings in non-human primates, suggesting that gamma oscillations in early visual cortex implement feature-binding through a synchronization of feature-selective neurons. The psychophysics experiment involves the identification of a figure consisting of gabor annuli, presented on a background of gabor annuli. The participants' task is to identify the orientation of the figure. The task difficulty is varied based on the contrast and density of the gabor annuli that make up the figure. The same figures (without the background) are used as inputs to the oscillator model. The authors report that both the discrimination accuracy in the psychophysics experiment and the synchrony of the oscillators in the proposed model follow a similar "Arnold Tongue" relationship when depicted as a function of the texture-defining features of the figure. This finding is interpreted as evidence for BBS/gamma synchrony being the underlying mechanism of the figure-ground segregation.

      Note that I chose to use "BBS" over gamma synchrony (used by the authors) in this review, as I am not convinced that the authors show evidence for synchronization in the gamma-band.

      We thank the reviewer for their careful assessment of our manuscript and useful comments that we believe have served to strengthen our work.

      Strengths:

      The design of the proposed model is well-informed by electrophysiological findings, and the idea of using computational modeling to bridge between intracranial recordings in non-human primates and behavioral results in human participants is interesting. Previous work has criticized the BBS synchrony theory based on the observation that synchronization in the gamma-band is highly localized and the frequency of the oscillation depends on the visual features of the stimulus. I appreciate how the authors demonstrate that frequency-dependence and local synchronization can be features of BBS, and not contradictory to the theory. As such, I feel that this work has the potential to contribute meaningfully to the debate on whether BBS is a biophysically realistic model of feature-binding in visual cortex.

      Weaknesses:

      I have several concerns regarding the presented claims, assessment of meaning and size of the presented effects, particularly with regard to the absence of a priori defined effect sizes.

      Firstly, the paper makes strong claims about the frequency-specificity (i.e., gamma synchrony) and anatomical correlates (early visual cortex) of the observed effects. These claims are informed by previous electrophysiological work in non-human primates but are not directly supported by the paper itself. For instance, the title contains the word "gamma synchrony", but the authors do not demonstrate any EEG/MEG or intracranial data from their human subjects supporting such claims, nor do they demonstrate that the frequencies in the oscillator model are within the gamma band. I think that the paper should more clearly distinguish between statements that are directly supported by the paper (such as: "an oscillator model based on BBS principles accounts for variance in human behavior") and abstract inferences based on the literature (such as "these effects could be attributed to gamma oscillations in early visual cortex, as the model was designed based on those principles").

      We thank the reviewer for this helpful comment and agree that the scope of our claims should be clearly delineated between what is directly supported by our data and what is theoretically inferred from prior literature.

      We revised the Abstract, Introduction, and early Discussion to moderate the strength of our statements and make the distinction explicit. The revised title now emphasizes that our study tests principles derived from prior work on gamma synchrony rather than directly demonstrating gamma activity in humans. Throughout the text, we use more cautious phrasing that highlights potential mechanisms and theoretical predictions. The intention of our study was not to position synchrony as the only viable mechanism of figure–ground perception. Rather, our goal was to reinvigorate it as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We updated phrasing throughout the manuscript to make this clearer and avoid overstating the study’s contribution.

      Importantly, our model is not agnostic with respect to frequency band. Oscillator frequencies exhibited by model units are within the gamma range by design. Frequency emerges directly from the contrast within each oscillator’s receptive field, following an empirically established relationship between stimulus contrast and gamma frequency. To our knowledge, such a robust, quantitative relationship between stimulus features and exact oscillation frequency has not been consistently demonstrated for other frequency bands. This relationship yields gamma-band frequencies for all contrasts used in our simulations. The model is thus indeed a gamma oscillator model of V1, not a generic instantiation of Binding by Synchrony (BBS) principles.

      That said, we fully agree with the reviewer that our study cannot demonstrate a direct link between gamma synchrony in visual cortex and human behavior. Our behavioral and modeling results instead show that synchronization principles derived from gamma-band physiology in V1 can predict perceptual performance patterns. We now make this distinction explicit throughout the revised manuscript.

      Secondly, unlike the human participants, the model strictly does not perform figure-ground segregation, as it only receives the figure as an input.

      We thank the reviewer for the opportunity to clarify our modeling approach. We chose not to model the background to reduce computational cost, since including it requires a substantially larger number of oscillators without changing the model’s predictions. The model thus indeed only receives the figure region as input. We aimed to test the local grouping mechanism predicted by TWCO, rather than to simulate a full figure–ground segregation process including a read-out stage. Our model therefore isolates the conditions under which local synchrony emerges within the figure region, assuming that a downstream read-out mechanism (not explicitly modeled here) would detect regions of coherent activity. The exact nature of such a read-out mechanism was beyond the scope of our work.

      To confirm that our simplified model is a valid proxy, we ran additional simulations including the background and found that a coherent figure assembly reliably emerges, as can be seen in the phase-locking patterns relative to a reference oscillator at the center of the figure. This validates that the principles of local grouping we studied in isolation hold even when the figure is embedded in a noisy surround. We have added an explicit note in the Results (paragraph 2) that we only simulate the figure and added Supplementary Figure S1 showing the additional simulations.

      Finally, it is unclear what effect sizes the authors would have expected a priori, making it difficult to assess whether their oscillator model represents the data well or poorly. I consider this a major concern, as the relationship between the synchrony of the oscillatory model and the performance of the human participants is confounded by the visual features of the figure. Specifically, the authors use the BBS literature to motivate the hypothesis that perception of the texture-defined figure is related to the density and contrast heterogeneity of the texture elements (Gabor annuli) of the figure. This hypothesis has to be true regardless of synchrony, as the figure will be easier to spot if it consists of a higher number of high-contrast Gabors than the background. As the frequency and phase of the oscillators and the coupling strength between oscillators in the grid change as a function of these visual features, I wonder how much of the correlation between model synchrony and human performance is mediated by the features of the figure. To interpret to what extent the similarity between model and human behavior relies on the oscillatory nature of the model, the authors should find a way to estimate an empirical threshold that accounts for these confounding effects. Alternatively, it would be interesting to understand whether a model based on competing theories (e.g., Binding by Enhanced Firing, Roelfsema, 2023) would perform better or worse at explaining the data.

      We thank the reviewer for these insightful and constructive comments, which have prompted additional analyses that we believe substantially strengthen our work. The reviewer raises two main points: (1) the need for a benchmark to assess our model’s performance, and (2) the concern that the relationship between model synchrony and behavior might be a non-causal “confound” of the visual features. We address each point below.

      (1) Benchmarking model performance

      We agree that it is important to assess how well our model performs relative to the data and included this in the original manuscript. We did not predefine an absolute good-fit threshold because absolute agreement depends on irreducible noise and inter-subject variability, making a universal cutoff arbitrary. Instead, we had benchmarked model performance in two complementary ways. First, the noise ceiling shown in Figure 5 provides an empirical benchmark for the maximum fit any model could achieve on our data. Simulated Arnold tongues (based on synchrony) approach this ceiling, achieving 89% of the possible similarity for correlation and 79% for weighted Jaccard similarity. Second, the parameter sweep (Figure 3) situates our model’s performance within the broader parameter space. It shows that the model, whose key parameters were fixed a priori from independent macaque neurophysiological data, lies close to the optimal regime for explaining the human data. It also provides an estimate of the lower bound (worst-performing point) on the fit that a misspecified model implementing the identical mechanism would achieve. Our model with fixed a priori parameters performs 1.41 times better than a misspecified model for the correlation fit metric and 3 times better for weighted Jaccard similarity.
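
      The benchmark arithmetic above can be sketched in a few lines. The sketch assumes that the behavioral and simulated Arnold tongues are non-negative maps sampled on the same contrast-heterogeneity × grid-coarseness grid, that weighted Jaccard similarity is the sum of element-wise minima divided by the sum of element-wise maxima, and that "percentage of possible similarity" is the model score divided by the noise-ceiling score. The function names and toy maps are hypothetical; they do not reproduce the authors' data or their ceiling-estimation procedure.

```python
import numpy as np

def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two non-negative maps: sum of minima / sum of maxima."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if (a < 0).any() or (b < 0).any():
        raise ValueError("weighted Jaccard similarity requires non-negative values")
    denom = np.maximum(a, b).sum()
    return 1.0 if denom == 0 else float(np.minimum(a, b).sum() / denom)

def pearson(a, b):
    """Pearson correlation between two maps, flattened to vectors."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

def fraction_of_ceiling(score, ceiling):
    """Model score as a fraction of the noise ceiling (e.g. 0.89 means 89% of possible similarity)."""
    return score / ceiling

def times_better_than_worst(score, worst):
    """Ratio of the model score to the worst-performing point of a parameter sweep."""
    return score / worst

# Toy 3 x 3 maps (rows: contrast heterogeneity, columns: grid coarseness); values are invented.
behavior = np.array([[0.9, 0.7, 0.5],
                     [0.8, 0.6, 0.4],
                     [0.6, 0.4, 0.2]])
model = np.array([[1.0, 0.8, 0.4],
                  [0.8, 0.5, 0.3],
                  [0.5, 0.3, 0.1]])

print(f"weighted Jaccard: {weighted_jaccard(behavior, model):.3f}")
print(f"correlation: {pearson(behavior, model):.3f}")
```

      Weighted Jaccard is sensitive to absolute agreement between the maps, whereas correlation only measures agreement in shape, which is one reason to report both.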

      (2) Synchrony as mechanism vs. potential confound

      We appreciate the reviewer’s suggestion to test whether synchrony explains behavior beyond stimulus features. In our framework, synchrony is a near-deterministic function of the manipulated stimulus features given fixed model parameters. As a result, synchrony and the stimulus features are collinear (R² ≈ 0.8), leaving no independent variance for synchrony to explain once stimulus features are included. Adding both into one statistical model yields unstable coefficients and no out-of-sample improvement.
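
      The collinearity argument can be illustrated with the variance inflation factor, VIF = 1/(1 − R²), where R² comes from regressing one predictor on the others. With R² ≈ 0.8 this gives VIF = 1/0.2 = 5, i.e., the sampling variance of the synchrony coefficient in a joint model is five times what it would be for an uncorrelated predictor. The stand-in synchrony function below (a logistic function of a coupling term minus a detuning term) and the feature grid are hypothetical and are not the authors' oscillator model.

```python
import numpy as np

def r_squared(y, X):
    """R² of an ordinary least-squares fit of y on the columns of X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vif_from_r2(r2):
    """Variance inflation factor for a predictor whose R² on the other predictors is r2."""
    return 1.0 / (1.0 - r2)

# Hypothetical stimulus grid: 6 heterogeneity levels x 6 coarseness levels.
het, coarse = np.meshgrid(np.linspace(0, 1, 6), np.linspace(0, 1, 6), indexing="ij")
het, coarse = het.ravel(), coarse.ravel()

# Hypothetical stand-in for model synchrony: high when coupling outweighs detuning.
coupling = np.exp(-2.0 * coarse)   # assumed: coarser grid -> weaker coupling
detuning = het                     # assumed: more heterogeneity -> larger detuning
synchrony = 1.0 / (1.0 + np.exp(-8.0 * (coupling - detuning)))

r2 = r_squared(synchrony, np.column_stack([het, coarse]))
print(f"R² of synchrony on features: {r2:.2f}, VIF: {vif_from_r2(r2):.1f}")
```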

      Mechanistically, we believe the relevant question is not whether synchrony explains behavior beyond stimulus features but whether synchrony is the correct transformation of the stimulus features to reproduce the behavioral pattern. Please note that in our design we ensured that mean contrast and luminance are identical in the figure and the background such that there are not more high-contrast Gabors in the figure than in the background. We did this with the aim to render mean contrast not a relevant feature. However, there are more high-contrast Gabors in the background, and it is conceivable that the absence of such high contrasts in the figure drives the detection/discrimination of the figure. We therefore agree that testing alternative models would further clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model from which we derived synchrony. First, average firing rates inside the figure and second, the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison based on out-of-sample predictions. While rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison. We added a new subsection comparing synchrony to rate-based alternatives in the Results (paragraphs 7-9), including additional Bayesian analyses and LOO-CV model comparison. Please note that the model comparison we added to the manuscript provides an additional benchmark beyond the map-level ceiling analysis. It indicates that the mapping from stimulus features to behavior via synchrony generalizes best without requiring an a priori good-fit threshold.
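
      The logic of an out-of-sample comparison between single-predictor readouts can be sketched as follows. The authors used Bayesian LOO-CV; this sketch swaps in the closed-form leave-one-out residuals of ordinary least squares, e_i / (1 − h_ii) with h_ii the leverage of point i, which ranks models by the same kind of held-out error without any sampling. All data below are synthetic, and the readout names (synchrony, rate_diff, mean_rate) are hypothetical labels, not the authors' variables or results.

```python
import numpy as np

def loo_mse(y, x):
    """Exact leave-one-out mean squared error of an OLS fit of y on one predictor x.

    Uses the closed form e_i / (1 - h_ii), where h_ii is the leverage of point i,
    so no refitting loop is needed.
    """
    z = (x - x.mean()) / x.std()                 # z-normalize the predictor
    X = np.column_stack([np.ones_like(z), z])    # intercept + predictor
    H = X @ np.linalg.pinv(X)                    # hat matrix: fitted values = H @ y
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))       # residual with point i held out
    return float(np.mean(loo_resid ** 2))

# Synthetic example: behavior depends on "synchrony" only (by construction).
rng = np.random.default_rng(1)
n = 120
synchrony = rng.uniform(0.0, 1.0, n)
rate_diff = 0.5 * synchrony + rng.normal(0.0, 0.3, n)   # partially related readout
mean_rate = rng.normal(0.0, 1.0, n)                      # unrelated readout
accuracy = 0.6 + 0.3 * synchrony + rng.normal(0.0, 0.05, n)

scores = {name: loo_mse(accuracy, x)
          for name, x in [("synchrony", synchrony),
                          ("rate_diff", rate_diff),
                          ("mean_rate", mean_rate)]}
best = min(scores, key=scores.get)   # lowest held-out error wins
print(best, {k: round(v, 4) for k, v in scores.items()})
```

      Because each candidate is scored on points it was not fitted to, a readout that is merely correlated with the true driver (here rate_diff) is penalized relative to the driver itself.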

      We agree that formally comparing our model to a sophisticated rate-based alternative, such as an instantiation of the Binding by Enhanced Firing model, is an important direction for future work. However, it remains an open and non-trivial question whether such a model could quantitatively reproduce the precise shape of the behavioral Arnold tongue that emerges from the systematic manipulation of our stimulus parameters. Implementing and parameterizing such a model in a comparable, biologically grounded framework is a substantial undertaking that lies beyond the scope of the current study. Therefore, our goal here was not to claim exclusivity for synchrony-based mechanisms, but rather to re-evaluate their plausibility by showing that features often seen as limitations (stimulus dependence and frequency heterogeneity) are, in fact, essential characteristics of the TWCO framework that can predict complex behavioral outcomes.

      We would also like to clarify that our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). Demonstrating that these features predict behavior is therefore not trivial but constitutes a first empirical confirmation that the core TWCO variables match perception.

      Apart from adding analyses of additional rate-based readouts of our model, we also refined our discussion of the relationship between these and a synchrony-based mechanism.

      Reviewer #2 (Public review):

      The authors aimed to investigate whether gamma synchrony serves a functional role in figure-ground perception. They specifically sought to test whether the stimulus-dependence of gamma synchrony, often considered a limitation, actually facilitates perceptual grouping. Using the theory of weakly coupled oscillators (TWCO), they developed a framework wherein synchronization depends on both frequency detuning (related to contrast heterogeneity) and coupling strength (related to proximity between visual elements). Through psychophysical experiments with texture discrimination tasks and computational modeling, they tested whether human performance follows patterns predicted by TWCO and whether perceptual learning enhances synchrony-based grouping.

      We thank the reviewer for their thoughtful and constructive review. We believe the comments have served to improve our work.

      Strengths:

      (1) The theoretical framework connecting TWCO to visual perception is innovative and well-articulated, providing a potential mechanistic explanation for how gamma synchrony might contribute to both feature binding and separation.

      (2) The methodology combines psychophysical measurements with computational modeling, with a solid quantitative agreement between model predictions and human performance.

      (3) In particular, the demonstration that coupling strengths can be modified through experience is remarkable and suggests gamma synchrony could be an adaptable mechanism that improves with visual learning.

      (4) The cross-validation approach, wherein model parameters derived from macaque neurophysiology successfully predict human performance, strengthens the biological plausibility of the framework.

      Weaknesses:

      (1) The highly controlled stimuli are far removed from natural scenes, raising questions about generalisability. But, of course, control (almost) excludes ecological validity. The study does not address the challenges of natural vision or leverage the rich statistical structure afforded by natural scenes.

      We agree with the reviewer that the insights of the present study are limited to texture stimuli and have made adjustments in the Discussion (final two paragraphs) to avoid claiming generalizability to natural stimuli. We have also adjusted the title to specifically limit our results to texture stimuli. To establish the principles of TWCO, we needed tight control over the stimulus, but we are intrigued by the idea of investigating natural scenes. We have added to our Discussion (paragraph 9) that future work should evaluate to what extent the principles we investigate here apply to natural scenes. Synchrony-based mechanisms have been successfully used for image segmentation tasks in machine vision, showing that the proposed mechanism can in principle work for natural scenes.

      (2) The experimental design appears primarily confirmatory rather than attempting to challenge the TWCO framework or test boundary conditions where it might fail.

      We thank the reviewer for this important point. Our primary motivation was to address the neurophysiological properties of gamma synchrony that have been suggested to severely challenge the binding by synchrony mechanism, in particular the strong dependence of gamma oscillations and synchrony on stimulus features. Our goal was to show that from the perspective of TWCO, these challenges become expected components of the mechanism. In essence, we wanted to promote a conceptual shift that converts what pushes a theory to its limit into something that is actually its central tenet. To facilitate this shift, we designed the experiment to directly test this core tenet.

      While our approach was designed to test a central prediction of TWCO rather than explicitly challenge its boundaries, we respectfully argue that it was far from a simple confirmatory experiment. The design incorporated high-risk elements that provided considerable room for both the theory and our model to fail. First, the core prediction itself was non-obvious and highly specific. We did not simply test whether contrast heterogeneity and grid coarseness affect perception. We tested the stronger hypothesis that they would reflect a specific, interactive trade-off (the behavioral Arnold tongue) as specified by TWCO. Second, our modeling approach was deliberately constrained to provide a further stringent test. We did not post-hoc optimize the model's key parameters to fit our behavioral data. Instead, we fixed them a priori based on independent neurophysiological data from macaques. This was a high-risk choice, as a mismatch between a priori model predictions and the human data would have seriously challenged the framework's generalizability.
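
      The interactive trade-off predicted by TWCO can be seen in the simplest case of two sine-coupled phase oscillators. With symmetric coupling K and frequency difference Δω, the phase difference φ obeys the Adler equation dφ/dt = Δω − 2K sin φ, so the pair locks only when |Δω| ≤ 2K; the locking region in the (Δω, K) plane is the Arnold tongue. The sketch below is this textbook two-oscillator case, not the authors' V1 grid model, and mapping contrast heterogeneity onto Δω and grid coarseness onto K is only the qualitative correspondence described in the text. Locking is summarized by the phase-locking value (PLV), the magnitude of the time-averaged exp(iφ).

```python
import numpy as np

def arnold_tongue(detunings, couplings, t_max=200.0, dt=0.01):
    """Phase-locking value (PLV) of two sine-coupled phase oscillators on a parameter grid.

    With symmetric coupling K, the phase difference phi obeys the Adler equation
    dphi/dt = dw - 2*K*sin(phi), integrated here with the forward Euler method.
    Rows index detuning dw, columns index coupling K.
    """
    dw, K = np.meshgrid(detunings, couplings, indexing="ij")
    phi = np.zeros_like(dw)
    acc = np.zeros(dw.shape, dtype=complex)
    n_steps = int(round(t_max / dt))
    n_avg = 0
    for step in range(n_steps):
        phi = phi + dt * (dw - 2.0 * K * np.sin(phi))
        if step >= n_steps // 2:          # discard the first half as transient
            acc += np.exp(1j * phi)
            n_avg += 1
    return np.abs(acc / n_avg)

detunings = np.linspace(0.0, 4.0, 9)   # larger detuning ~ higher contrast heterogeneity
couplings = np.linspace(0.0, 2.0, 9)   # larger coupling ~ finer (less coarse) grid
plv = arnold_tongue(detunings, couplings)
dw, K = np.meshgrid(detunings, couplings, indexing="ij")
predicted_locked = np.abs(dw) <= 2.0 * K   # analytic Adler locking condition
```

      Comparing `plv` with `predicted_locked` shows PLV near 1 inside the tongue and decreasing as detuning grows beyond 2K: higher detuning has to be compensated by stronger coupling for the pair to stay synchronized.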

      We agree that future research should further challenge TWCO, for instance by using stimuli that require segregating several objects simultaneously or objects that cover more extensive regions of the visual field.

      (3) Alternative explanations for the observed behavioral effects are not thoroughly explored. While the model provides a good fit to the data, this does not conclusively prove that gamma synchrony is the actual mechanism underlying the observed effects.

      We agree that our results do not conclusively show that gamma synchrony is the actual mechanism underlying figure-ground segregation. We admit that the original phrasing used throughout the manuscript was too strong and gave the impression that we wanted to establish exactly that. However, the goal of our work was only to reinvigorate gamma synchrony as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We have revised the title and made adjustments throughout the manuscript to better reflect this more moderate goal.

      Additionally, we added tests of alternatives (Results, paragraphs 7–9) to clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model. First, we extracted average firing rates inside the figure. Second, we computed the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison between these two and synchrony based on out-of-sample predictions. While the rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison.

      (4) Direct neurophysiological evidence linking the observed behavioral effects to gamma synchrony in humans is absent, creating a gap between the model and the neural mechanism.

      We agree that the model only provides a how-possibly account linking stimulus features to performance. Showing that the brain actually relies on this mechanism would require showing that cortical synchrony mediates the effect of stimulus features on behavior beyond firing rates. Collecting such data would constitute a major effort that would go beyond the scope of this study. We now acknowledge the need for such electrophysiological data and mediation analysis in the updated Discussion.

      Achievement of Aims and Support for Conclusions:

      The authors largely achieved their primary aim of demonstrating that human figure-ground perception follows patterns predicted by TWCO principles. Their psychophysical results reveal a behavioral "Arnold tongue" that matches the synchronization patterns predicted by their model, and their learning experiment shows that perceptual improvements correlate with predicted increases in synchrony.

      The evidence supports their conclusion that gamma synchrony could serve as a viable neural grouping mechanism for figure-ground segregation. However, the conclusion that "stimulus-dependence of gamma synchrony is adaptable to the statistics of visual experiences" is only partially supported, as the study uses highly controlled artificial stimuli rather than naturalistic visual statistics and does not demonstrate sensitivity to the structure of experience.

      Likely Impact and Utility:

      This work offers a fresh perspective on the functional role of gamma oscillations in visual perception. The integration of TWCO with perceptual learning provides a novel theoretical framework that could influence future research on neural synchrony.

      The computational model, with parameters derived from neurophysiological data, offers a useful tool for predicting perceptual performance based on synchronization principles. This approach might be extended to study other perceptual phenomena and could inspire designs for artificial vision systems.

      The learning component of the study may have a particular impact, as it suggests a mechanism by which perceptual expertise develops through modified coupling between neural assemblies. This could influence thinking about perceptual learning more broadly, but also raises questions about the underlying mechanism that the paper does not address.

      Additional Context:

      Historically, the functional significance of gamma oscillations has been debated, with early theories of temporal binding giving way to skepticism based on gamma's stimulus-dependence. This study reframes this debate by suggesting that stimulus-dependence is exactly what makes gamma useful for perceptual grouping.

      The successful combination of computational neuroscience and psychophysics is a significant strength of this study.

      The field would benefit from future work extending (if possible) these findings to more naturalistic stimuli and directly measuring neural activity during perceptual tasks. Additionally, studies comparing predictions from synchrony-based models against alternative mechanisms would help establish the specificity of the proposed framework.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In a joint discussion to integrate the peer reviews and agree on the eLife recommendations, both reviewers agreed that the work is valuable, but they were on the fence about whether the strength of evidence was incomplete or solid, eventually settling on incomplete. The reviewers make several recommendations for improving these ratings, which I (Reviewing Editor) have organised into 3 points below, with point 1 of particular importance. Underneath the summary, please see the individual recommendations of the reviewers.

      (1) Strengthen evidence for the unique role of gamma synchrony in explaining the data, and ensuring claims are directly supported by relevant data:

      Reviewers 2 and 3 both note the lack of direct evidence for gamma involvement, and reviewer 2 observes that the fit with behaviour may trivially be explained by a relationship between contrast heterogeneity and grid coarseness without need for oscillation. The reviewers felt that the approach of fitting the model to human data could be strengthened to help address this issue - and they offer various solutions, e.g., more principled a-priori criteria around good vs bad fit of the model to both main task and training data, and comparison to alternative binding models (Reviewer 2), identifying and testing boundary conditions of the model (Reviewer 3). There is also the possibility of collecting direct human neurophysiological evidence linking the behavioural data to neural mechanisms. Our discussion also highlighted the need to weaken claims (including in the title) where links are not directly demonstrated by methods from the present study, e.g., resting on indirect comparisons to primate literature.

      We agree with the editor and reviewers that this was a critical point. To address it, we have made several major revisions.

      As suggested, we have weakened claims where the links are not directly demonstrated by our data. The title has been revised to be more specific, and we have carefully edited the abstract, introduction, and discussion to distinguish between our model's predictions and direct neurophysiological evidence.

      To address the concern that our model's fit might be trivially explained by visual features, we have performed a new analysis comparing the synchrony-based readout to two alternative rate-based readouts from the same V1 simulations. This new comparison shows that the synchrony readout provides a superior out-of-sample prediction of human behavior.

      While a full implementation of a competing theory like "Binding by Enhanced Firing" would be a valuable next step, we note that parameterizing such a model in a comparably grounded framework is a substantial undertaking beyond the scope of the present study. Our new analysis provides an important first step in this direction.

      (2) Make explicit and address the limitations of the stimuli:

      Acknowledge that the model does not extract the figure from the background and that the controlled stimuli may limit generalizability.

      To address the concern that our model was not performing true figure-ground extraction, we performed a new set of simulations that included both the figure and the immediate background. The results confirm that synchrony dynamics within the figure region are not affected by the presence of the background. We added these validation results as supplementary materials. We have additionally made the modeling choice and its justification more explicit in the Results and Methods sections.

      We have revised the Discussion to be more explicit about the limitations of using highly controlled texture stimuli. We now clearly state that our findings are specific to this context and that further research is required to determine if these principles generalize to the segregation of objects in natural scenes.

      (3) Some clarifications to make more accessible:

      Include the figure explaining the framework (Reviewers 1&2), and also the model details (Reviewer 2).

      We have revised Figure 1 and its caption to more clearly illustrate the links from TWCO principles to their neural implementation in V1 and the resulting behavioral predictions.

      We have expanded the Methods section to provide a more detailed and accessible description of the model's construction. We now clarify precisely how the oscillator grid was defined in visual space, how eccentricity-dependent receptive field sizes were implemented, and how these were mapped onto a retinotopic cortical surface to determine coupling strengths.
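
      One common way to turn visual-field separation into coupling strength can be sketched as follows; the authors' exact mapping and parameter values are described in their Methods and are not reproduced here. The sketch assumes a cortical magnification function M(E) = A/(E + E2) with illustrative values A = 17.3 mm and E2 = 0.75° (a commonly cited estimate for human V1), computes the cortical distance between two texture elements on the same radial line by integrating M(E), and lets coupling fall off exponentially with that distance using a hypothetical decay constant. It only shows qualitatively why a coarser grid implies weaker coupling.

```python
import numpy as np

A_MM = 17.3     # illustrative magnification scale (mm); assumed, not from the manuscript
E2_DEG = 0.75   # illustrative eccentricity offset (deg); assumed, not from the manuscript

def cortical_distance_mm(ecc_a, ecc_b, a=A_MM, e2=E2_DEG):
    """Cortical distance between two points on one radial line of the visual field.

    Integrates M(E) = a / (E + e2) between the two eccentricities, which gives
    a * ln((E_hi + e2) / (E_lo + e2)).
    """
    lo, hi = sorted((ecc_a, ecc_b))
    return a * np.log((hi + e2) / (lo + e2))

def coupling_strength(distance_mm, k0=1.0, decay_mm=2.0):
    """Hypothetical coupling that decays exponentially with cortical distance."""
    return k0 * np.exp(-distance_mm / decay_mm)

# Two neighboring texture elements centered on 4 deg eccentricity.
center = 4.0
for spacing_deg in (0.5, 1.0):   # fine vs coarse grid spacing (hypothetical)
    d = cortical_distance_mm(center - spacing_deg / 2, center + spacing_deg / 2)
    print(f"spacing {spacing_deg} deg -> {d:.2f} mm of cortex, K = {coupling_strength(d):.3f}")
```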

      Reviewer #1 (Recommendations for the authors):

      (A) Major concerns:

      (1) My main concern:

      My main concern is the repeated claims that the observed findings can be attributed to gamma synchrony in the early visual cortex. I find this claim misleading as the authors do not report any electrophysiological data that directly supports such claims. As stated in my public review, I feel that the authors should be clear about direct evidence versus more abstract inferences based on the literature.

      In particular, I recommend changing claims about "gamma synchrony" to "Binding by Synchrony". That being said, the authors can outline that the model was built under the assumption that this synchrony is mediated by gamma in early visual cortex, but I don't think it should be part of their main conclusions.

      We appreciate that TWCO’s general principles are frequency-agnostic and can be viewed as binding by synchrony in a broad sense. Our work, however, specifically instantiates these principles in V1 gamma: the model reflects TWCO dynamics together with V1 anatomy/physiology and the well-established contrast–frequency relationship in the gamma range (which, to our knowledge, has not been demonstrated with comparable specificity for other bands). In that sense, it is a gamma oscillator model of V1, rather than a generic BBS instantiation. Moreover, stimulus dependencies often cited as challenges to BBS have been used in particular to argue against gamma; showing that these very dependencies are integral to the TWCO mechanism is central to our contribution, and we therefore keep our conclusions focused on the gamma-specific instantiation tested here.

      (2) Mediation of the observed effects by the visual features of the figure:

      The authors motivate the hypothesis that BBS predicts that the perception of texture-defined objects depends on the density of texture elements and their contrast heterogeneity. This hypothesis seems trivial as those are the features that distinguish figure from ground. I think it would be important to clarify how this hypothesis is unique to BBS and not explained by competing theories, such as Binding by Enhanced Firing (Roelfsema, 2023). The authors should be clear about what part of the hypothesis is not trivial based on the task and clearly attributable to oscillators and synchrony.

      Our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). We agree that grid coarseness (element distance) is an established facilitator of figure–ground perception. By contrast, contrast heterogeneity (feature variance) is less commonly emphasized as a figure–ground cue, compared to mean-based cues, but follows directly from TWCO’s frequency detuning. Importantly, mean contrast and luminance were matched exactly between figure and background in our stimuli. Demonstrating that contrast heterogeneity and grid coarseness not only independently affect figure-ground perception, but reflect a trade-off where higher heterogeneity needs to be counteracted by reduced grid coarseness in the way TWCO specifies, is therefore non-obvious and provides an initial empirical indication that the core TWCO variables might shape perception. We also agree that alternative models would further clarify the unique explanatory value of synchrony. In the revised manuscript, we compare rate-based readouts (mean figure rate; figure–background rate difference) with the synchrony readout from the same simulations. Rate difference indeed constitutes a predictor of performance, but the synchrony readout showed a larger effect and was preferred by out-of-sample model comparison.

      Using a linear model, the authors assess the relationship between discrimination accuracy and synchrony. Did the authors also include the factors grid coarseness and contrast heterogeneity in this model? Again, as both the task performance (as shown by the GEE analysis) and oscillatory synchrony depend on these features, the relationship between model and behavioral performance will be mediated by the visual features.

      Thank you for raising this. In our framework, detuning (via contrast heterogeneity) and coupling (via grid coarseness) are the inputs, synchrony is the proposed mechanistic mediator, and behavior is the output. Because synchrony in our model is a (near-)deterministic function of the manipulated features under fixed parameters, a joint features+synchrony regression is statistically ill-posed (perfect multicollinearity up to numerical error) and cannot add information. A proper mediation test would require trial-wise neural measurements of synchrony in the same task, which we do not have and acknowledge as a limitation in the Discussion. Accordingly, we show that both the features themselves (reflecting TWCO principles) and model-derived synchrony (realizing the proposed pathway) account for behavior.

      We agree this does not establish a unique contribution of synchrony. To probe alternatives, we added rate-based readouts and a model comparison to the revised manuscript. These additional analyses indicate that synchrony outperforms simple rate-based mappings. We do not claim this rules out more sophisticated rate-based mechanisms. Our aim is to demonstrate that synchrony is a viable, behaviorally informative readout for downstream processing. We do not assert it is the only mechanism the brain uses. Synchrony had been discounted due to its stimulus dependence; our results are intended to rule it back in. We have made changes throughout the manuscript to better reflect this more modest aim.

      (3) Goodness-of-fit measures are not established a priori:

      I have described this concern in my public review. It is hard to assess what the authors would have interpreted as a good or a bad fit, especially without accounting for the confound in the relationship between oscillator synchrony and behavior. Similarly, when assessing the similarity between the behavioral and dynamic Arnold Tongues across different coupling parameters, the authors found that the chosen parameters (based on macaque data) were not optimal. They offer the explanation that the human cortex has a lower coupling decay than the macaque cortex, and the similarity is higher for lower values of coupling decay. While this explanation is not entirely implausible, it is unclear where an oscillator model with human values would be in the presented plot, as the authors didn't estimate those values from the human studies. Moreover, the task used in the Lowet et al., 2017 paper is very different from the task presented here, which could also account for differences. Overall, the explanation appears hand-wavy considering the lack of empirically defined goodness of fit measures.

      Thank you for these concerns.

      We indeed did not provide a priori thresholds for what would count as a good fit. Instead, we used two complementary benchmarks: noise ceilings and a parameter exploration. The noise ceiling provides an upper bound on what any model (not just ours, but also models based on entirely different mechanisms) could achieve given our data. The parameter sweep indicates how well our specific model can fit the data at best, and how poorly it fits across the range of possible parameters. These benchmarks are more informative than a fixed a priori cutoff, which would depend on unknown noise and inter-subject variability. Both the noise ceiling and the parameter exploration indicate that our model, using a priori fixed parameters, performs well. Additionally, we redid all our statistical analyses after z-normalizing every predictor, so that effect sizes are easier to interpret.
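      To make the noise-ceiling benchmark concrete, here is a minimal Python sketch of one common way to estimate it: the split-half reliability of the group-average condition profile, corrected with the Spearman-Brown formula. The data layout (participants × stimulus conditions, holding accuracy) and the function name `split_half_reliability` are assumptions for illustration; this is not necessarily the exact procedure used in the manuscript.

```python
import numpy as np


def split_half_reliability(responses, n_splits=200, seed=0):
    """Spearman-Brown-corrected split-half reliability of a group-average profile.

    responses: array of shape (n_participants, n_conditions), e.g. accuracy per
    stimulus condition (hypothetical layout). Participants are randomly split
    into two halves, the condition profiles of the two half-group averages are
    correlated, and the mean correlation is corrected to full-sample length.
    """
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    n = responses.shape[0]
    rs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a = responses[perm[: n // 2]].mean(axis=0)
        half_b = responses[perm[n // 2:]].mean(axis=0)
        rs.append(np.corrcoef(half_a, half_b)[0, 1])
    r = float(np.mean(rs))
    # Spearman-Brown: reliability of an average over twice as many participants
    return 2.0 * r / (1.0 + r)
```

      Under classical test theory, the returned reliability is the ceiling on variance explained (R²) by any model of the group profile; the ceiling on a correlation is its square root.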

      Regarding why the key model parameters were not optimal, we believe our interpretation is plausible. We agree that we currently lack the data to estimate the exact human decay factor and hence cannot establish how much the model fit would be affected. However, the parameter exploration in Figure 3 shows that small to modest reductions in decay would improve the model fit. We now discuss this in the revised manuscript.

      The reviewer’s suggestion is intriguing. While Lowet et al. (2017) used a different task, the parameters we took from their work (decay rate and maximum coupling) are intended to reflect anatomical properties and thus should not be task-dependent. That said, Lowet et al.’s data carry uncertainty, so our estimates may not be exact; we note this explicitly in the revised Discussion. Whether a different task would have yielded better parameter estimates is difficult to determine, but we considered Lowet’s paradigm appropriate because it was designed to target the same V1 anatomical and physiological properties that map onto TWCO.

      I have concerns about a similar confound in the training effects. If I'm not mistaken, the Hebbian Learning rule encourages synchronization between the oscillators in the grid. As such, it causes synchronization to increase over several simulations. Clearly, the task performance of the participants also improves over the sessions. Again, an empirical threshold would be required to assess whether the similarity in learning between model and performance goes beyond what is expected based on learning alone. How much of these effects can be attributed to the model being oscillatory?

      The reviewer is correct that, in our framework, learning operates via changes in coupling that increase synchrony. Enhanced synchrony is the proposed (and in our model also the actual) pathway by which learning impacts behavior. We agree that learning could, in principle, act through pathways other than synchrony. Demonstrating this would not be achieved by a mediation analysis here, because that requires independent, trial-level neural measurements of the candidate pathways (synchrony and alternatives). In the absence of such data, the appropriate approach would be model comparison between competing mechanistic readouts. We have added such a model comparison for a synchrony readout versus two rate-based readouts derived from the same simulations for the first session; i.e., focusing on the pathway from stimulus features to behavior. However, a similar model comparison is not possible for learning. As we show in the supplementary materials, rate-based readouts of our V1 model are not at all affected by coupling strength. As such, they are insensitive to changes in coupling and are thus not viable as alternative mechanisms to explain performance changes due to learning. A fair test of rate-based alternatives would require building a detailed rate-based figure–ground segregation model that predicts session-wise changes. We agree that this is an important next step, but it is also a substantial undertaking beyond the scope of the present study.

      (4) Similarly, for the comparison of the Arnold Tongue in the transfer session and the early session:

      In the first part of the Results section, it says: "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli. We evaluated whether this assumption holds for our human participants using the transfer session following the main training period. [...] If learning is indeed local, participants' performance in the transfer session should resemble that of early training sessions, indicating a reset in performance for the new retinal location."

      The authors find that a model fit to session 3 explains the data in the transfer session best and consider this as evidence for the above-stated expectation. Again, it is unclear where the cutoff would have been for a session to be declared as early or late. For instance, had the participants only performed 4 sessions, would the performance be best explained by session 3 or session 1?

      A high number of statistical tests are used, which, firstly, need to be corrected for multiple comparisons (did the authors do this?). Secondly, I feel that the regression models could be improved. For instance, the authors fit one model per session and then assess how well each model explains the variance in the transfer session. I think the authors might want to opt for one model with the regressors contrast heterogeneity, grid coarseness, and session (and their interaction). Using this approach, the authors would still be able to assess which session predicts the data best. Similarly, interindividual variability could be accounted for by adding participant-specific random effects to the model (and using a mixed model), instead of fitting individual models per participant.

      We agree the “early vs. late” cutoff was underspecified. In the revision, we predefine Session 2 as the early-learning reference, excluding Session 1 to avoid familiarization and response-mapping effects. We then fit a single Bayesian hierarchical model with contrast heterogeneity, grid coarseness, and session, plus a transfer indicator and participant-level random effects. This places the transfer session on the same scale as training and allows us to test (a) whether the transfer session precedes the state in Session 2, via the posterior contrast P(β_transfer < β_Sess2), and (b) whether it is indistinguishable from the state in Session 2, using an equivalence test derived from the fitted model. We find that the transfer session is equivalent to Session 2. We added this updated analysis of the transfer session to the Results (paragraph 15).
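      Both quantities in (a) and (b) can be read directly off posterior draws. The sketch below assumes paired MCMC draws of the two session coefficients (exported from whichever sampler fitted the hierarchical model) and a hypothetical region of practical equivalence (ROPE) of ±0.1 on the coefficient scale; the ROPE width, the function name, and the interval-inside-ROPE decision rule are illustrative choices, not necessarily those used in the revised manuscript.

```python
import numpy as np


def transfer_vs_session2(beta_transfer, beta_sess2, rope=0.1):
    """Summarize beta_transfer - beta_Sess2 from paired posterior draws.

    Draws must come from the same posterior samples so that the dependence
    between the two coefficients is preserved. `rope` is the half-width of a
    hypothetical region of practical equivalence around zero.
    """
    diff = np.asarray(beta_transfer, dtype=float) - np.asarray(beta_sess2, dtype=float)
    p_below = float(np.mean(diff < 0))               # P(beta_transfer < beta_Sess2)
    p_in_rope = float(np.mean(np.abs(diff) < rope))  # posterior mass inside the ROPE
    lo, hi = np.quantile(diff, [0.025, 0.975])       # 95% equal-tailed interval
    # Declare equivalence only if the whole 95% interval lies inside the ROPE
    equivalent = bool(-rope < lo and hi < rope)
    return {"p_below": p_below, "p_in_rope": p_in_rope,
            "interval": (float(lo), float(hi)), "equivalent": equivalent}
```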

      In response to the suggestion to use a hierarchical regression model for analyzing the transfer session, we have decided to use such a model for all our analyses in a Bayesian framework. In this Bayesian framework, inference is based on the joint posterior (credible intervals/equivalence) of all predictors in a model and additional post-hoc multiplicity corrections are not required.

      (5) Questions regarding the model:

      What does it mean that the grid was "defined in visual space"? How biologically plausible with regard to the retinotopy and organization of the oscillators do the authors claim the model to be?

      We are happy to clarify this point. We have a total of 400 oscillators reflecting neural assemblies in V1. We start by defining a regular, 20x20, grid of the receptive field (RF) centers of these oscillators inside the figure region. Each oscillator is then also assigned a RF size based on the eccentricity of its RF center. We use the threshold-linear relationship between RF eccentricity and RF size reported in [1] to assign RF sizes. Each oscillator thus has an individual, eccentricity-dependent, RF size.
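      The grid construction can be sketched in a few lines of Python. The figure-region bounds, the slope, and the floor of the RF-size function below are hypothetical placeholders, and the threshold-linear form size = max(s_min, slope · eccentricity) is an assumption standing in for the fitted relationship from [1].

```python
import numpy as np


def rf_grid(x_range=(2.0, 6.0), y_range=(-2.0, 2.0), n=20, slope=0.2, min_size=0.5):
    """Regular n x n grid of RF centers (deg) with eccentricity-dependent sizes.

    All default values are hypothetical placeholders. RF size follows an
    assumed threshold-linear rule: size = max(min_size, slope * eccentricity).
    """
    xs = np.linspace(x_range[0], x_range[1], n)
    ys = np.linspace(y_range[0], y_range[1], n)
    grid_x, grid_y = np.meshgrid(xs, ys)
    centers = np.column_stack([grid_x.ravel(), grid_y.ravel()])  # shape (n*n, 2)
    eccentricity = np.hypot(centers[:, 0], centers[:, 1])
    sizes = np.maximum(min_size, slope * eccentricity)
    return centers, sizes
```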

      For the coupling between oscillators, we need to know their cortical distances. We obtain these by first determining the cortical location of each oscillator through a complex-logarithmic topographic mapping of neuronal receptive field coordinates onto the cortical surface [2,3]. For this mapping, we use human parameter values estimated by [4]. From these cortical locations, we then compute pairwise Euclidean distances.
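      A minimal sketch of this mapping step, assuming the simple monopole complex-logarithmic form w = k · log(z + a) [3] with placeholder values k = 15 mm and a = 0.7°, rather than the fitted human parameters from [4]. Visual-field positions are encoded as complex numbers z = x + iy in degrees (one hemifield), and the resulting cortical coordinates are used to compute pairwise Euclidean distances.

```python
import numpy as np


def cortical_positions(centers_deg, k=15.0, a=0.7):
    """Map visual-field positions (deg, one hemifield) to cortex (mm).

    Uses the monopole complex-log model w = k * log(z + a); k and a are
    placeholder values, not the fitted human parameters.
    """
    centers_deg = np.asarray(centers_deg, dtype=float)
    z = centers_deg[:, 0] + 1j * centers_deg[:, 1]
    w = k * np.log(z + a)
    return np.column_stack([w.real, w.imag])


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

      Equal steps in visual space map onto shrinking cortical steps as eccentricity grows, which is why distance-dependent coupling on the cortical surface is not equivalent to distance-dependent coupling in visual space.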

      The model thus captures realistic retinotopy, eccentricity-dependent RF sizes, and distance-dependent coupling on the cortical surface. We have adjusted our Methods to make these steps clearer.

      [1] Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201.

      [2] Balasubramanian, M., & Schwartz, E. L. (2002). The isomap algorithm and topological stability. Science, 295(5552), 7. https://doi.org/10.1126/science.1066234

      [3] Schwartz, E. L. (1980). Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research, 20(8), 645–669. http://www.sciencedirect.com/science/article/pii/0042698980900905

      [4] Polimeni, J. R., Hinds, O. P., Balasubramanian, M., van der Kouwe, A. J. W., Wald, L. L., Dale, A. M., & Schwartz, E. L. (2005). Two-dimensional mathematical structure of the human visuotopic map complex in V1, V2, and V3 measured via fMRI at 3 and 7 Tesla. Journal of Vision, 5(8), 898. https://doi.org/10.1167/5.8.898

      Similarly, do the authors claim that each gabor annuli stimulates a single receptive field in V1?

      We hope that with the additional explanation above, it is clearer that there is not a one-to-one mapping. Each oscillator samples the local image by pooling over all Gabor annuli that overlap its receptive field (partially or fully) and computes the average contrast within its RF. Conversely, a single annulus typically overlaps multiple RFs and contributes to each in proportion to the overlap.
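      The pooling step can be illustrated with a pixel-based approximation: each RF is treated as a disk, and its input is the mean contrast of all pixels inside it, so an annulus contributes in proportion to the area it covers. The disk shape, the use of RF size as the disk diameter, and the function name `pooled_contrast` are assumptions for illustration.

```python
import numpy as np


def pooled_contrast(contrast_map, extent, centers, sizes):
    """Mean contrast inside each RF, approximated on a pixel grid.

    contrast_map: 2-D array covering extent = (xmin, xmax, ymin, ymax) in deg.
    Each RF is assumed to be a disk whose diameter equals its size, so an
    annulus contributes in proportion to the area of overlap. The pixel grid
    must be fine enough that every RF contains at least one pixel.
    """
    n_rows, n_cols = contrast_map.shape
    xs = np.linspace(extent[0], extent[1], n_cols)
    ys = np.linspace(extent[2], extent[3], n_rows)
    grid_x, grid_y = np.meshgrid(xs, ys)
    pooled = np.empty(len(centers))
    for i, ((cx, cy), size) in enumerate(zip(centers, sizes)):
        inside = (grid_x - cx) ** 2 + (grid_y - cy) ** 2 <= (size / 2.0) ** 2
        pooled[i] = contrast_map[inside].mean()
    return pooled
```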

      I am unsure how the oscillators were organized, if not retinotopically. How is the retinotopic input fed into the non-retinotopically arranged oscillators?

      We hope that with the additional explanation above, it is clearer that the network is strictly retinotopic.

      The frequency of each oscillator changes according to ω = 2πν with ν = 25 + 0.25C. How were the values for the linear regression in ν chosen? Reference?

      The slope and intercept parameters for this equation were first reported in [5]. We added the reference to the Methods.

      [5] Lowet, E., Roberts, M., Hadjipapas, A., Peter, A., van der Eerden, J., & De Weerd, P. (2015). Input-dependent frequency modulation of cortical gamma oscillations shapes spatial synchronization and enables phase coding. PLoS Computational Biology, 11(2), e1004072.
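      To show how the contrast-to-frequency mapping feeds into synchrony, here is a sketch of a Kuramoto-type phase-oscillator network. The frequency mapping follows ν = 25 + 0.25C (with C assumed to be in percent contrast); the exponential coupling kernel K_ij = K_max · exp(−d_ij / λ), the parameter values, the forward-Euler integration, and the use of the time-averaged Kuramoto order parameter as the synchrony measure are our assumptions and may differ from the model in the manuscript.

```python
import numpy as np


def intrinsic_frequency(contrast_pct):
    """nu = 25 + 0.25 * C in Hz, with C assumed to be percent contrast."""
    return 25.0 + 0.25 * np.asarray(contrast_pct, dtype=float)


def simulate_synchrony(contrast_pct, dist_mm, k_max=10.0, decay_mm=1.0,
                       t_max=2.0, dt=1e-4, seed=0):
    """Time-averaged Kuramoto order parameter of a phase-oscillator network.

    Coupling K_ij = k_max * exp(-d_ij / decay_mm) (rad/s) is a hypothetical
    kernel. Phases are integrated with forward Euler, and the first half of the
    run is discarded as a transient.
    """
    rng = np.random.default_rng(seed)
    omega = 2.0 * np.pi * intrinsic_frequency(contrast_pct)  # rad/s
    coupling = k_max * np.exp(-np.asarray(dist_mm, dtype=float) / decay_mm)
    np.fill_diagonal(coupling, 0.0)                           # no self-coupling
    theta = rng.uniform(0.0, 2.0 * np.pi, omega.size)
    n_steps = int(round(t_max / dt))
    order = []
    for step in range(n_steps):
        # d(theta_i)/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)
        drive = (coupling * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        theta = theta + dt * (omega + drive)
        if step >= n_steps // 2:
            order.append(np.abs(np.mean(np.exp(1j * theta))))
    return float(np.mean(order))
```

      With equal contrasts, two coupled oscillators phase-lock and synchrony approaches 1; with 0% versus 100% contrast (25 Hz of detuning) and the same weak coupling, they fail to lock, which is the detuning-versus-coupling trade-off captured by an Arnold tongue.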

      (6) Hebbian Learning Rule:

      I am confused about how the effective learning rate E = εt is calculated. It is said that it is estimated based on the similarity between the second experimental session and the distribution of synchrony after letting the model learn. How can the model learn without knowing ε and t?

      We agree with the reviewer that our procedure to estimate the effective learning rate requires further clarification. We performed a nested grid search. Essentially, we let the model learn between session 1 and 2 with each of 25 candidate effective learning rates and evaluate how well each of them allow the model to fit performance in session 2. We then select the best effective learning rate and create a new, smaller, grid around this value and repeat that procedure. In total we perform 5 nested grids to arrive at the final effective learning rate. We expanded the explanation in the Methods.
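      The nested grid search can be sketched as follows. In the actual procedure, evaluating a candidate means letting the model learn between Sessions 1 and 2 with that effective learning rate and scoring the fit to Session 2 performance; here that expensive step is replaced by an arbitrary `loss` callable, and the search bounds and the rule for shrinking the interval (to the neighboring grid points around the current best) are our assumptions.

```python
import numpy as np


def nested_grid_search(loss, lower, upper, n_points=25, n_levels=5):
    """Minimize a 1-D loss by repeatedly refining a grid around the best value.

    Each level evaluates n_points evenly spaced candidates, keeps the best one,
    and zooms in on the interval between its two neighboring grid points
    (clipped to the original bounds).
    """
    lo, hi = lower, upper
    best = lo
    for _ in range(n_levels):
        grid = np.linspace(lo, hi, n_points)
        losses = np.array([loss(value) for value in grid])
        best = float(grid[int(np.argmin(losses))])
        step = grid[1] - grid[0]
        lo, hi = max(best - step, lower), min(best + step, upper)
    return best
```

      Each level shrinks the grid spacing by a factor of 12 (from an interval of two steps resampled with 25 points), so five levels resolve the learning rate far more finely than a single 25-point grid could.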

      (B) Minor concerns:

      (1) Small N: 2/3 of the studies that were cited to justify the small sample were notably different from the current experiment, i.e., Intoy 2020 is an eye movement task, Lange 2020 is a memory task (Tesileanu 2020 is more similar). I think a power analysis would help support this choice, as the sample size seems quite low.

      Our study uses a within-subject design with ~750 trials per session (≈6,000 total) per participant, analyzed with a hierarchical model that pools information across trials and participants. To assess adequacy, we ran a simulation-based design analysis using the fitted hierarchical model (i.e., post hoc, based on the observed variance components). This analysis indicated a detection probability >90% for all key effects. We now report the results of this design analysis in Supplementary Table 1 and note this in the Results (paragraph 1).
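      A simulation-based design analysis follows a simple recipe: simulate many datasets from assumed effect sizes and variance components, analyze each, and count how often the effect is detected. The sketch below is a deliberately simplified stand-in. It simulates two conditions per participant with a participant-level random effect and uses a percentile bootstrap over participants instead of refitting the Bayesian hierarchical model. All parameter values (8 participants, 375 trials per condition, baseline accuracy 0.7, mean effect 0.08) are hypothetical and are not the variance components estimated in the manuscript.

```python
import numpy as np


def detection_probability(n_participants=8, n_trials=375, base_p=0.7, effect=0.08,
                          effect_sd=0.03, n_sims=200, n_boot=1000, seed=0):
    """Fraction of simulated experiments in which a condition effect is detected.

    Each simulation draws participant-level effects, simulates binary trials in
    a baseline and an effect condition, and declares detection when the lower
    bound of a 95% percentile-bootstrap interval of the mean accuracy
    difference (resampling participants) is above zero.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        effects = rng.normal(effect, effect_sd, n_participants)
        p_effect = np.clip(base_p + effects, 0.0, 1.0)
        acc_base = rng.binomial(n_trials, base_p, n_participants) / n_trials
        acc_effect = rng.binomial(n_trials, p_effect) / n_trials
        diffs = acc_effect - acc_base
        resample = rng.integers(0, n_participants, size=(n_boot, n_participants))
        boot_means = diffs[resample].mean(axis=1)
        hits += int(np.quantile(boot_means, 0.025) > 0)
    return hits / n_sims
```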

      Regarding the literature context, we agree the cited studies are not identical to ours; we referenced them to illustrate a common practice (small N with many trials) when targeting low-level, early-visual mechanisms. Intoy (pattern/contrast sensitivity) and Lange (perceptual learning in early vision) share that focus, while Tesileanu is methodologically closest.

      (2) Figure 1 could be more informative and better described in the text. The authors often don't refer to the panels in Figure 1. Maybe it would help to swap a and b to describe the Arnold tongue first? It might also be a good idea to add the coupling strength and frequency detuning axes.

      We have swapped panels a and b and now refer to each panel in the main text to enhance clarity.

      (3) Values of rho (distance - is this degrees visual angle)? Do the authors assume that the size of the stimuli corresponds to receptive fields in V1? If so, how is this justified?

      The center-to-center distance between any pair of neighboring annuli is indeed expressed in degrees of visual angle. Rho is a scaling factor for this distance. With rho = 1, the center-to-center distance corresponds to the diameter of the annuli; i.e., they touch but do not overlap each other. We do not assume any relation between the size of receptive fields and the size of the annuli. Receptive field sizes in our model are purely determined by their eccentricity, and each oscillator can have several annuli within its receptive field, while each annulus can fall within several overlapping receptive fields of different oscillators. We believe that the schematic illustration in Figure 1 might have given the impression that each oscillator sees exactly one annulus. We have therefore added a note clarifying that this is not the case and that the schematic is merely a simplification to illustrate the relationship between contrast and intrinsic frequency.

      (4) Some equations are embedded in the text, and some are not. It might be easier to find the respective equation if they all have an index. For instance, the authors mention the psychometric function that relates model synchrony and performance in the results section. It would be easier to find if it had an index that the authors could refer to.

      We moved this equation, as well as the contrast-to-intrinsic-frequency mapping, from inline to displayed equations and numbered them.

      (5) Is there a reference for "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli"? (If so, it should be cited.)

      We added references supporting this assumption.

      (6) Figure 2b: colorbar missing label.

      We added the label.

      Reviewer #2 (Recommendations for the authors):

      Cool work!

      (1) The reader would benefit from (a single) comprehensive figure that visually explains the entire conceptual framework-from TWCO principles to neural implementation to behavioural predictions-accessible to readers without specialised knowledge of oscillatory dynamics. This will give the paper a greater impact.

      We have adjusted Figure 1 in accordance with suggestions made by reviewer 1 and added further explanations to the caption and the Introduction to enhance clarity on how the principles of TWCO relate to neural implementation.

      (2) I think this paper would benefit from the audience eLife provides, but the paper could move closer to the audience.

      (3) Pride comes before the fall, but I am not the most uninformed reader, and it took me some effort to process everything.

      Thank you, we took this to heart. In the Introduction, we now state more explicitly how each variable is operationalized and how these map onto TWCO with improved reference to relevant panels in the schematic figure. We agree the framework is conceptually dense. TWCO principles reach the stimuli through specific V1 anatomy and physiology, so there are several links to keep in mind. Our goal with the revised introduction and figure is to make those links better visible.

      (4) You could consider discussing potential implications for understanding perceptual disorders characterized by altered neural synchrony (e.g., schizophrenia, autism) and how your learning paradigm might inform perceptual training interventions.

      Thank you for this suggestion. We have added to the Discussion that TWCO might provide a new lens for studying perceptual disorders. We provide a concrete example of the relation between grouping, gamma synchrony (in light of TWCO), and lateral connectivity in schizophrenia.

      (5) I think this paper has real strength, but rather than dispersing limitations throughout the discussion, create a dedicated section that systematically addresses ecological validity, alternative explanations, and generalisability concerns. This will also preempt criticism.

      We appreciate the suggestion. Our preference is to discuss limitations in context, next to the specific results they qualify, so readers see why each limitation matters and how it affects interpretation. Nevertheless, paragraph 7 on page 20 summarizes most limitations in a single paragraph.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public review):

      When do behavioral differences emerge between the task variants? Based on the results and discussion, the cues increase the salience of either the wins or the losses, biasing behavior in favor of either risky or optimal choice. If this is the case, one might expect the cues to expedite learning, particularly in the standard and loss condition. Providing an analysis of the acquisition of the tasks may provide insight into how the cues are "teaching" decision-making and might explain how biases are formed and cemented.

      While considerable differences in decision making emerge in early sessions of training, we do not observe any evidence that cuing outcomes expedites the development of stable choice patterns. Indeed, since the outcomes are cued across all four options, there is no categorical difference in salience between optimal and risky choices. Thus, our interpretation is that cuing wins and/or losses alters the integration of this feedback into choice preference, rather than the rate of the development of choice preference. To quantitatively address this point, we have included the following analysis:

      “To quantitatively examine choice variability during training, we binned sessions 1-5 and 6-10 and analyzed variability in choice patterns across task variants. Analysis of the first five sessions of training revealed a significant shift in decision score across sessions (F(3, 502) = 31.23, p < .0001), which differed between task variants (session x task: F(16, 502) = 2.13, p = .007). Conversely, while significant differences in overall score were observed between task variants in sessions 6-10 (task: F(5, 156) = 6.81, p < .0001), there was no significant variability across sessions (session: F(3, 481) = 2.06, p = .10, task x session: F(15, 481) = 0.78, p = .71). This indicates that the variability in choice preference (and presumably, learning about outcomes) is maximized in the first five sessions, and there are no obvious differences in the rate of development of stable choice patterns between task variants.”

      Does the learning period used for the modeling impact the interpretation of the behavioral results? The authors indicate that computational modeling was done on the first five sessions and used these data to predict preferences at baseline. Based on these results, punishment learning predicts choice preference. However, these animals are not naïve to the contingencies because of the forced choice training prior to the task, which may impact behavior in these early sessions. Though punishment learning may initially predict risk preference, other parameters later in training may also predict behavior at baseline.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a trade-off between including more data points to improve parameter estimation and targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. Determining the point at which parameters cease to be predictive would be a separate research question and would require a larger dataset to assess thoroughly. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”
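      For readers less familiar with these models, the core of a reinforcement learning model of this kind is a delta-rule update of option values with separate learning rates for wins and losses, combined with a softmax choice rule. The sketch below is a generic illustration under assumptions: the loss outcome is valued as the negative timeout duration, and the names `eta_pos`, `eta_neg`, and `beta` are ours. The exact parameterizations follow Langdon et al. (2019) and are not reproduced here.

```python
import numpy as np


def update_values(q, choice, reward, timeout, eta_pos, eta_neg):
    """Return option values after one trial of delta-rule learning.

    A win (reward > 0) is learned with eta_pos toward the reward size; a loss
    is learned with eta_neg toward -timeout (an assumed cost representation).
    """
    q = np.array(q, dtype=float)  # copy so the caller's array is unchanged
    if reward > 0:
        q[choice] += eta_pos * (reward - q[choice])
    else:
        q[choice] += eta_neg * (-timeout - q[choice])
    return q


def choice_probabilities(q, beta):
    """Softmax over option values with inverse temperature beta."""
    z = beta * (np.asarray(q, dtype=float) - np.max(q))  # subtract max for stability
    p = np.exp(z)
    return p / p.sum()
```

      Fitting such a model means finding the learning rates and inverse temperature that make the observed sequence of choices most probable under `choice_probabilities`, which is why the early sessions, where values are still changing, are the most informative.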

      The authors also present simulated data from the models for sessions 18-20, but according to the statistical analysis section, sessions 35-40 were used for analysis (and presumably presented in Figure 1). If the simulation is carried out in sessions 35-40, do the models fit the data?

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      Finally, though the n's are small, it would be interesting to see how the devaluation impacts computational metrics. These additional analyses may help to explain the nuanced effects of the cues in the task variants. 

      Unfortunately, as the devaluation experiment is only one session, there are insufficient data to run the same models. Furthermore, changes in choice are subtle and not uniform across rats, making it difficult to reliably model this effect at the individual level. A separate experiment could investigate the specific cognitive processes underlying the devaluation effect.

      Reviewer #1 (Recommendations for the authors):

      The authors do not present individual data points for behavior. Including these data points would improve the interpretability of the results. Adding significance notations to the bar graphs would also help the reader. Although the stats are provided and significant comparisons highlighted, it isn't easy to go between the table and the figure to detect significant outcomes. If done, the statistics tables could be moved to the supplement. Including estimates of effect size for main findings in the main text would also benefit the reader.

      We thank the reviewer for their feedback on our approach to the figures and significance reporting – we have updated the relevant figures to include individual data points. Furthermore, we’ve added significance notations for task variants that are significantly different from the uncued or standard cued tasks on the figures. We’ve also moved some statistics tables to the supplement, as suggested. 

      The authors allude to other metrics of the task (trials, omissions, etc.) but do not present these data anywhere. Including supplementary figures including individual data points and statistical analyses in the supplement is strongly encouraged.

      A supplementary figure visualizing these metrics (choice latency, trials completed, and omissions) has been added, with individual data points included. Statistical analyses are reported in the main text; no significant effects were observed in the ANOVAs for any of these metrics, so post hoc analyses were not performed.

      Figure 4 is confusing. Presenting the WAIC values for each model rather than compared to the nonlinear model would be easier to understand. It is also unclear if statistical tests were used to assess differences in model fit as no test information is provided.

      Figure 4 has been updated to increase clarity and address feedback from another reviewer. Raw WAIC values are not ideal for visualization, as the task variants have differing amounts of data and thus would be difficult to include on the same Y-axis. Instead, we present each model’s difference in WAIC relative to a basic model with no timeout penalty transform, so that all three models are visible, and the direction of model improvement is clearly indicated. Statistical tests of WAIC differences are not standard, as the numerical differences themselves indicate a better fit.
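      For reference, WAIC can be computed from a matrix of pointwise log-likelihoods (posterior draws × observations), and the plotted quantity is each model's WAIC minus that of the reference model. The sketch below uses the standard deviance-scale definition, WAIC = −2(lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC is the summed posterior variance of the pointwise log-likelihood; the function names and the dictionary layout are hypothetical.

```python
import numpy as np


def waic(log_lik):
    """WAIC on the deviance scale from an (n_draws, n_obs) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # log-mean-exp over draws for each observation, computed stably
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws))
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))  # effective number of parameters
    return float(-2.0 * (lppd - p_waic))


def delta_waic(models, reference):
    """WAIC of each model minus WAIC of the reference; negative means better fit."""
    ref = waic(models[reference])
    return {name: waic(ll) - ref for name, ll in models.items() if name != reference}
```

      Because WAIC sums over observations, its raw value grows with the number of choices in a task variant, whereas differences from a reference model fitted to the same data remain comparable.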

      The authors do not provide a data availability statement.

      We thank the reviewer for calling our attention to this oversight. A data availability statement has been added. 

      Reviewer 2 (Public review):

      Additional support and evidence are needed for the claims made by the authors. Some of the statements are inconsistent with the data and/or analyses or are only weakly supportive of the claims.

      We appreciate the reviewer’s overarching concern that some claims in the original manuscript were insufficiently supported by the data or analyses. To address this, we have provided further rationale for the devaluation experiment and clarified our interpretation of those results, expanded the computational modeling analyses, and revised figures and wording to improve clarity. Below, we respond to the reviewer’s specific comments in detail.

      Reviewer #2 (Recommendations for the authors):

      Different variants of an RL model were used to understand how loss outcomes impacted choice behavior across the gambling task variants. Did the authors try different variants for rewarded outcomes? I wonder whether the loss specific RL effects are constrained to that domain or perhaps emerged because choice behavior to losses was better estimated with the different RL variants. For example, rewarded outcomes across the different choices may not scale linearly (e.g., 1, 2, 3, 4) so including a model in which Rtr is scaled by a free parameter might improve the fit for win choices.

      We agree that asymmetries in model flexibility could, in principle, contribute to the observed effects. While we are somewhat limited in our ability to develop and validate further models due to the small size of the datasets compared to the high degree of choice variability between rats, we have explored the possibility as far as the data allow by fitting a model that includes a scaling parameter for rewards in addition to punishments:

      “While we restricted our model selection to those previously validated on larger datasets, the specificity of the main finding to the punishment learning rate may be due to the greater flexibility afforded to loss scaling, rather than a true asymmetry in learning. To test this hypothesis, we fit a model featuring a scaling parameter for rewards, in addition to scaled costs:

      where mRew is a linear scaling parameter for reward size. A separate scaling parameter was used for timeout penalty duration (i.e., same as scaled cost model). Group-level parameter estimates (Figure S3) reflected similar differences in the punishment learning rate and reward learning rate as the scaled cost model (Figure S4). Furthermore, all 95% HDIs for the mRew scaling parameter included 1, indicating that at least at the group level, scaling of reward size across the P1-P4 options closely follows the actual number of earned sucrose pellets. Thus, we find no evidence that our results can be simply attributed to the increased parameterization of losing outcomes.”
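      Whether a 95% HDI includes 1 can be checked directly from posterior draws of mRew. A minimal sketch of the usual highest-density-interval computation for a unimodal posterior: sort the draws and take the narrowest window that contains 95% of them. The function name `hdi` is ours.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))           # number of draws the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]  # width of every window of k sorted draws
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])
```

      For a scaling parameter, `lo <= 1.0 <= hi` then indicates that the posterior is compatible with rewards scaling one-to-one with pellet number.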

      Additionally, I would like to see evidence that these alternative models provide a better fit compared to a standard delta-rule updating for unrewarded choices.

      Each model is now compared directly to a standard delta-rule update model in the WAIC figure to demonstrate that the current models are a better fit for the data.

      Could the authors provide some visualization of how variation in the r, m, or b parameters impact choices and/or patterns of choices?

      We have added a figure to the supplementary section to visualize how different values for the r, m, and b parameters could alter the size of updates to Q-values on each trial across the four different options, thereby impacting subsequent choice. 
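      As a simplified illustration of how a cost-scaling parameter changes the size of value updates across options, the sketch below computes the change in each option's value after a single loss when the loss is valued as −m × timeout duration. The timeout durations for P1 to P4 (5, 10, 30, and 40 s) are assumed to follow the standard rGT, and this scaled-cost form is a simplification; the r and b parameters of the revised models are not represented.

```python
import numpy as np

# Timeout durations (s) for P1-P4, assumed to follow the standard rGT.
TIMEOUTS = np.array([5.0, 10.0, 30.0, 40.0])


def loss_update_sizes(q, eta_neg, m):
    """Change in each option's value after a single loss on that option,
    when the loss is valued as -m * timeout duration (simplified scaled cost)."""
    return eta_neg * (-m * TIMEOUTS - np.asarray(q, dtype=float))


for m in (0.25, 0.5, 1.0):
    print(m, loss_update_sizes(np.zeros(4), eta_neg=0.1, m=m))
```

      Larger values of m make losses on the long-timeout options (P3 and P4) push their values down more steeply, which is one route by which cost scaling can shift preference toward the optimal options.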

      It was challenging to understand the impact of the reported effects and interpretation of the authors at various points in the manuscript. For example, the authors state that "only rats trained on tasks without win-paired cues exhibited shifts in risk preference following reinforcer devaluation". Figure 3 however seems to indicate that rats trained on the reverse-cued task show shifts in risk preference. 

      We agree the original wording did not fully capture the nuance apparent in the figure. While not significantly different from baseline, rats in the reverse-cued experiment could have indeed updated their choice patterns and we were underpowered to detect the effect. We have updated the results section to include this point, and to more specifically outline that win-paired cues that scale with reward size lead to insensitivity to reinforcer devaluation:

      “This indicates that pairing audiovisual cues with reward induces some degree of inflexibility in risk-preferring rats. Importantly, pairing cues with losses alone does not elicit rigidity in choice. Thus, in keeping with the observed effect on overall choice patterns, pairing cues with wins has a unique impact on sensitivity to reinforcer devaluation. Although the shift was not statistically significant, visual inspection of data from the reverse-cued task suggests that some choice flexibility may be present, and the study may have been underpowered to detect this effect. Nonetheless, win-paired cues that scale with reward size reduce flexibility in choice patterns following reinforcer devaluation.”

      It was not clear to me why the authors did a devaluation test and what was expected. Adding details regarding the motivation for specific analyses and/or experiments would improve understanding of these exciting results.

      Further explanation has been added to the results section for the devaluation test to clarify the rationale and expected results:

      “We next tested whether pairing salient audiovisual cues with outcomes on the rGT impacts flexibility in decision making when outcome values are updated. Reinforcer devaluation, in which subjects are sated on the sugar pellet reinforcer prior to task performance (presumably devaluing the outcome), is a common test of flexibility of decision making (Adams & Dickinson, 1981). We have previously employed this method to demonstrate that rats trained on the standard-cued task are insensitive to reinforcer devaluation (i.e., choice patterns do not shift despite devaluation of the sugar pellet reward; Hathaway et al., 2021).”

      Some rats in the rGT become risk takers and some do not, but whether this is an innate phenomenon or emerges with training is not known. The authors report some correlations between the RL parameters and subsequent risk scores, but this may be an artifact because the risk scores and many of the parameters differ between the experimental groups. Restricting these analyses to the rats in the standard procedure (or even conducting them in other rats that have been run on the standard rGT) would alleviate this concern. The authors should also expand upon this result in the discussion (if it holds up) and provide graphs of this relationship in the manuscript.

      In a previous paper on which these analyses were based (Langdon et al., 2019), analyses of the relationship between RL parameter estimates and final decision score were conducted separately for rats trained on either the uncued or standard cued task, as the reviewer has suggested here. Those analyses showed that parameters controlling learning from negative outcomes were specifically related to final score in both tasks. While we do not have a sufficient n per group to split the analyses by task variant in the current study, we have highlighted these previous findings in the results section to address this concern:

      “In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”

      I don't see a b parameter in the nonlinear cost model, but one is presented in Figure 6 and also in the section "Parameters predicting risk preference on the rGT". The authors need to either update the formula or clarify what the b parameter quantifies in the nonlinear model.

      We thank the reviewer for pointing out this oversight; the equation has been updated to include the b parameter.

      The risk score is very confusing, as higher numbers (or %) indicate less risk and lower (more negative) numbers indicate greater risk. I've had to reread the text multiple times to remind myself of this, so I anticipate the same will be true for other readers. Perhaps the authors can add a visual guide to their y-axis indicating that more positive numbers reflect less risky choices.

      We acknowledge that this measure can be confusing. The score is calculated in the standard way for the Iowa Gambling Task conducted in humans, on which the rGT is based, and was therefore adopted here. To address this point, we have renamed the “risk score” the “decision score” and added a visual guide to the y-axis in Figure 2.
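      For readers less familiar with the measure, the decision score can be sketched as below. We assume the conventional rGT formula, the percentage of choices of the two advantageous options minus that of the two disadvantageous options, and the common convention that rats scoring below zero are classed as risky; the function names are illustrative.

```python
def decision_score(pct_p1, pct_p2, pct_p3, pct_p4):
    """(%P1 + %P2) - (%P3 + %P4): positive values mean less risky choice."""
    return (pct_p1 + pct_p2) - (pct_p3 + pct_p4)


def risk_status(score):
    """Assumed convention: negative scores are 'risky', others 'optimal'."""
    return "risky" if score < 0 else "optimal"


if __name__ == "__main__":
    score = decision_score(20, 55, 15, 10)
    print(score, risk_status(score))  # 50 optimal
```

      Because the score runs from -100 (all disadvantageous choices) to +100 (all advantageous choices), a y-axis guide marking the positive end as "less risky" matches the sign convention directly.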

      The term "negative learning rate" is confusing, as it almost implies that the learning rate itself was negative, rather than being a learning rate for negative outcomes. Please revise this in the figures and in the text.

      We have updated the text and figures where appropriate from “negative learning rate” to “punishment learning rate”. We have also changed the text from “positive learning rate” to “reward learning rate” to match this terminology.

      Reviewer #3 (Public review):

      There is a very problematic statistical stratagem that involves categorising individuals as either risky or optimal based on their choice probabilities. As a measurement or outcome, this is fine, as previously highlighted in the results, but this label is then used as a factor in different ANOVAs to analyse the very same choice probabilities, which then constitutes a circular argument (individuals categorised as risky because they make more risky choices, make more risky choices...).

      Risk status was included as a factor to test whether the effects of the cue paradigms differed between risky versus optimal rats (i.e., interaction effects), not as an independent predictor of choice preference. We focus on results showing a significant task x risk status interaction, and conducted follow-up analyses separately within each group, at which point risk status was no longer included as a factor. We do not interpret main effects or choice x status interactions, which would indeed be circular for the reason noted by the reviewer.

      A second experiment was done to study the effect of devaluation on risky choices in the different tasks. The results, which are difficult to interpret from Figure 3, suggest that reward devaluation affects choices in tasks where the win-cue pairing is not present. The authors interpret this result by saying that pairing wins with cues makes individuals insensitive to reward devaluation. Counter to this, if an individual is prone to making risky choices in a given task, this points to an already distorted sense of value, as the most rewarding strategy is to make optimal, non-risky choices.

      We have included significance notations in Figure 3 and included further detail in the text to improve clarity of the findings for the devaluation test. The reviewer raises an interesting point that risk-preferring rats have a distorted sense of value, since they do not follow the optimal strategy. However, we believe that this is at least partially separable from insensitivity to devaluation, since risk-preferring rats trained on tasks that don’t feature win-paired cues still exhibit flexibility in choice. We have added the following point to the discussion to address this:

      “While risk-preferring rats exhibit some degree of distortion in reward valuation, as they do not follow the most rewarding strategy (i.e., selecting optimal options), we believe this to be at least partially separable from choice inflexibility, as risk-preferring rats on tasks that don’t feature win-paired cues remain sensitive to devaluation.”

      While the overall computational approach is excellent, I believe that the choice of computational models is poor. Loss trials come at a double cost, something the authors might want to elaborate upon: firstly, the lost opportunity of not having selected a winning option, which is reflected in Q-learning by the fact that r=0, and secondly, a waiting period, which will affect the overall reward rate. The authors choose to combine these costs by attempting to convert the time penalty into "reward currency" using three different functions, which make up the three tested models. This is a bit of a wasted opportunity, as the question when comparing models is not something like "are individuals in the paired win-cue tasks more sensitive to risk? or less sensitive to time? etc." but "what is the best way of converting time into Q-value currency to fit the data?" Instead, the authors could have contrasted other models that explicitly track time as a separate variable (see, for example, "Impulsivity and risk-seeking as Bayesian inference under dopaminergic control" (Mikhael & Gershman 2021)) or give actions an extra risk bonus (as in "Nicotinic receptors in the VTA promote uncertainty seeking" (Naude et al 2016)).

      We thank the reviewer for their thoughtful suggestions and agree that alternative modeling frameworks that explicitly track time or incorporate uncertainty bonuses would be highly informative for understanding the mechanisms underlying risky choice. However, the models employed here are drawn from previous work that required >100 rats per group for model development and validation. Due to the high degree of variability in decision making within the groups and the relatively small number of rats, this dataset is not well suited for substantial model innovation. Indeed, the most complex model from previous work had to be simplified to achieve model convergence. Testing models that greatly diverge from the previously validated RL models would make it difficult to determine whether poor model fit reflects a misspecified model or insufficient data.

      We’d also like to note that the driving question for this study is to investigate the impact of different cue variants on choice patterns – untangling the relationship between timing, uncertainty, and risky choice is an important and interesting question, but beyond the scope of the present work. 

      To address this limitation, we have expanded our justification of model choice in the results section to emphasize that we are applying previously developed models, with minor extensions:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions.”

      Another weakness of the computational section is that, although simulations were performed, Figure 5 only shows the simulated risk scores and not the choice probabilities for each option, which would be a much more informative metric by which to judge model validity.

      We have expanded Figure 5 to show the simulated choice of each option.

      In the last section, the authors ask whether the parameter estimates (obtained from optimisation on the early sessions) could be used to predict risk preference. While this is an interesting question to address, the authors give very little explanation as to how they establish any predictive relationship. A figure and more detailed explanation would have been warranted to support their claims.

      We have expanded this section to provide clearer detail on the methods used to conduct this analysis and added a figure. To address a point raised by another reviewer, the statistical approach has been revised to more closely align with that used in Langdon et al. (2019), and the results have been updated appropriately:

      “We next tested whether any of the subject-level parameter estimates in the nonlinear or scaled + offset model could reliably predict risk preference scores at the end of training. In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”

      Why were the simulated risk scores calculated for sessions 18-20 and not 35-39 as in the experimental data, and why were the models optimised only on the first sessions?

      These points were addressed in response to reviewer #1:

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a balance between including more data points to improve estimation of parameters and targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. Determining the point at which parameters cease to be predictive would be a separate research question, and would require a larger dataset to assess thoroughly. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”

      Concerning the figures, could you consider replacing the bar plots with, or supplementing them with, the individual data points or a violin plot, to better capture the distribution of the data? This would be particularly beneficial for Figure 2B (the risk score), which, without a distribution, suggests that all individuals are optimal, something the text states is not the case.

      Individual data points have been added to the relevant figures.

      Is this not a case of compositional data, for which ANOVA is definitely not an appropriate method? (Compositional data report the proportions of different elements in a whole, e.g., this rock is 60% silicate, 20% man-made cement, etc.) ANOVA is inappropriate here because of violations of normality and, above all, dependence between measurements: the proportions must sum to 100%, so in your case, knowing the proportions of P1, P2, and P3, I can automatically deduce P4. I leave it to you to find a suitable alternative. In any case, I also had difficulty understanding the varying degrees of freedom of the different reported F statistics, which makes me worry that this has not been done properly.

      This is a fair criticism, as choice proportions across P1-P4 are not fully independent. While alternative approaches do exist, there is no widely adopted or straightforward method that has been validated for this task. Accordingly, ANOVA remains the standard analytical approach for this task, as it facilitates comparison with previous work and is readily understood by readers. As mentioned in the methods, an arcsine transformation was applied to the proportional data to mitigate issues associated with bounded measures (i.e., summing to 100%). We thank the reviewer for drawing our attention to the discrepancies in the degrees of freedom – these have now been corrected.
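      The arcsine transformation mentioned above is commonly written as arcsin(sqrt(p)) for a proportion p. The sketch below assumes that form and takes choice percentages as input, so it may differ in detail from the transformation applied in the analysis.

```python
import math


def arcsine_transform(pct):
    """Arcsine-square-root transform of a percentage (0-100), in radians."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must lie in [0, 100]")
    return math.asin(math.sqrt(pct / 100))


if __name__ == "__main__":
    for pct in (0, 5, 50, 95, 100):
        print(pct, round(arcsine_transform(pct), 3))
```

      Equal steps in percentage are stretched near the bounds: going from 0% to 5% moves the transformed value by about 0.23 rad, whereas going from 50% to 55% moves it by about 0.05 rad.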

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a useful analysis of the changes in chromatin organization and gene expression that occur during the differentiation of two cell types (anterior endoderm and prechordal plate) from a common progenitor in zebrafish. Although the findings are consistent with previous work, the evidence presented in the study appears to be incomplete and would benefit from more rigorous interpretation of single-cell data, more in-depth lineage tracing, overexpression experiments with physiological levels of Ripply, and a clearer justification for using an explant system. With these modifications, this paper will be of interest to zebrafish developmental biologists investigating mechanisms underlying differentiation.

      We sincerely thank the editor and the reviewers for their valuable time and efforts. Their insightful comments were greatly appreciated and have been largely addressed in the revised manuscript. We are confident that these revisions have enhanced the overall quality and clarity of our paper.

      Reviewer #1 (Public review):

      Summary:

      During vertebrate gastrulation, mesendoderm cells are initially specified by morphogens (e.g. Nodal) and segregate into endoderm and mesoderm in part based on Nodal concentrations. Using zebrafish genetics, live imaging, and single-cell multi-omics, the manuscript by Cheng et al. presents evidence to support a claim that anterior endoderm progenitors derive primarily from prechordal plate progenitors, with transcriptional regulators goosecoid (Gsc) and ripply1 playing key roles in this cell fate determination. Such a finding would represent a significant advance in our understanding of how anterior endoderm is specified in vertebrate embryos.

      We would like to thank reviewer #1 for their comments and positive feedback on our manuscript.

      Strengths:

      Live imaging-based tracking of PP and endo reporters (Figure 2) is well executed and convincing, though a larger number of individual cell tracks will be needed. Currently, only a single cell track (n=1) is provided.

      We thank the reviewer for the positive comments and the valuable suggestion. As the reviewer suggested, we repeated the live imaging analyses on Tg(gsc:EGFP;sox17:DsRed) embryos. We tracked dozens of cells during their transition from gsc-positive to sox17-positive. Furthermore, we quantified the RFP/GFP signal intensity ratio in these cells over the course of development (Please see the revised Figure 2D and Movie S4).

      Weaknesses:

      (1) The central claim of the paper - that the anterior endoderm progenitors arise directly from prechordal plate progenitors - is not adequately supported by the evidence presented. This is a claim about cell lineage, which the authors are attempting to support with data from single-cell profiling and genetic manipulations in embryos and explants. The construction of gene expression (pseudo-time) trajectories, while a modern and powerful approach for hypothesis generation, should not be used as a substitute for bona fide lineage tracing methods. If the authors' central hypothesis is correct, a CRE-based lineage tracing experiment (e.g. driving CRE using a PP marker such as Gsc) should be able to label PP progenitor cells that ultimately contribute to anterior endoderm-derived tissues. Such an experiment would also allow the authors to quantify the relative contribution of PP (vs non-PP) cells to the anterior endoderm, which is not possible to estimate from the indirect data currently provided. Note: while the present version of the manuscript does describe a sox17:CRE lineage tracing experiment, this actually goes in the opposite direction to the one that would be informative (sox17:CRE-marked descendants will be a mixture of PP-derived and non-PP-derived cells, and the Gsc-based reporter does not allow for long-term tracking of the fates of these cells).

      We sincerely thank the reviewer for the professional comments and the constructive suggestions. As the reviewer indicated, utilizing the single-cell transcriptomic trajectory analyses on zebrafish embryos and the Nodal-injected explant system, along with the live imaging analyses on Tg(gsc:EGFP;sox17:DsRed) embryos, we revealed that anterior endoderm progenitors arise from prechordal plate progenitors. To further verify this observation, we conducted two sets of lineage-tracing assays. Initial evidence came from co-injecting sox17:Cre and gsc:loxp-STOP-loxp-mcherry plasmids. We observed RFP-positive cells at 8 hpf, demonstrating the presence of cells that had expressed both genes. To explicitly follow the proposed lineage, we then implemented a reciprocal strategy, as suggested by the reviewer, in which we constructed and co-injected sox17:loxp-STOP-loxp-mcherry and gsc:Cre plasmids. The appearance of RFP-positive cells in the anterior dorsal region at 8 hpf provides direct evidence for a transition from gsc-positive to sox17-positive identity. These results are now included in the revised manuscript (Please see Author response image 1 and Figure S4E). However, in accordance with the reviewer's caution, we acknowledge that this does not prove that this is the sole origin of anterior endoderm. Consequently, we have revised the text to clarify that our findings demonstrate that anterior endoderm can be specified from prechordal plate progenitors, without claiming that it is the only source.

      Author response image 1.

      Characterization of anterior endoderm lineage by Cre-Lox recombination system.

      (2) The authors' descriptions of gene expression patterns in the single-cell trajectory analyses do not always match the data. For example, it is stated that goosecoid expression marks progenitor cells that exist prior to a PP vs endo fate bifurcation (e.g. lines 124-130). Yet, in Figure 1C it appears that in fact goosecoid expression largely does not precede (but actually follows) the split and is predominantly expressed in cells that have already been specified into the PP branch. Likewise, most of the cells in the endo branch (or prior) appear to never express Gsc. While these trends do indeed appear to be more muddled in the explant data (Figure 1H), it still seems quite far-fetched to claim that Gsc expression is a hallmark of endoderm-PP progenitors.

      We thank the reviewer for pointing out this issue. Our initial analysis proposed that the precursors of the prechordal plate (PP) and anterior endoderm (endo) more closely resemble a PP cell fate, as their progenitor populations highly express PP marker genes, such as gsc. The gsc gene is widely recognized as a PP marker[1]. The reviewer pointed out that in our analysis, these precursor cells do not initially exhibit high gsc expression; rather, gsc expression gradually increases as PP fate is specified.

      The reason for this observation is as follows. First, for the in vivo data, we used the URD algorithm to trace back all possible progenitor cells for both the PP and anterior endo trajectories. As mentioned in the manuscript, the PP and anterior endo are relatively distant in the trajectory tree of the zebrafish embryonic data. Consequently, this approach likely included other, confounding progenitor cells that do not express gsc (such as ventral epiblast cells; Author response image 2). We therefore further investigated the expression of gsc and sox17 along these two trajectories. The conclusion remains that gsc expression is higher than sox17 expression in the progenitor cells common to both trajectories (Author response image 2). Together with the live imaging analysis presented in this study, which shows that gsc expression increases progressively in the PP, this supports the notion that the progenitor cells for both PP and anterior endoderm are initially biased towards a PP cell fate.

      On the other hand, in our previously published work using the Nodal-injected explant system, which specifically induces anterior endo and PP, the cellular trajectory analysis also revealed that the specifications of PP and anterior endo follow very similar paths. Therefore, we proceeded to analyze the Nodal explant data. Similarly, when using URD to trace the differentiation trajectories of PP and anterior endo cells, a small number of other progenitor cells were also captured. This explains why a minority of cells do not express gsc—these are likely ventral epiblast cells (Author response image 2). However, based on the Nodal explant data, gsc is specifically highly expressed in the progenitor cells of the PP and anterior endo. Its expression remains high in the PP trajectory but gradually decreases in the endoderm trajectory (Figure 1H).

      Author response image 2.

      (A) The expression of ventral epiblast markers in PP and anterior Endo URD trajectory. (B) The expression of gsc, sox32 and sox17 in the progenitors of PP and anterior endo in embryos and Nodal explants.

      (3) The study seems to refer to "endoderm" and "anterior endoderm" somewhat interchangeably, and this is potentially problematic. Most single-cell-based analyses appearing in the study rely on global endoderm markers (sox17, sox32) which are expressed in endodermal precursors along the entire ventrolateral margin. Some of these cells are adjacent to the prechordal plate on the dorsal side of the gastrula, but many (most in fact) are quite some distance away. The microscopy-based evidence presented in Figure 2 and elsewhere, however, focuses on a small number of sox17-expressing cells that are directly adjacent to, or intermingled with, the prechordal plate. It, therefore, seems problematic for the authors to generalize potential overlaps with the PP lineage to the entire endoderm, which includes cells in ventral locations. It would be helpful if the authors could search for additional markers that might stratify and/or mark the anterior endoderm and perform their trajectory analysis specifically on these cells.

      We thank the reviewer for these comments and suggestions. We fully agree with the reviewer's point that the expression of sox32 and sox17 cannot be used to distinguish dorsal endoderm from ventral-lateral endoderm cells. However, during the gastrulation stage, all endodermal cells express sox32 and sox17, and there are currently no specific marker genes available to distinguish between them.

      After gastrulation ends, the dorsal endoderm (i.e., the anterior endoderm) begins to express pharyngeal endoderm marker genes, such as pax1b. Therefore, in the analysis of embryonic data in vivo, when studying the segregation of the anterior endoderm and PP trajectory, we specifically used the pharyngeal endoderm as the subject to trace its developmental trajectory.

      In the case of Nodal explants, Nodal specifically induces the fate of the dorsal mesendoderm, which includes both the PP and pharyngeal endoderm (anterior endoderm). Precisely for this reason, we consider the Nodal explant system a highly suitable model for investigating the mechanisms underlying the cell fate separation between anterior endoderm and PP. Thus, in the Nodal explant data, we included all endodermal cells for downstream analysis.

      To avoid any potential confusion for readers, we have revised the term "endoderm" in the manuscript to "anterior endoderm" as suggested by the reviewer.

      (4) It is not clear that the use of the nodal explant system is allowing for rigorous assessment of endoderm specification. Why are the numbers of endoderm cells so vanishingly few in the nodal explant experiments (Figure 1H, 3H), especially when compared to the embryo itself (e.g. Figures 1C-D)? It seems difficult to perform a rigorous analysis of endoderm specification using this particular model which seems inherently more biased towards PP vs. endoderm than the embryo itself. Why not simply perform nodal pathway manipulations in embryos?

      We sincerely thank the reviewer for raising this important question. In our study of the fate separation between the PP and anterior endoderm, we initially analyzed zebrafish embryonic data. However, when reconstructing the transcriptional lineage tree using URD, we observed that these two cell trajectories were positioned relatively far apart on the tree. Yet, existing studies have shown that the anterior endoderm and PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells[2-4], and they share transcriptional similarities[5]. Therefore, as the reviewer pointed out, when tracing all progenitor cells of these two trajectories using the URD algorithm, it is easy to include other cell types, such as ventral epiblast cells (Author response image 2). For this reason, we concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly accurate results.

      In contrast, our group’s previous work, published in Cell Reports, demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior endoderm, PP, and notochord[5]. Thus, we considered the Nodal explant system to be a highly suitable model for investigating the mechanism of fate separation between PP and anterior endoderm. Ultimately, by analyzing both in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further validated by live imaging experiments.

      Regarding the reviewer’s concern about the relatively low number of endodermal cells in the Nodal explant system, we speculate that this is because the explants predominantly induce anterior endoderm. Since endodermal cells constitute only a small proportion of cells during gastrulation, and anterior endoderm represents an even smaller subset, the absolute number is naturally limited. Nevertheless, the anterior endodermal cells captured in our Nodal explants were sufficient to support our analysis of the fate separation mechanism between anterior endoderm and PP. Finally, to further strengthen the findings from scRNA-seq analyses, we subsequently performed live imaging validation experiments using both zebrafish embryos and the explant system.

      (5) The authors should not claim that proximity in UMAP space is an indication of transcriptional similarity (lines 207-208), especially for well-separated clusters. This is a serious misrepresentation of the proper usage of the UMAP algorithm. The authors make a similar claim later on (lines 272-274).

      We thank the reviewer for these insightful comments. As suggested, we have revised the descriptions of the UMAP results throughout the manuscript (please see the main text of the revised manuscript).
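      As a minimal illustration of the reviewer's point, and not the analysis used in the revised manuscript, transcriptional similarity between two clusters can be quantified directly in gene-expression space, for example by correlating their cluster-averaged expression profiles, instead of reading it off UMAP distances. The sketch below assumes each cluster is given as a list of normalized per-cell expression vectors sharing one gene order; the function names are hypothetical.

```python
import math


def mean_profile(cells):
    """Average expression of each gene across the cells of one cluster.

    `cells` is a list of per-cell expression vectors in a shared gene order
    (assumed to be already normalized).
    """
    n_genes = len(cells[0])
    return [sum(cell[g] for cell in cells) / len(cells) for g in range(n_genes)]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def cluster_similarity(cluster_a, cluster_b):
    """Compare clusters in gene-expression space rather than in UMAP coordinates."""
    return pearson(mean_profile(cluster_a), mean_profile(cluster_b))
```

      In practice this would be restricted to informative (e.g., highly variable) genes or performed in a linear space such as PCA, as Reviewer #3 notes below; the choice of gene set is an assumption of the sketch.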

      Reviewer #1 (Recommendations For The Authors):

      - Pseudotime trajectories constructed from single-cell snapshots are not true "lineage" measurements. Authors should refrain from referring to such data as lineage data (e.g. lines 99, 100, 103, 109, 112, 127, etc). Such models should be referred to as "trajectories", "hypothetical lineages", or something else.

      We are grateful to the reviewer for this comment. Following their recommendation, we have replaced the term "transcriptional lineage tree" with "trajectory" throughout the manuscript (please see the main text of the revised manuscript).

      - The live imaging data presented in Figure 2 (and supplemental figures) are compelling and do seem to show that some cells can switch between PP and endo states. However, the number of cells reported is still too low to be able to ascertain whether or not this is just a rare/edge-case phenomenon. Tracks for just a single cell are reported in Figure 2C-D. This is insufficient. Tracks for many more cells should be collected and reported alongside this current sole (n=1) example. The choice of time window for these live imaging experiments should also be better explained. These live imaging experiments are being performed at or after 6hpf, but authors claim in the text that "... the segregation between PP and Endo has already occurred by 6hpf." (lines 126-127). Why not perform these live imaging experiments earlier, when the initial fate decision between PP and endo is supposedly occurring?

      We sincerely appreciate the reviewer's insightful questions and constructive feedback, and we have made several revisions in response. First, the reviewer noted that our original manuscript tracked only a single cell and suggested increasing the number of tracked cells. Following this recommendation, we repeated the live-imaging experiments and expanded the number of tracked endodermal cells (please see the revised Movie S4 and Figure 2D). The experimental conditions were identical to the previous setup, and these cells consistently exhibited a gradual transition from a gsc+ fate to a sox17+ endodermal fate. Second, the reviewer recommended performing live imaging at an earlier time point. Accordingly, we conducted additional experiments in which live imaging was initiated at approximately 5.7 hpf, and we observed the onset of sox17 expression in gsc+ cells at approximately 6 hpf (Movie S5), which is consistent with our single-cell transcriptomic analysis.

      - The sections devoted to lengthy descriptions of GO terms (lines 131-146, 239-254) and receptor-ligand predictions (lines 170-185) are largely speculative. Consider streamlining.

      We thank the reviewer for this comment. As suggested, we have streamlined the content related to the GO analysis (please see Lines 128-132, 157-167, 221-225).

      - The use of a "Nodal Activity Score" (lines 212-226) is clever but might actually be less informative than showing contributions from individual nodal target genes. The combining of counts data from 29 predicted nodal targets means that the contribution (or lack of contribution) from each gene becomes masked. The authors should include supplementary dot plots that break down the score across all 29 genes, allowing the reader to assess overall contributions and/or sub-clusters of gene co-expression patterns, if present.

      We thank the reviewer for the positive feedback on our use of the "Nodal Activity Score" and for the valuable suggestions. Following this recommendation, we analyzed the expression of the 29 Nodal direct targets used in our study across the WT, ndr1 knockdown (kd), and lft1 knockout (ko) groups. We found that known axial mesoderm genes, such as chrd, tbxta, noto, and gsc, contributed substantially to the Nodal score. This new analysis has been included in the Supplementary Information (please see Figure S7L).
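      To illustrate how a combined score can be broken down into per-gene contributions, the following minimal sketch assumes the score is the per-cell mean of normalized expression over the target genes. The exact scoring scheme is not restated in this response, so this definition, the dictionary-based input format, and the function names are assumptions. The example below uses only four of the 29 targets with invented values.

```python
def nodal_score(cell_expr, targets):
    """Per-cell score: mean normalized expression over the target genes.

    `cell_expr` maps gene name -> normalized expression; genes that are
    absent from the dictionary are treated as zero.
    """
    return sum(cell_expr.get(g, 0.0) for g in targets) / len(targets)


def gene_contributions(cells, targets):
    """Fraction of a group's summed score contributed by each target gene.

    Because the score is a mean over genes, each gene's share of the group's
    total score equals its share of the summed target expression.
    """
    totals = {g: sum(cell.get(g, 0.0) for cell in cells) for g in targets}
    grand_total = sum(totals.values())
    return {g: (totals[g] / grand_total if grand_total else 0.0) for g in targets}


# Toy usage with invented values (not data from this study).
targets = ["chrd", "tbxta", "noto", "gsc"]
cells = [
    {"chrd": 2.0, "tbxta": 1.0, "noto": 1.0, "gsc": 0.0},
    {"chrd": 2.0, "tbxta": 1.0, "noto": 0.0, "gsc": 1.0},
]
print(gene_contributions(cells, targets))
```

      Computing these fractions separately for the WT, ndr1 kd, and lft1 ko groups would show whether a change in the combined score is driven by a few genes or distributed across the target set.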

      - The differential expression trends being reported for srcap (line 251) do not appear to be significant. Are details and P-values for these DEG tests reported somewhere in the manuscript?

      We thank the reviewer for raising this question. As suggested, we performed a statistical test (Wilcoxon test) to compare srcap expression between PP and Endo. Although srcap expression is slightly higher in PP than in Endo, the difference is not statistically significant. The p-value and fold change are now shown in the revised figure (please see Figure 4J and S7H), and we have revised the text to state that srcap expression is slightly higher in the PP than in the anterior endoderm.
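      For reference, the comparison described above can be sketched as a dependency-free, two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction. Tie correction is relevant for scRNA-seq data, which contain many zero counts. This is an illustrative sketch only; the exact implementation and options used for Figure 4J and S7H are not specified here, and a library routine such as scipy.stats.mannwhitneyu would typically be used in practice.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tied values receive average ranks and the variance is tie-corrected.
    No continuity correction is applied. Returns (U for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i..j].
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

      Here x and y would be the per-cell srcap values in PP and Endo, respectively; the normal approximation assumes reasonably large groups, which is typical for single-cell comparisons.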

      - Following the drug experiments with the drug AU15330 (lines 254-263), authors have only reported #s of endodermal cells, which seem to have increased, which the authors suggest indicates a fate switch from PP to endo. However, the authors have not reported whether the numbers of PP cells decreased or stayed the same in these embryos. This would be helpful information to include, as it is very difficult to discern quantitative trends from the images presented in Fig 4H and 4L.

      We thank the reviewer for these comments and suggestions. As suggested, we performed Imaris analysis of the HCR staining results from the DMSO (control), 1 μM AU15330-treated, and 5 μM AU15330-treated groups. Our analysis focused on the number of frzb-positive (PP) cells, and the comparison revealed that AU15330 treatment significantly reduces the number of PP cells. These findings have been incorporated into the revised manuscript and supplementary information (please see Figures S7J and S7K).

      Reviewer #2 (Public review):

      Summary:

      During vertebrate gastrulation, the mesoderm and endoderm arise from a common population of precursor cells and are specified by similar signaling events, raising questions as to how these two germ layers are distinguished. Here, Cheng and colleagues use zebrafish gastrulation as a model for mesoderm and endoderm segregation. By reanalyzing published single-cell sequencing data, they identify a common progenitor population for the anterior endoderm and the mesodermal prechordal plate (PP). They find that expression levels of PP genes Gsc and ripply are among the earliest differences between these populations and that their increased expression suppresses the expression of endoderm markers. Further analysis of chromatin accessibility and Ripply cut-and-tag is consistent with direct repression of endoderm by this PP marker. This study demonstrates the roles of Gsc and Ripply in suppressing anterior endoderm fate, but this role for Gsc was already known and the effect of Ripply is limited to a small population of anterior endoderm. The manuscript also focuses extensively on the function of Nodal in specifying and patterning the mesoderm and endoderm, a role that is already well known and to which the current analysis adds little new insight.

      We thank Reviewer #2 for the constructive comments and positive feedback on our manuscript.

      Strengths:

      Integrated single-cell ATAC- and RNA-seq convincingly demonstrate changes in chromatin accessibility that may underlie the segregation of mesoderm and endoderm lineages, including Gsc and ripply. Identification of Ripply-occupied genomic regions augments this analysis. The genetic mutants for both genes provide strong evidence for their function in anterior mesendoderm development, although these phenotypes are subtle.

      We thank the reviewer for recognizing our work, and we greatly appreciate the constructive suggestions from the reviewer.

      Weaknesses:

      The use of zebrafish embryonic explants for cell fate trajectory analysis (rather than intact embryos) is not justified. In both transcriptomic comparisons between the two fate trajectories of interest and Ripply cut-and-tag analysis, the authors rely too heavily on gene ontology which adds little to our functional understanding. Much of the work is focused on the role of Nodal in the mesoderm/endoderm fate decision, but the results largely confirm previous studies and again provide few new insights. Some experiments were designed to test the relationship between the mesoderm and endoderm lineages and the role of epigenetic regulators therein, but these experiments were not properly controlled and therefore difficult to interpret.

      We sincerely thank the reviewer for these comments. As described in our response above, in our study of the fate differentiation between the PP and the anterior endoderm, we initially analyzed zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we found that these two cell trajectories were distantly located on the tree, even though existing studies have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2-4]. Because the two trajectories are far apart on the tree, tracing all of their progenitor cells with the URD algorithm readily includes other cell types, such as ventral mesendodermal cells (please see Author response image 2A). Based on this, we believe that directly using embryonic data to decipher the mechanism of fate differentiation between the PP and the anterior endoderm may not yield sufficiently precise results. In contrast, our group's previous study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Thus, we consider the Nodal explant system an ideal model for studying the mechanism of fate differentiation between the PP and the anterior endoderm. Through comprehensive analysis of in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells, a conclusion further validated by live imaging experiments.

      Regarding the GO analysis, we have streamlined it as suggested by the reviewers, and in the revised manuscript we show the expression of specific genes contributing to key GO terms. Additionally, we conducted more live imaging experiments and quantitative cell analyses. To corroborate the morpholino knockdown results, we designed gRNAs targeting srcap and knocked it down using the CRISPR/Cas13 system; the resulting phenotype was consistent with the morpholino data. We also validated the activity of AU15330 in zebrafish: because BRG1 could not be detected by Western blot, we instead examined neural crest development (sox10 expression and pigmentation), which confirmed the drug's effectiveness. We hope these additional experiments adequately address the reviewers' concerns.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, the authors state that mesendoderm segregates into mesoderm and endoderm in a Nodal-concentration-dependent manner. While it is true that higher Nodal signaling levels are required for endoderm specification, A) this is also true for some mesoderm populations, and B) work from Caroline Hill's lab has shown that Nodal activity alone is not determinative of endoderm fate. Although the authors cite this work, its conclusions are not reflected in this oversimplified explanation of mesendoderm development. The authors also state that it is not clear when PP and endoderm can be distinguished transcriptionally, but this was also addressed in Economou et al., 2022, which found that they can be distinguished at 60% epiboly but not at 50% epiboly.

      We sincerely thank the reviewer for raising this question and reminding us of the conclusions drawn from that excellent study. As the reviewer pointed out, Economou et al. demonstrated that Nodal signaling alone is insufficient to determine the cell fate segregation of mesendoderm[6]. However, their study primarily focused on the fate segregation of the ventral-lateral mesendoderm lineage. In contrast, we believe that the mechanisms underlying dorsal mesendoderm specification may differ.

      First, it is well established that in zebrafish embryos the most dorsal mesendoderm is initially specified by the activity of the dorsal organizer. Notably, the Nodal signaling ligands ndr1 and ndr2 begin to be expressed in the dorsal organizer as early as the sphere stage[7]. In our study, through single-cell transcriptomic trajectory analysis and live imaging analysis, we observed that the cell fate segregation of the dorsal mesendoderm can be traced back to the shield stage.

      Second, the regulatory mechanisms governing dorsal mesendoderm fate differentiation may differ from those of the ventral-lateral mesendoderm. For instance, the gsc gene is exclusively expressed in the dorsal mesendoderm and is absent from the ventral-lateral mesendoderm. Given that gsc is a critical master gene, its overexpression on the ventral side can induce a complete secondary body axis. Similarly, ripply1, identified in our study, is also expressed early and specifically in the dorsal mesendoderm, and its overexpression on the ventral side likewise induces a secondary body axis, albeit lacking the forebrain[5]. In this study, we found that gsc and ripply1 act as repressors that together inhibit the specification of dorsal (anterior) endoderm from PP progenitors.

      In summary, our study focuses on the regulatory mechanisms of fate segregation in the dorsal (anterior) mesendoderm, which differs from the mechanisms of ventral-lateral mesendoderm lineage segregation reported by Economou et al. We believe that this distinction represents a key novelty of our work.

      (2) As noted in the manuscript, Warga and Nusslein-Volhard determined long ago that PP and anterior endoderm share a common precursor. It is surprising that this close relationship is not apparent from the lineage trees in whole embryos but is apparent in lineage trees from explants. The authors speculate that the resolution of the whole embryo dataset is insufficient to detect this branch point and propose explants as the solution, but it is not clear why the explant dataset is higher resolution and/or more appropriate to address this question.

      We sincerely thank the reviewer for their thoughtful comments. As we mentioned previously, our investigation of fate differentiation between the PP and the anterior endoderm initially involved the analysis of zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we observed that these two cell trajectories were located far apart. Previous elegant studies, as the reviewer mentioned, have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2,3,8]. Because the two trajectories are far apart on the tree, tracing all of their progenitor cells with the URD algorithm readily includes other cell types, such as ventral mesendodermal cells. Based on this, we believe that directly using embryonic data to elucidate the mechanism of fate differentiation between the PP and the anterior endoderm may lack sufficient precision.

      In contrast, our group's previous study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Therefore, we consider the Nodal explant system an ideal model for studying the mechanism underlying fate differentiation between the PP and the anterior endoderm. Through comprehensive analyses of both in vivo embryonic and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells, a conclusion further supported by live imaging experiments.

      (3) Much of the analysis of DEGs between the lineages of interest is focused on GO term enrichment. But this logic is circular. The endoderm lineage is defined as such because it expresses endoderm-enriched genes, therefore the finding that the endoderm lineage is enriched for endoderm-related GO terms adds no new insights.

      We thank the reviewer for these comments. As suggested, in the revised manuscript we indicate specific genes associated with key GO terms (please see Figure 4B), and we have streamlined the content related to the GO analysis.

      (4) The authors describe the experiment in Figure S4 as key evidence that Gsc+ cells can give rise to endoderm, but no controls are presented. Only a few cells are shown that express mCherry upon injection of sox17:cre constructs. Is mCherry also expressed in the occasional cell injected with Gsc:lox-stop-lox-mCherry in the absence of cre? Although they report 3 independent replicates, it appears that only 2 individual embryos express mCherry. This very small number is not convincing, especially in the absence of appropriate controls.

      We thank the reviewer for raising this question. Following the reviewer's suggestion, we injected gsc:loxp-stop-loxp-mCherry into zebrafish embryos at the 1-cell stage as a control. After performing at least three independent replicates and analyzing no fewer than 100 embryos, we did not observe any mCherry-positive cells. Additionally, we co-injected gsc:loxp-stop-loxp-mCherry with sox17:cre and increased the sample size. Furthermore, we constructed plasmids of sox17:loxp-stop-loxp-mCherry and gsc:cre, and upon injection at the 1-cell stage, we observed RFP-positive cells at 8 hpf (Please see Author response image 1 and Figure S4E). Together with our live imaging data, these experiments collectively demonstrate that anterior endodermal cells can originate from PP progenitors.

      (5) The authors spend a lot of effort demonstrating that PP and anterior endoderm are Nodal dependent. First, these data (especially Figures 3E and 3I) are not very convincing, as the differences shown are very small or not apparent. Second, this is already well-known and adds nothing to our understanding of mesoderm-endoderm segregation.

      We sincerely thank the reviewer for their insightful questions. First, the reviewer mentioned that in the initial version of our manuscript, the effects of ndr1 knockdown and lefty1 knockout on Nodal signaling and cell fate—particularly prechordal plate (PP) and anterior endoderm (endo)—in Nodal-induced explants were not very pronounced. We recognize that the negative feedback mechanism between Nodal and Lefty signaling may explain why Nodal acts as a morphogen, regulating pattern formation through a Turing-like model[9]. Therefore, knocking down a Nodal ligand gene, such as ndr1 in this study, or knocking out a Nodal inhibitor, such as lft1, may only have a subtle impact on Nodal signaling[10].

      Accordingly, in this study, we performed extensive pSmad2 immunofluorescence analysis and observed that although the overall intensity of Nodal activity did not change dramatically, there was a statistically significant difference. Importantly, this subtle variation in Nodal signaling strength is precisely what we intended to capture, since PP and anterior endoderm are highly sensitive to Nodal signaling[11], and even minor differences may bias their fate segregation.

      This leads directly to the reviewer’s second concern. While numerous studies suggest that the strength of Nodal signaling influences mesendodermal fate—with high Nodal promoting endoderm and lower concentrations inducing mesoderm—most of these studies focus on ventral-lateral mesendoderm development[4,6,10]. In contrast, the mechanisms underlying dorsal mesendoderm fate specification differ, which is a key innovation of our study.

      Previous work by Bernard Thisse and colleagues demonstrated that even a slight reduction in Nodal signaling, achieved by overexpressing a Nodal inhibitor, is sufficient to cause defects in the specification of PP and endoderm[11]. This indicates that PP and endoderm require the highest levels of Nodal signaling for proper specification. Moreover, the two most dorsal mesendodermal derivatives, PP and anterior endoderm, are not only spatially adjacent but also share similar transcriptional states, making the regulation of their fate separation particularly challenging to study.

      The laboratory of Dr. C.P. made important contributions to this issue, showing that the duration of Nodal exposure is critical for segregating PP and anterior endoderm fates: prolonged Nodal signaling promotes expression of the transcriptional repressor Gsc, which directly suppresses the key endodermal transcription factor Sox17, thereby inhibiting anterior endoderm specification[3]. They also found that tight junctions among PP cells facilitate Nodal signal propagation[8]. However, their studies revealed that Gsc mutants do not exhibit endodermal phenotypes, suggesting that additional factors or mechanisms regulate PP versus anterior endoderm fate separation[3].

      In our study, we first observed that subtle differences in Nodal concentration may bias the fate choice between PP and anterior endoderm. Given that ndr1 knockdown and lft1 knockout mildly reduce or enhance Nodal signaling, respectively, we reasoned that using these two perturbations in a Nodal-induced explant system combined with single-cell RNA sequencing could generate transcriptomic profiles under slightly reduced and enhanced Nodal signaling. This approach may help identify key decision points and transcriptional differences during PP and anterior endoderm segregation, ultimately uncovering the molecular mechanisms downstream of Nodal that govern their fate separation.

      (6) The authors claim that srcap expression differs between the 2 lineages of interest, but this is not apparent from Figure 4J-K. Experiments testing the role of SWI/SNF and srcap also require additional controls. Can srcap MO phenotypes be rescued by srcap RNA? Is there validation that SWI/SNF components are degraded upon treatment with AU15330?

      We are very grateful for the reviewer's questions. Using single-cell data from zebrafish embryos and Nodal explants, we compared srcap expression in the PP and anterior Endo cell populations. srcap expression was slightly higher in PP than in anterior Endo, but the difference was not statistically significant (please see Figure 4J and S7H); we have therefore modified our description in the revised manuscript. We nevertheless speculate that this slight difference might influence the distinct fate specification of PP and anterior Endo. In the original version of the manuscript, we reported that either treatment with AU15330, an inhibitor of the SWI/SNF complex, or injection of a morpholino targeting srcap, a key component of the SWI/SNF complex, enhanced anterior Endo fate while reducing PP cell specification. During this round of revision, we initially attempted to follow the reviewer's suggestion to co-inject srcap mRNA with the srcap morpholino to rescue the phenotype. However, the srcap mRNA exceeds 10,000 bp, and despite multiple attempts we were unable to obtain it. We therefore could not perform the rescue experiment and instead used another knockdown approach (the CRISPR/Cas13 system) to determine whether a phenotype similar to that of the morpholino knockdown could be achieved. We designed gRNAs targeting srcap, knocked down srcap, and examined the specification of PP and anterior Endo cells. Consistent with our previous results, srcap knockdown markedly reduced PP cell fate while increasing anterior Endo cell fate (Author response image 3). The reviewer also asked whether the SWI/SNF complex is degraded after AU15330 treatment. Following the reviewer's suggestion, we attempted Western blot analysis of BRG1, one of the components of the SWI/SNF complex; however, despite multiple attempts, the antibody did not detect BRG1 protein in zebrafish. Several studies have reported that knockdown or knockout of brg1 leads to defects in neural crest cell specification in zebrafish[12,13]. As an alternative, we therefore treated zebrafish embryos from the one-cell stage with 0 μM (DMSO), 1 μM, or 5 μM AU15330 and examined sox10 expression and pigment development at approximately 48 hpf. Treatment with 1 μM AU15330 reduced sox10 expression and pigment production, although not significantly, whereas treatment with 5 μM AU15330 significantly disrupted neural crest cell development. Thus, this experiment demonstrates that AU15330 is functional in zebrafish (Author response image 3).

      Author response image 3.

      (A) Characterization of anterior endoderm and PP cells following CRISPR-Cas13d-mediated srcap knockdown. (B) Validation of srcap mRNA expression by RT-qPCR following CRISPR-Cas13d knockdown. (C) RT-qPCR showing sox10 expression after treatment with increasing concentrations of AU15330. (D) Morphology of zebrafish embryos at 48 hpf after treatment with increasing concentrations of AU15330.

      (7) The authors conclude from their chromatin accessibility analysis that variations in Nodal signaling are responsible for expression levels of PP and endoderm genes, but they do not consider the alternative explanation that FGF signaling is playing this role. Such a function for FGF was established by Caroline Hill's lab, and the authors also show in Figure S5G that FGF signaling is enriched between these cell populations.

      We thank the reviewer for raising this issue. As the reviewer pointed out, Caroline Hill's lab has conducted elegant work demonstrating that FGF signaling plays a crucial role in the separation of ventral-lateral mesendoderm cell fates[4,6]. In contrast, our study primarily focuses on the mechanisms underlying the separation of dorsal mesendoderm cell fates. Nevertheless, our results also indicate that FGF signaling significantly regulates the fate separation of the dorsal mesendoderm, as inhibiting FGF signaling suppresses PP cell specification while promoting anterior Endo fate. In our previously published work, we found that Nodal signaling can directly activate the expression of FGF ligand genes[5]. We therefore hypothesize that Nodal signaling, acting as a master regulator, activates various downstream target genes, including FGF ligands; how FGF signaling regulates the cell fate separation of the dorsal mesendoderm warrants investigation in future studies.

      (8) When interpreting the results of their Ripply cut-and-run experiment, the authors again rely heavily on GO term analysis and claim that this supports a role for Ripply as a transcriptional repressor. GO term enrichment does not equal functional analysis. It would be more convincing to intersect DEGs between WT and ripply-/- embryos with Ripply-enriched loci.

      We thank the reviewer for raising this important issue and for the constructive suggestion. In response to the reviewer's valid concern regarding the GO term analyses of our CUT&Tag data, we implemented a more stringent filtering strategy. We identified peaks enriched in the treatment group and applied differential analysis, selecting genes with log2FoldChange > 3, padj < 0.05, and baseMean > 30 as high-confidence Ripply1 binding targets. A GO enrichment analysis of these genes revealed significant terms related to muscle development, consistent with Ripply1's established role in somite development, thereby validating our approach. We have added the corresponding gene list to the revised manuscript. Within this refined analysis, sox32 met our binding threshold, whereas sox17 did not. Furthermore, as suggested, we examined mespbb, a known Ripply1-repressed gene, which was present, and gsc, a Nodal target used as a negative control, which was absent. This confirms the specificity of our analysis (Figure 6 and Figure S11). Consequently, our revised analyses support a model in which Ripply1 directly binds the sox32 promoter. Given that Sox32 is a known upstream regulator of sox17, this binding provides a plausible direct mechanism for the observed regulation of sox17 expression. We have updated the figures and text accordingly. We attempted to generate ripply1-/- mutants but found that homozygous loss results in embryonic lethality.
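      The thresholding step described above can be sketched as a simple filter over a differential-analysis results table. The column names (baseMean, log2FoldChange, padj) suggest DESeq2-style output, but the tool and the peak-to-gene assignment method are not stated here, so the record type, field names, and function name below are assumptions for illustration; the example rows use hypothetical gene names and invented values.

```python
from dataclasses import dataclass


@dataclass
class PeakResult:
    gene: str               # gene assigned to the peak (assignment method assumed)
    base_mean: float        # mean normalized count across samples
    log2_fold_change: float # treatment vs. control
    padj: float             # multiple-testing-adjusted p-value


def high_confidence_targets(results, min_lfc=3.0, max_padj=0.05, min_base_mean=30.0):
    """Return sorted unique genes whose peaks pass all three thresholds."""
    return sorted({
        r.gene
        for r in results
        if r.log2_fold_change > min_lfc
        and r.padj < max_padj
        and r.base_mean > min_base_mean
    })


# Hypothetical rows illustrating each threshold (not data from this study).
rows = [
    PeakResult("geneA", 120.0, 4.2, 0.001),  # passes all thresholds
    PeakResult("geneB", 120.0, 2.5, 0.001),  # fails the fold-change cut-off
    PeakResult("geneC", 10.0, 5.0, 0.001),   # fails the baseMean cut-off
    PeakResult("geneD", 80.0, 3.5, 0.2),     # fails the padj cut-off
]
print(high_confidence_targets(rows))
```

      Requiring all three conditions at once favors peaks that are both strongly enriched and well supported by read counts, at the cost of possibly excluding weaker but genuine binding sites.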

      (9) The way N's are reported is unconventional. N= number of embryos used in the experiment, n= number of embryos imaged. If an embryo was not imaged or analyzed in any way, it cannot be considered among the embryos in an experiment. If only 4 embryos were imaged, the N for that experiment is 4 regardless of how many embryos were stained. Authors should also report not only the number of embryos examined but also the number of independent trials performed for all experiments.

      We thank the reviewer for this suggestion. As suggested, we have revised the description of the number of embryos and experimental replicates in the figure legends.

      (10) The authors should avoid the use of red-green color schemes in figures to ensure accessibility for color-blind readers.

      We thank the reviewer for this suggestion. We have updated the figures in the revised manuscript and adjusted the color schemes to avoid red-green combinations.

      Reviewer #3 (Public Review):

      Summary:

      Cheng, Liu, Dong, et al. demonstrate that anterior endoderm cells can arise from prechordal plate progenitors, which is suggested by pseudotime reanalysis of published scRNAseq data, pseudotime analysis of new scRNAseq data generated from Nodal-stimulated explants, live imaging from sox17:DsRed and Gsc:eGFP transgenics, fluorescent in situ hybridization, and a Cre/Lox system. Early fate mapping studies already suggested that progenitors at the dorsal margin give rise to both of these cell types (Warga), and live imaging from the Heisenberg lab (Sako 2016, Barone 2017) also pretty convincingly showed this. Nonetheless, the data presented for this point are very nice, and the additional experiments in this manuscript further cement this result. Though better demonstrated by previous work (Alexander 1999, Gritsman 1999, Gritsman 2000, Sako 2016, Rogers 2017, others), the manuscript suggests that high Nodal signaling is required for both cell types, and shows preliminary data suggesting that FGF signaling may also be important in their segregation. The manuscript also presents new single-cell RNAseq data from Nodal-stimulated explants with increased (lft1 KO) or decreased (ndr1 KD) Nodal signaling and multi-omic ATAC+scRNAseq data from wild-type 6 hpf embryos but draws relatively few conclusions from these data. Lastly, the manuscript presents data that SWI/SNF remodelers and Ripply1 may be involved in the anterior endoderm - prechordal plate decision, but these data are less convincing. The SWI/SNF remodeler experiments are unconvincing because the demonstration that these factors are differentially expressed or active between the two cell types is weak. The Ripply1 gain-of-function experiments are unconvincing because they are based on incredibly high overexpression of ripply1 (500 pg or 1000 pg) that generates a phenotype that is not in line with previously demonstrated overexpression studies (with phenotypes from 10-20x lower expression). Similarly, the cut-and-tag data seem to be of low quality and do not appear to support direct binding of ripply1 to these loci.

      In the end, this study provides new details that are likely important in the cell fate decision between the prechordal plate and anterior endoderm; however, it is unclear how Nodal signaling, FGF signaling, and elements of the gene regulatory network (including Gsc, possibly ripply1, and other factors) interact to make the decision. I suggest that this manuscript is of most interest to Nodal signaling or zebrafish germ layer patterning aficionados. While it provides new datasets and observations, it does not weave these into a convincing story that provides a major advance in our understanding of the specification of these cell types.

      We sincerely thank the reviewer for their thorough and thoughtful assessment of our work. The reviewer acknowledged several strengths of our study, such as the use of multiple technical approaches to demonstrate that anterior endoderm differentiates from PP progenitor cells, and recognized the value of the newly added single-cell omics data. The reviewer also raised some concerns regarding the initial version of our work, including the SWI/SNF remodeler experiments and the Ripply1 gain-of-function experiment. In the revised manuscript, we have supplemented these parts with additional control experiments to better support our conclusions. We hope that our updated manuscript adequately addresses the points raised by the reviewer.

      Major issues:

      (1) UMAPs: There are several instances in the manuscript where UMAPs are used incorrectly as support for statements about how transcriptionally similar two populations are. UMAP is a stochastic, non-linear projection for visualization; distances in UMAP cannot be used to determine how transcriptionally similar or dissimilar two groups are. Drawing conclusions about how transcriptionally similar two populations are requires performing calculations either in gene expression space or in a linear dimensionality reduction space (e.g. PCA, keeping in mind that this will only consider the subset of genes used as input into the PCA). Please correct or remove these instances, which include (but are not limited to):

      p. 4, lines 107-110

      p. 4, line 112

      p. 8, lines 207-208

      p. 10, lines 273-275
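
      As a minimal sketch of what a comparison "in the gene expression space" could look like, the snippet below averages expression per cluster into a pseudobulk centroid and compares centroids by Pearson correlation and Euclidean distance. The toy matrix, the cluster labels, and the helper names are hypothetical and chosen only for illustration; a real analysis would start from normalized counts over a defined gene set, and a PCA-based comparison would follow the same pattern restricted to the genes used as PCA input.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(cells: Sequence[Sequence[float]]) -> List[float]:
    """Mean expression of each gene across the cells of one cluster (a pseudobulk profile)."""
    n = len(cells)
    return [sum(gene_values) / n for gene_values in zip(*cells)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles over the same genes."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster_similarity(
    clusters: Dict[str, List[List[float]]],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """(correlation, distance) between cluster centroids, for every pair of clusters."""
    cents = {name: centroid(cells) for name, cells in clusters.items()}
    names = sorted(cents)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            result[(a, b)] = (pearson(cents[a], cents[b]), euclidean(cents[a], cents[b]))
    return result


# Hypothetical toy data: each row is one cell, each column one of four genes
# (values imagined as log-normalized expression).
toy = {
    "PP": [[5.0, 4.0, 0.5, 1.0], [4.6, 4.2, 0.7, 1.2]],
    "Endo": [[1.0, 0.8, 4.5, 3.9], [1.2, 1.0, 4.9, 4.1]],
    "Noto": [[4.8, 3.6, 0.9, 1.4], [5.2, 3.8, 1.1, 1.0]],
}
for pair, (r, d) in cluster_similarity(toy).items():
    print(pair, f"r = {r:.3f}", f"distance = {d:.3f}")
```

      Unlike UMAP coordinates, these quantities are computed directly from the expression values, so they do not depend on the random seed or the neighborhood parameters of a non-linear embedding.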

      We would like to thank the reviewer for raising this question. The descriptions of UMAP have been revised throughout the manuscript in accordance with the reviewer's suggestion (Please see the main text in the revised manuscript).

      (2) Nodal and lefty manipulations: The section "Nodal-Lefty regulatory loop is needed for PP and anterior Endo fate specification" and Figure 3 do not draw any significant conclusions. This section presents a LIANA analysis to determine the signals that might be important between prechordal plate and endoderm, but despite the fact that it suggests that BMP, Nodal, FGF, and Wnt signaling might be important, the manuscript just concludes that Nodal signaling is important. Perhaps this is because the conclusion that Nodal signaling is required for the specification of these cell types has been demonstrated in zebrafish in several other studies with more convincing experiments (Alexander 1999, Gritsman 1999, Gritsman 2000, Rogers 2017, Sako 2016). While FGF has recently been demonstrated to be a key player in the stochastic decision to adopt endodermal fate in lateral endoderm (Economou 2022), the idea that FGF signaling may be a key player in the differentiation of these two cell types has strangely been relegated to the discussion and supplement. Lastly, the manuscript does not make clear the advantage of performing experiments to explore the PP-Endo decision in Nodal-stimulated explants compared to data from intact embryos. What would be learned from this and not from an embryo? Since Nodal signaling stimulates the expression of Wnts and FGFs, these data do not test Nodal signaling independently of the other pathways. It is unclear why this artificial system, which has some disadvantages, is used, since the manuscript does not make clear what advantages it offers.

      We sincerely thank the reviewer for these valuable comments. As mentioned in our manuscript, a substantial number of studies have reported on the mechanisms governing the segregation of mesendoderm fates in zebrafish embryos, including the Hill laboratory’s work cited by the reviewer, which demonstrated the involvement of FGF signaling in ventral mesendoderm fate specification. However, research on the regulatory mechanisms underlying anterior mesendoderm differentiation remains relatively limited. This is largely due to the challenges posed by the close physical proximity and similar transcriptional states of anterior mesendoderm cells, as well as their shared dependence on high levels of Nodal signaling for specification.

      Several studies from C.P. Heisenberg’s laboratory have attempted to elucidate the fate segregation between anterior mesendoderm cells, namely the prechordal plate (PP) and anterior endoderm (Endo) cells. They found that PP cells are tightly connected, facilitating the propagation of Nodal signaling[8]. Prolonged exposure to Nodal activates the expression of Gsc, which acts as a transcriptional repressor to inhibit sox17 expression, thereby suppressing endodermal fate[3]. However, they also noted that Gsc mutants do not exhibit endoderm developmental defects, suggesting that additional factors are involved in this process.

      The reviewer asked about our rationale for using the Nodal-injected explant system. In our investigation of the fate separation between PP and anterior Endo, we initially analyzed zebrafish embryonic data. Using URD to reconstruct the transcriptional lineage tree, we found that these two cell types were positioned far apart on the tree. However, existing literature indicates that the anterior endoderm and PP are not only spatially adjacent but also derive from common mesendodermal progenitors and exhibit transcriptional similarities[2,8]. As the reviewer noted, when tracing all progenitor cells of these two lineages using URD, it is easy to inadvertently include other cell types, such as ventral epiblast cells, which may compromise the accuracy of the analysis. We therefore concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly precise results.

      By contrast, our group’s earlier study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior Endo, PP, and notochord[5]. This makes the Nodal explant system well suited to studying the fate separation between PP and anterior Endo. Ultimately, by analyzing both in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitors, a conclusion further supported by live imaging experiments.

      As described above, we first used single-cell RNA sequencing analyses and live imaging to demonstrate that anterior endoderm can originate from PP progenitor cells. Understanding the mechanism underlying the fate segregation between these two cell populations then became a key focus of our research. We began by applying cell communication analysis to our single-cell data to identify signaling pathways that may be involved. This analysis specifically highlighted the Nodal-Lefty signaling pathway. Since Lefty acts as an inhibitor of Nodal signaling, we hypothesized that differences in Nodal signaling strength might regulate the fates of these two cell populations. By overexpressing different concentrations of Nodal mRNA and examining the fates of PP and anterior Endo cells, we confirmed this hypothesis.

      Thus, we propose that even subtle differences in Nodal signaling levels may influence anterior mesendoderm fate decisions. To test this, we generated systems with slightly reduced Nodal signaling (via ndr1 knockdown) and slightly elevated Nodal signaling (via lft1 knockout). Using these models, we precisely captured the critical stage of fate segregation between PP and anterior endo cells and identified a novel transcriptional repressor, Ripply1, which works in concert with Gsc to suppress anterior endoderm differentiation.

      (3) ripply1 mRNA injection phenotype inconsistent with previous literature: The phenotype presented in this manuscript from overexpressing ripply1 mRNA (Fig S11) is inconsistent with previous observations. This study shows a much more dramatic phenotype, suggesting that the overexpression may be to a non-physiological level that makes it difficult to interpret the gain-of-function experiments. For instance, Kawamura et al 2005 perform this experiment but do not trigger loss of head and eye structures or loss of tail structures. Similarly, Kawamura et al 2008 repeat the experiment, triggering a mildly more dramatic shortening of the tail and complete removal of the notochord, but again no disturbance of head structures as displayed here. These previous studies injected 25 - 100 pg of ripply1 mRNA with dramatic phenotypes, whereas this study uses 500 - 1000 pg. The phenotype is so much more dramatic than previously presented that it suggests that the level of ripply1 overexpression is sufficiently high that it may no longer be regulating only its endogenous targets, making the results drawn from ripply1 overexpression difficult to trust.

      We sincerely thank the reviewer for raising this question. First, we apologize for not providing a detailed description of the amount of HA-ripply1 mRNA injected in our previous manuscript. We injected 500 pg of HA-ripply1 mRNA at the 1-cell stage and allowed the embryos to develop until 6 hpf for the CUT&Tag experiment. In the supplementary materials, we included a bright-field image of an 18 hpf embryo injected with HA-ripply1 mRNA, which exhibited severe morphological abnormalities. The reviewer pointed out that the amount of ripply1 mRNA we injected might be excessive, potentially leading to non-specific gain-of-function effects. The injection dose of 500 pg was determined based on our previous study, in which injecting 24 pg of ripply1 mRNA into one cell of zebrafish embryos at the 16–32 cell stage was sufficient to induce a secondary axis lacking the forebrain[5]. From this, we estimated that an injection dose of approximately 500–1000 pg at the 1-cell stage would be appropriate, so that after several rounds of cell division each cell would carry roughly 20–30 pg of mRNA at the 32-cell stage. Additionally, we conducted supplementary experiments injecting 100 pg, 250 pg, and 500 pg of ripply1 mRNA, and observed that 500 pg of ripply1 mRNA led to a dramatic suppression of endoderm formation (Author response image 4).
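
      The dose reasoning above can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that injected mRNA is split evenly at every division with no degradation; real embryos will deviate from this, and the function names are hypothetical.

```python
def per_cell_dose_pg(injected_pg: float, n_cells: int) -> float:
    """Idealized mRNA per blastomere after an even split among n_cells.

    Assumes perfectly symmetric partitioning and no degradation or loss;
    this is a back-of-the-envelope estimate, not a measurement.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return injected_pg / n_cells


def whole_embryo_equivalent_pg(per_cell_pg: float, n_cells: int) -> float:
    """Dose at the 1-cell stage that would give per_cell_pg in each of n_cells."""
    return per_cell_pg * n_cells


# 1-cell-stage doses discussed in the response, evaluated at the 32-cell stage.
for dose in (500, 1000):
    print(f"{dose} pg at 1 cell -> {per_cell_dose_pg(dose, 32):.1f} pg per cell at 32 cells")

# Earlier reference point: 24 pg into one blastomere at the 16- to 32-cell stage.
for n in (16, 32):
    print(f"24 pg x {n} cells = {whole_embryo_equivalent_pg(24, n):.0f} pg whole-embryo equivalent")
```

      Under these idealized assumptions, 500 to 1000 pg at the 1-cell stage corresponds to roughly 16 to 31 pg per cell at the 32-cell stage, and 24 pg per blastomere at the 16- to 32-cell stage corresponds to 384 to 768 pg for the whole embryo. A single-blastomere injection loads only that cell's descendants, whereas a 1-cell injection exposes every cell.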

      Finally, our study focuses on the mechanism of cell fate segregation in the anterior mesendoderm, primarily during gastrulation. The embryos injected with ripply1 mRNA underwent normal gastrulation, and our CUT&Tag experiment was performed at 6 hpf. Therefore, we believe that the amount of ripply1 mRNA injected in this study is appropriate for addressing our research question.

      Author response image 4.

      Different concentrations of ripply1 mRNA were injected into zebrafish embryos at the one-cell stage, with RFP fluorescence labeling sox17-positive cells.

      (4) Ripply1 binding to sox17 and sox32 regulatory regions not convincing: The CUT&Tag data presented in Fig 6J-K do not seem to be of high quality and do not seem to provide strong support that Ripply1 binds to the regulatory regions of these genes. The signal-to-noise ratio is very poor, and the 'binding' identified near sox17 seems to be even coverage over a 14 kb region, which is not consistent with site-specific recruitment of this factor; the 'peaks' highlighted with yellow boxes do not appear to be peaks at all. To me, this probably represents either (1) overtagmentation of these samples or (2) an overexpression artifact from injecting too high a concentration of ripply1-HA mRNA. In general, CUT&Tag is only recommended for histone modifications, and CUT&RUN would be recommended for transcriptional regulators like these (see EpiCypher's literature). Given this and the previous point about Ripply1 overexpression, I am not convinced that Ripply1 regulates endodermal genes. The existing data could be made somewhat more convincing by showing the tracks for other genes as positive and negative controls: Ripply1 has known muscle targets (how does its binding at those targets compare?), and there should be a number of Nodal target genes that Ripply1 does not bind that could serve as negative controls. Overall, this experiment does not seem to be of high enough quality to support the conclusion that Ripply1 directly binds near sox17 and sox32, and from the data presented in the manuscript it looks as if it failed technically.

      We sincerely thank the reviewer for raising this question. We apologize that the binding regions of sox17 marked in our previous analysis were incorrect, and we have made the corresponding revisions in the latest version of the manuscript.

      The reviewer noted that our CUT&Tag data contain considerable noise. To address this, we further refined our data processing: we annotated all peaks enriched in the treatment group and performed differential analysis, selecting genes with log2FoldChange > 3, padj < 0.5, and baseMean > 30 as candidate targets of Ripply1 binding. Subsequent GO enrichment analysis of these genes revealed significant enrichment of muscle development-related GO terms, which is consistent with previously reported roles of Ripply1 in regulating somite development. Therefore, we believe our filtering method effectively removes a large number of noise peaks and their associated genes.
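
      The three-threshold filter described above can be sketched as follows. The record layout mirrors the column names of a DESeq2 results table (baseMean, log2FoldChange, padj), but the class, the function, and the gene rows are hypothetical placeholders rather than the authors' code or data; the default thresholds are simply the values quoted in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PeakResult:
    """One differential-binding result, annotated to its nearest gene (hypothetical layout)."""
    gene: str
    base_mean: float         # mean normalized count across all samples
    log2_fold_change: float  # treatment (HA-tagged bait) versus control
    padj: float              # multiple-testing-adjusted p-value


def select_candidates(
    rows: Iterable[PeakResult],
    min_lfc: float = 3.0,
    max_padj: float = 0.5,
    min_base_mean: float = 30.0,
) -> List[str]:
    """Return genes whose peaks pass all three thresholds (strict inequalities)."""
    return [
        r.gene
        for r in rows
        if r.log2_fold_change > min_lfc
        and r.padj < max_padj
        and r.base_mean > min_base_mean
    ]


# Invented rows that each exercise one branch of the filter.
rows = [
    PeakResult("geneA", base_mean=120.0, log2_fold_change=4.2, padj=0.01),  # passes
    PeakResult("geneB", base_mean=12.0, log2_fold_change=5.0, padj=0.02),   # too few reads
    PeakResult("geneC", base_mean=80.0, log2_fold_change=1.5, padj=0.001),  # small fold change
    PeakResult("geneD", base_mean=200.0, log2_fold_change=3.5, padj=0.7),   # not significant
]
print(select_candidates(rows))
```

      Keeping the thresholds as parameters makes it straightforward to report how the candidate list changes across a range of cutoffs, for example as a sensitivity check on the padj threshold.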

      Under these screening criteria, we found that sox32 meets the threshold, while sox17 does not. In addition, following the reviewer’s suggestion, we examined mespbb, a known gene repressed by Ripply1, as a positive control, and gsc, a Nodal target gene, as a negative control.

      Based on these new analyses, we have revised our figures and text accordingly. Our data now support the possibility that Ripply1 may directly bind to the promoter region of sox32. Since sox32 acts as a direct upstream regulator of sox17, this binding could influence sox17 expression (Figure 6 and Figure S11).

      Finally, we would like to note that studies have reported Ripply1 as a transcriptional repressor, which may function by recruiting other co-factors, such as Groucho, to form a complex[14,15]. This might explain why our CUT&Tag data detected Ripply1 binding to a broad set of genes.

      (5) "Cooperatively Gsc and ripply1 regulate": I suggest avoiding the term "cooperative" when describing the relationship between Ripply1 and Gsc regulation of PP and anterior endoderm. It evokes the concept of cooperative gene regulation, which implies that these factors interact with each other biochemically in order to bind to the DNA. This is not supported by the data in this manuscript, and is especially confusing since Ripply1 is thought to require cooperative binding with a T-box family transcription factor to direct its binding to the DNA.

      We sincerely thank the reviewer for raising this important issue. The reviewer pointed out that the term "Cooperatively" may not be entirely appropriate in the context of our study. In accordance with the reviewer's suggestion, we have replaced "Cooperatively" with "Collectively" in the relevant sections.

      (6) SWI/SNF: The differential expression of srcap doesn't seem very remarkable. The dot plots in Supplementary Figure S7H don't help: they seem to show no expression at all in the endoderm, which is clearly a distortion of the data, since from the violin plots it's obviously expressed and the dot-size scale only ranges from ~30-38%. Please add information about fold change and p-value for the differential expression to the figure. Publicly available scRNAseq databases show srcap is expressed throughout the entire early embryo, suggesting that it would be surprising for it to have differential activity in these two cell types and thereby contribute to their separate specification during development. It seems equally possible that it just mildly influences the level of Nodal or FGF signaling, which would create this effect.

      We thank the reviewer for this question. As suggested, we performed Wilcoxon tests to compare srcap expression between the PP and Endo populations. The analysis shows that while srcap expression is moderately elevated in PP compared to Endo, this difference is not statistically significant. The corresponding p-value and fold change have now been included in the revised figure (please see Figure 4J and S7H). Although the transcriptional level of srcap does not differ significantly between PP and anterior endoderm, our subsequent experiments, using AU15330 (an inhibitor of the SWI/SNF complex) and injecting a morpholino targeting srcap, a key component of the SWI/SNF complex, demonstrated that its inhibition indeed promotes anterior endoderm fate while reducing PP cell specification. We therefore propose that subtle differences in SWI/SNF complex activity may regulate the fate specification of PP and anterior endoderm through two mechanisms. First, as mentioned in our study, these chromatin remodelers modulate the expression of master regulators such as Gsc and Ripply1, thereby influencing cell fate decisions. Second, as noted by the reviewer, these chromatin remodelers may affect the interpretation of Nodal signaling, ultimately contributing to the divergence between PP and anterior endoderm fates.
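
      For readers less familiar with the test, a two-sided Wilcoxon rank-sum (Mann-Whitney U) comparison between two cell populations can be sketched with the standard library alone. This version uses the normal approximation without tie or continuity corrections and a simple pseudocount log2 fold change; it is an illustration under those assumptions, not the authors' pipeline. Single-cell toolkits (for example Seurat's FindMarkers with its default Wilcoxon test) apply their own corrections and fold-change definitions, so their numbers will differ.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, p-value). No tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_two_sided


def log2_fold_change(x: Sequence[float], y: Sequence[float], pseudocount: float = 1.0) -> float:
    """log2 of the ratio of group means, with a pseudocount to avoid division by zero."""
    return math.log2((sum(x) / len(x) + pseudocount) / (sum(y) / len(y) + pseudocount))


# Invented per-cell expression values for one gene in two clusters.
pp = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3]
endo = [1.0, 0.7, 1.4, 0.9, 1.2, 0.6]
u, p = rank_sum_test(pp, endo)
print(f"U = {u}, p = {p:.3f}, log2FC = {log2_fold_change(pp, endo):.3f}")
```

      In these invented values the first group's mean is slightly higher, yet the p-value stays well above 0.05: the same qualitative pattern of a modest elevation that does not reach statistical significance.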

      The multiome data seems like a valuable data set for researchers interested in this stage of zebrafish development. However, the presentation of the data doesn't make many conclusions, aside from identifying an element adjacent to ripply1 whose chromatin is open in prechordal plate cells and not endodermal cells and showing that there are a number of loci with differential accessibility between these cell types. That seems fairly expected since both cell types have several differentially expressed transcriptional regulators (for instance, ripply1 has previously been demonstrated in multiple studies to be specific to the prechordal plate during blastula stages). The manuscript implies that SWI/SNF remodeling by Srcap is responsible for the chromatin accessibility differences between these cell types, but that has not actually been tested. It seems more likely that the differences in chromatin accessibility observed are a result of transcription factors binding downstream of Nodal signaling.

      We thank the reviewer for recognizing the value of our newly generated data. Through integrative analysis of single-cell data from wild-type, ndr1 kd, and lft1 ko groups of Nodal-injected explants at 6 hours post-fertilization (hpf), we identified a critical branching point in the fate segregation of the prechordal plate (PP) and anterior endoderm (Endo), where chromatin remodelers may play a significant role. Based on this finding, we performed single-cell RNA and ATAC sequencing on zebrafish embryos at 6 hpf. Analysis of this multi-omics dataset revealed that transcriptional repressors such as Gsc, Ripply1, and Osr1 exhibit differences in both transcriptional and chromatin accessibility levels between the PP and anterior Endo. Subsequent overexpression and loss-of-function experiments further demonstrated that Gsc and Ripply1 collaboratively suppress endodermal gene expression, thereby inhibiting endodermal cell fate. Previous studies have reported that for the activation of certain Nodal downstream target genes, the pSMAD2 protein of the Nodal signaling pathway recruits chromatin remodelers to facilitate chromatin opening and promote further transcription of target genes[16]. Therefore, our data provide chromatin accessibility profiles for Gsc and Ripply1, offering a valuable resource for future investigations into their pSMAD2 binding sites.

      Minor issues:

      Figure 2 E-F: It's not clear which cells from E are quantitated in F. For instance, the dorsal forerunner cells are likely to behave very differently from other endodermal progenitors in this assay. It would be helpful to indicate which cells are analyzed in Fig F with an outline or other indicator of some kind. Or - if both DFCs and endodermal cells are included in F, to perhaps use different colors for their points to help indicate if their fluorescence changes differently.

      We thank the reviewer for this suggestion. In the revised version of the figure, we have outlined the regions of the analyzed cells.

      Fig 3 J: Should the reference be Dubrulle et al 2015, rather than Julien et al?

      Thank you; we have corrected the reference.

      References:

      Alexander, J. & Stainier, D. Y. A molecular pathway leading to endoderm formation in zebrafish. Curr. Biol. 9, 1147-1157 (1999).

      Barone, V. et al. An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Dev. Cell 43, 198-211.e12 (2017).

      Economou, A. D., Guglielmi, L., East, P. & Hill, C. S. Nodal signaling establishes a competency window for stochastic cell fate switching. Dev. Cell 57, 2604-2622.e5 (2022).

      Gritsman, K. et al. The EGF-CFC protein one-eyed pinhead is essential for nodal signaling. Cell 97, 121-132 (1999).

      Gritsman, K., Talbot, W. S. & Schier, A. F. Nodal signaling patterns the organizer. Development 127, 921-932 (2000).

      Kawamura, A. et al. Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Dev. Cell 9, 735-744 (2005).

      Kawamura, A., Koshida, S. & Takada, S. Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Mol. Cell. Biol. 28, 3236-3244 (2008).

      Sako, K. et al. Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell Rep. 16, 866-877 (2016).

      Rogers, K. W. et al. Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6, e28785 (2017).

      Warga, R. M. & Nüsslein-Volhard, C. Origin and development of the zebrafish endoderm. Development 126, 827-838 (1999).

      References:

      (1) Steinbeisser, H., and De Robertis, E.M. (1993). Xenopus goosecoid: a gene expressed in the prechordal plate that has dorsalizing activity. C R Acad Sci III 316, 959-971.

      (2) Warga, R.M., and Nüsslein-Volhard, C. (1999). Origin and development of the zebrafish endoderm. Development (Cambridge, England) 126, 827-838. 10.1242/dev.126.4.827.

      (3) Sako, K., Pradhan, S.J., Barone, V., Inglés-Prieto, Á., Müller, P., Ruprecht, V., Čapek, D., Galande, S., Janovjak, H., and Heisenberg, C.P. (2016). Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell reports 16, 866-877. 10.1016/j.celrep.2016.06.036.

      (4) van Boxtel, A.L., Economou, A.D., Heliot, C., and Hill, C.S. (2018). Long-Range Signaling Activation and Local Inhibition Separate the Mesoderm and Endoderm Lineages. Developmental cell 44, 179-191.e5. 10.1016/j.devcel.2017.11.021.

      (5) Cheng, T., Xing, Y.Y., Liu, C., Li, Y.F., Huang, Y., Liu, X., Zhang, Y.J., Zhao, G.Q., Dong, Y., Fu, X.X., et al. (2023). Nodal coordinates the anterior-posterior patterning of germ layers and induces head formation in zebrafish explants. Cell reports 42, 112351. 10.1016/j.celrep.2023.112351.

      (6) Economou, A.D., Guglielmi, L., East, P., and Hill, C.S. (2022). Nodal signaling establishes a competency window for stochastic cell fate switching. Developmental cell 57, 2604-2622.e5. 10.1016/j.devcel.2022.11.008.

      (7) Schier, A.F., and Talbot, W.S. (2005). Molecular genetics of axis formation in zebrafish. Annual review of genetics 39, 561-613. 10.1146/annurev.genet.37.110801.143752.

      (8) Barone, V., Lang, M., Krens, S.F.G., Pradhan, S.J., Shamipour, S., Sako, K., Sikora, M., Guet, C.C., and Heisenberg, C.P. (2017). An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Developmental cell 43, 198-211.e12. 10.1016/j.devcel.2017.09.014.

      (9) Müller, P., Rogers, K.W., Jordan, B.M., Lee, J.S., Robson, D., Ramanathan, S., and Schier, A.F. (2012). Differential diffusivity of Nodal and Lefty underlies a reaction-diffusion patterning system. Science (New York, N.Y.) 336, 721-724. 10.1126/science.1221920.

      (10) Rogers, K.W., Lord, N.D., Gagnon, J.A., Pauli, A., Zimmerman, S., Aksel, D.C., Reyon, D., Tsai, S.Q., Joung, J.K., and Schier, A.F. (2017). Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6. 10.7554/eLife.28785.

      (11) Thisse, B., Wright, C.V., and Thisse, C. (2000). Activin- and Nodal-related factors control antero-posterior patterning of the zebrafish embryo. Nature 403, 425-428. 10.1038/35000200.

      (12) Eroglu, B., Wang, G., Tu, N., Sun, X., and Mivechi, N.F. (2006). Critical role of Brg1 member of the SWI/SNF chromatin remodeling complex during neurogenesis and neural crest induction in zebrafish. Developmental dynamics : an official publication of the American Association of Anatomists 235, 2722-2735. 10.1002/dvdy.20911.

      (13) Hensley, M.R., Emran, F., Bonilla, S., Zhang, L., Zhong, W., Grosu, P., Dowling, J.E., and Leung, Y.F. (2011). Cellular expression of Smarca4 (Brg1)-regulated genes in zebrafish retinas. BMC developmental biology 11, 45. 10.1186/1471-213X-11-45.

      (14) Kawamura, A., Koshida, S., Hijikata, H., Ohbayashi, A., Kondoh, H., and Takada, S. (2005). Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Developmental cell 9, 735-744. 10.1016/j.devcel.2005.09.021.

      (15) Kawamura, A., Koshida, S., and Takada, S. (2008). Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Mol Cell Biol 28, 3236-3244. 10.1128/MCB.01754-07.

      (16) Ross, S., Cheung, E., Petrakis, T.G., Howell, M., Kraus, W.L., and Hill, C.S. (2006). Smads orchestrate specific histone modifications and chromatin remodeling to activate transcription. EMBO J 25, 4490-4502. 10.1038/sj.emboj.7601332.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA and the ESCRT-III homologues (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing its assembly as two subcomplexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in its interaction with CdvB1. Finally, while the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks, the mechanistic details of how these complexes drive membrane remodeling, through removal of subcomplexes by the proteasome and/or CdvC, remain speculative.

      Although the work highlights intriguing similarities with eukaryotic ESCRT-III systems and explores unique archaeal adaptations, the conclusions drawn would benefit from stronger experimental validation and a more comprehensive mechanistic framework.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, close relatives of eukaryotes. This finding suggests that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which is a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain speculative, and the study would benefit from stronger experimental validation and expanded contextualization within the field.

      We thank the Referee for his/her appreciation of our work.

      Weaknesses:

      This manuscript presents several methodological inconsistencies and lacks key controls to validate its claims. Additionally, there is insufficient information about the number of experimental repetitions, statistical analyses, and a broader discussion of the major findings in the context of open questions in the field.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a solid extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      The manuscript is a bit light on quantitative detail, across the various figures, and several key controls are missing (CdvA, B alone to better interpret the co-polymerisation phenotypes and establish the true order of recruitment, for example) - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution, as well as a brief exploration of the possible reasons for the discrepancy between the foci observed in their liposome assays and the large rings observed in cells - to better serve the interests of a broad audience.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #3 (Public review):

      Summary:

      In this report, De Franceschi et al. purify components of the Cdv machinery from the archaeon M. sedula and probe their interactions with membranes and with one another in vitro using two main assays: liposome flotation and fluorescence imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Strengths:

      Using the flotation assay, they demonstrate that CdvA and CdvB bind liposomes when combined. While CdvB1 also binds liposomes under these conditions, in the flotation assay CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      These experiments provide a good starting point for the in vitro study of the interaction of Cdv system components with the membrane and their consecutive recruitment. However, several experimental controls are missing, which limits the authors' ability to draw strong conclusions. Moreover, some results are inconsistent across the two main assays, which makes the findings difficult to interpret:

      (1) Missing controls.

      Various protein mixtures are assessed for their membrane-binding properties in different ways. However, it is difficult to interpret the effect of any specific protein combination, when the same experiment is not presented in a way that includes separate tests for all individual components. In this sense, the paper lacks important controls. For example, Fig 1C is missing the CdvB-only control. The authors remark that CdvB did not polymerise (data not shown) but do not comment on whether it binds membrane in their assays. In the introduction, Samson et al., 2011 is cited as a reference to show that CdvB does not bind membrane. However, here the authors are working with protein from a different organism in a different buffer, using a different membrane composition and a different assay. Given that so many variables are changing, it would be good to present how M. sedula CdvB behaves under these conditions.

      We thank the referee for raising this point. We have now added these data in Figure 1C. Indeed it turns out that CdvB from M. sedula exhibits clear membrane binding on its own in a flotation assay.

      Similarly, there is no data showing how CdvB alone or CdvA alone behave in the dumbbell liposome assay.

      Without these controls, it's impossible to say whether CdvA recruits CdvB or the other way around. The manuscript would be much stronger if such data could be added.

      We have now added these data in Figure 1E, 1F and 1G. Overall, we can confirm that CdvA binds the membrane better in the presence of CdvB (although both proteins can bind the membrane on their own). Both proteins appear to recognize the curved region of the membrane neck.

      (2) Some of the discrepancies in the data generated using different assays are not discussed.

      The authors show that CdvB2∆C binds membrane and localizes to membrane necks in the dumbbell liposome assay, but no membrane binding is detected in the flotation assay. The discrepancy between these results further highlights the need for CdvB-only and CdvA-only controls.

      We have now added these controls in Figure 1. In addition, we would like to clarify that the flotation assay and the SMS dumbbell assay serve different purposes and are not directly comparable in quantitative terms. In the flotation assay, all the protein present as input is eventually recovered and visualized. Thus, the fraction of total protein bound to lipids can be inferred from this assay. The SMS assay, in contrast, provides a very different kind of information. Because of the particular protocol required to generate dumbbells (De Franceschi, 2022), the total amount of protein in the inner buffer of dumbbells is not accurately defined: protein that is not correctly reconstituted (e.g. protein that aggregates while still in the droplet phase) interferes with vesicle generation, so that dumbbells containing such aggregates are generally not formed in the first place. This makes it impossible to draw any quantitative conclusions about the proportion of the sample bound to lipids. The SMS assay is therefore not directly comparable to the flotation assay, but rather complementary to it. Indeed, the purpose of the SMS assay is to provide information about the curvature selectivity of the protein.
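      As an illustration of how the flotation readout translates into a bound fraction, a minimal sketch follows. It assumes background-subtracted band intensities listed from the top to the bottom of the gradient, and that lipid-bound protein is confined to the top fraction(s); the function name and the example numbers are hypothetical and are not data from this study.

```python
def fraction_bound(intensities, n_top):
    """Fraction of total recovered protein found in the top n_top gradient fractions.

    intensities: background-subtracted band intensities, ordered top -> bottom (hypothetical input).
    n_top: number of top fractions assumed to contain the lipid-bound (floated) protein.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total recovered intensity must be positive")
    # Since all input protein is recovered across the gradient, the sum acts as the total.
    return sum(intensities[:n_top]) / total


# Hypothetical four-fraction gradient: 30 units float to the top, 70 units stay below.
print(fraction_bound([30.0, 10.0, 5.0, 55.0], n_top=1))
```

      The ratio is only meaningful because the flotation assay recovers all input protein; no equivalent total is available for the SMS assay.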

      (3) Validation of the liposome assay.

      The experimental setup to create dumbbell-shaped liposomes seems great and is a clever novel approach pioneered by the team. Not only can the authors manipulate liposome shape, they also state that this allows them to accurately control the species present on the inside and outside of the liposome. Interpreting the results of the liposome assay, however, depends on the geometry being correct. To make this clearer, it would seem important to include controls to prove that all the protein imaged at membrane necks lies on the inside of liposomes. In the images in SFig3 there appears to be protein outside of the liposome. It would also be helpful to present data to test whether the necks are open, as suggested in the paper, by using FRAP or some other related technique.

      We thank the Referee for his/her appreciation. The proteins are encapsulated inside the liposomes, not outside of them. While Figure S3 might give the appearance that there is some protein outside, this is actually just an imaging artifact. Author response image 1 (below) explains this: when the membrane and protein channels are shown separately, it is clear that the protein cluster that appeared to be ‘outside’ actually colocalizes with an extra small dumbbell lobe (yellow arrowhead). The protein appeared to be outside because (1) the protein fluorescent signal is stronger than the signal from the membrane, and (2) there is a time delay in the acquisition of the two channels (0.5-1 second), so the membrane may have slightly shifted out of focus while the protein fluorescence was being acquired. We are confident that the protein is inside these dumbbells because the procedure for preparing the dumbbells requires extensive emulsification by pipetting, which takes ≈ 1 minute. This time is more than sufficient for proteins with high affinity for the membrane, like ESCRT and Cdv, to bind the membrane. For an example of how fast binding under confinement can be, please see movie 2 from this paper: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.

      Moreover, in many instances we could directly confirm that the protein is inside: by increasing the gain of the images post-acquisition, a clear protein signal appears in the lumen (see Author response image 2).

      Author response image 1.

      Separate channels showing colocalization of protein and lipids (adapted from Figure S3). The zoom-in shows separate channels, highlighting that the CdvB2 cluster that seems to be ‘outside the dumbbell’ actually colocalizes with the small terminal lobe of the dumbbell, indicating that the protein is encapsulated within that lobe.

      Author response image 2.

      Residual protein present inside lumen of dumbbells as visualized by increasing the brightness post-acquisition.

      We are not sure what the referee means by “test whether the necks are open, as suggested in the paper”. We are confident that the lobes of dumbbells originated from a single floppy vesicle, and were therefore mutually connected by an open neck (at least at the onset of the experiment). We have performed extensive FRAP assays on dumbbells in previous papers (De Franceschi et al., ACS Nano 2022 and De Franceschi et al., Nature Nanotech 2024), which unequivocally proved that these chains of dumbbells are connected by open necks. We have now also performed a few FRAP assays with reconstituted Cdv proteins, which confirmed this point. We have added a movie of such an experiment to the manuscript (Movie 1).

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (4) Quantification of results from the liposome assay.

      The paper would be strengthened by the inclusion of more quantitative data relating to the liposome assay. Firstly, only a single field of view is shown for each condition. Because of this, the reader cannot know whether this is a representative image or an outlier. Can the authors do some quantification of the data to demonstrate this? The line scan profiles in the supplemental figures would be an example of this, but again in these Figures only a single image is analyzed.

      The images that we showed are indeed representative. The dumbbells that are generated by the SMS approach contain an “internal control”: in each dumbbell, the protein has the option of localizing at the neck or localizing elsewhere in the region of flat membrane. We see consistently that Cdv proteins have a strong preference for localizing at the neck.

      We would recommend that the authors present quantitative data to show the extent of co-localization at the necks in each case. They also need a metric to report instances in which protein is not seen at the neck, e.g. CdvB2 but not CdvB1 in Fig2I, which rules out a simple curvature preference for CdvB2 as stated in line 182.

      While the request for better quantitation is reasonable, this would require carrying out very significant new experiments at the microscope, which is rendered near-impossible since both first authors have left the lab for new positions.

      Secondly, the authors state that they see CdvB2∆C recruited to the membrane by CdvB1 (lines 184-187, Fig 2I). However, this simple conclusion is not borne out in the data. Inspecting the CdvB2∆C panels of Fig 2I, Fig3C, and Fig3D, CdvB2∆C signal can be seen at positions which don't colocalize with other proteins. The authors also observe CdvB2∆C localizing to membrane necks by itself (Fig 2E). Therefore, while CdvB1 and CdvB2∆C colocalize in the flotation assay, there is no strong evidence for CdvB2∆C recruitment by CdvB1 in dumbbells. This is further underscored by the observation that in the presented data, all Cdv proteins always appear to localize at dumbbell necks, irrespective of what other components are present inside the liposome. Although one nice control is presented (ZipA), this suggests that more work is required to be sure that the proteins are behaving properly in this assay. For example, if membrane binding surfaces of Cdv proteins are mutated, does this lead to the accumulation of proteins in the bulk of the liposome as expected?

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (indicated in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that contain both proteins (yellow arrowheads) are localized at necks, while those that only contain CdvB2ΔC (red arrowheads) are not. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse in the membrane plane because they are not fixed at the neck, so they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We reason that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize elsewhere.

      Author response image 3.

      Current Figure 2H, where clusters that are double-positive for both CdvB1 and CdvB2ΔC are indicated by yellow arrowheads, while clusters that apparently contain only CdvB2ΔC are indicated by red arrowheads. All the double-positive clusters are localized at necks.

      (5) Rings.

      The authors should comment on why they never observe large Cdv rings in their experiments. In crenarchaeal cell division, CdvA and CdvB have been observed to form large rings in the middle of the 1 micron cell, before constriction. Only in the later stages of division are the ESCRTs localized to the constricting neck, at a time when CdvA is no longer present in the ring. Therefore, if the in vitro assay used by the authors really recapitulated the biology, one would expect to see large CdvAB rings in Figs 1E and 1F. This is ignored in the model. In the proposed model of ring assembly (line 252), CdvAB ring formation is mentioned, but the authors do not discuss the fact that they do not observe CdvAB rings, only foci at membrane necks. The discussion section would benefit from the authors commenting on this.

      The referee is correct: it is intriguing that we don’t see micron-sized rings for CdvA and CdvB. We do note that our EM data (Fig. S1) show that CdvA on its own can form rings of about 100-200 nm in diameter, well below the diffraction limit, which could well correspond to the foci that we optically resolve in Figure 1. We have now added a brief comment on this to the manuscript (lines 256-264).

      (6) Stoichiometry

      It is not clear why 100% of the visible CdvA and 100% of the visible CdvB are shifted to the lipid fraction in 1C. Perhaps this is a matter of quantification. Can the authors comment on the stoichiometry here?

      We agree that this was unclear. Since that particular gel was stained with Coomassie, the quantitative signals might be unreliable, and hence we have repeated this experiment using fluorescently labelled proteins, which indeed show a less extreme distribution. This was also done to make the data more uniform, as requested by the referees.

      (7) Significance of quantification of MBP-tagged filaments.

      Authors use tagging and removal of MBP as a convenient, controllable system to trigger polymerisation of various Cdv proteins. However, it is unclear what is the value and significance of reporting the width and length of the short linear filaments that are formed by the MBP-tagged proteins. Presumably they are artefactual assemblies generated by the presence of the tag?

      Providing a measure of the changes induced by MBP removal, in fact, validates that the removal actually has an effect. But perhaps this places too much emphasis on the short filaments. We have now opted for a compromise, removing the quantification of the width and length of the short filaments formed by MBP-tagged protein from the text, but keeping the supplementary figure showing their distribution as compared to the other filaments (Figure S2E, S2F).

      Similarly, Figure 2C doesn't seem a useful addition to the paper.

      We removed panel 2C, and now merely report these values in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest the authors provide a deeper discussion of their findings, such as their evolutionary implications, how lipids from these archaea may affect the recruitment process, etc.

      Because there is no exact homology between archaeal Cdv proteins and eukaryotic ESCRT-III proteins, we do not feel our work brings new evolutionary implications beyond what we already state in the manuscript. We also did not perform experiments using archaeal lipids, so we would rather not speculate on how they may affect the recruitment of Cdv proteins.

      In general, the manuscript lacks information regarding some scale bars, number of experimental repetitions (n or N), statistical analysis when needed, information about protein concentrations used in their assays.

      We have now added this information in the manuscript.

      Below, I provide a list of comments that I think the authors should address to improve the manuscript:

      (1) Line 113-114: The authors test protein-membrane interactions using flotation assays with positively curved SUV membranes but encapsulate proteins in dumbbell-shaped liposomes with negative curvature at the connecting necks. Might the use of membranes with opposite curvatures affect the recruitment process? Since the proteins are fluorescently labeled, I suggest testing recruitment using flat giant unilamellar vesicles or supported lipid bilayers (with zero curvature) to validate their findings.

      We thank the referee for this suggestion. Please do note that we are not claiming in our paper that Cdv proteins recognize negative curvature. We merely observe that they localize at necks. The neck of a dumbbell exhibits the so-called “catenoid” geometry, which is characterized by having both positive and negative curvature.

      Experimentally, regarding the SUVs, we now realize there was a mistake in the methods section: in the flotation assay we in fact used multilamellar vesicles, not SUVs, precisely for the reason mentioned by the referee. We apologize for the oversight and have now corrected this in the methods. Unlike SUVs, multilamellar vesicles do not have a strong positive curvature, although we agree that they likely lack negative curvature as well. Because of the heterogeneous nature of multilamellar vesicles, they provide a binding assay that is largely independent of curvature. Complementary to the flotation assay, the SMS approach was employed to reveal the curvature preference of the proteins.

      Finally, we performed the experiment on large GUVs suggested by the referee using CdvB as an example, but this turned out to be inconclusive because the protein forms clusters: these clusters may be creating local curvature at the nanometer scale, which cannot be resolved by optical microscopy (Author response image 4). This is quite typical for proteins that recognize curvature (cf. for instance: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.)

      Author response image 4.

      Fluorescently labelled CdvB bound to giant unilamellar vesicle. The protein was added in the outer buffer. CdvB forms distinct clusters, which may generate a local region of high membrane curvature.

      (2) Line 138-139: How is His-ZipA binding the membrane? Wouldn't Ni<sup>2+</sup>-NTA lipids be required? If not, how is the binding achieved?

      Indeed, NTA-lipids were present. This is now stated both in the legend and in the methods.

      (3) In the encapsulated protein assays, why does the luminal fluorescence intensity of the encapsulated protein sometimes appear similar to the bulk fluorescence signal? Since only a small fraction of the protein assembles at membrane necks, shouldn't the luminal pool of unbound protein show higher fluorescence intensity inside the liposomes?

      We thank the referee for raising this point and giving us the opportunity to explain this. The reason is that Cdv proteins have a very high affinity for the neck, and when they cluster at the neck, the fluorescence intensity of the cluster is many times higher than the background fluorescence. Because we were interested in imaging the clusters without overexposing them, we adjusted the imaging conditions accordingly, with the result that the fluorescence from both the lumen and the bulk appears at a very low level.

      By choosing different imaging conditions, however, it can be actually seen that the signal inside the lumen is clearly higher than the bulk: this can be seen for instance in Author response image 2, where the brightness has been properly adjusted.

      (4) Line 184-185: In Fig. 2I, some CdvB2ΔC puncta seem independent of CdvB1 and are not localized at membrane necks. How many such puncta exist? For example, in the provided micrograph, 2 out of 5 clusters are independent of CdvB1. This proportion is significant. Could the authors quantify the prevalence of these structures and discuss why they form?

      We thank the referee for giving us the opportunity to explain this apparent discrepancy. We would like to stress that CdvB2ΔC and CdvB1 form an obligate heterodimer: in all our experiments, without exception, we find that they form a strong complex when we mix the two proteins. This is true both in dumbbells and in flotation assays.

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (indicated in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that contain both proteins (yellow arrowheads) are localized at necks, while those that only contain CdvB2ΔC (red arrowheads) are not. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse in the membrane plane because they are not fixed at the neck, so they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We reason that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize elsewhere.

      (5) Figure 1E and 1F: Why do lipids accumulate and colocalize with the proteins? How can the authors confirm lumen connectivity between vesicles? Performing FRAP assays could validate protein localization and enrichment at the lumen of the membrane necks.

      At first sight, indeed some lipid enrichment seems to be observed at the neck between lobes of dumbbells.

      This is, however, an imaging artifact due to the fact that the neck is diffraction-limited. As shown in Author response image 5, at the neck region we acquire the membrane signal from both lobes, so the signal is roughly doubled, hence the apparent lipid enrichment.

      Author response image 5.

      Schematic illustrating that the neck between two lobes is smaller than the diffraction limit of optical microscopy (the size of a typical pixel is indicated by the green square). Because of this technical limitation, the fluorescence intensity of the membrane at the neck is twice that of a single membrane.

      The referee is correct in pointing out that these images do not prove that the lobes are connected, and that FRAP is the only way to prove this point. However, in previous papers we have confirmed extensively that in chains of dumbbells the lobes are connected:

      - De Franceschi N, Pezeshkian W, Fragasso A, Bruininks BMH, Tsai S, Marrink SJ, Dekker C. Synthetic Membrane Shaper for Controlled Liposome Deformation. ACS Nano. 2022 Nov 28;17(2):966–78. doi: 10.1021/acsnano.2c06125.

      - De Franceschi N, Barth R, Meindlhumer S, Fragasso A, Dekker C. Dynamin A as a one-component division machinery for synthetic cells. Nat Nanotechnol. 2024 Jan;19(1):70-76. doi: 10.1038/s41565-023-01510-3.

      Moreover, random sticking of liposomes would generate clusters of vesicles, not linear chains. We now also provide a movie (Movie 1) supporting this point.

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (6) Why didn't the authors use the same lipid composition, particularly the same proportion of negatively charged lipids, on the SUVs of the flotation assays and on the dumbbell-shaped liposomes?

      In flotation assays, it is typical to use a relatively large proportion of negatively charged lipids, to promote protein binding. This is because the aim is to maximize membrane coverage by the protein. The SMS procedure to generate dumbbell-shaped GUVs is completely different, however. Rather than covering the membrane with protein, the idea is to reduce the amount of protein to a minimum, so that any curvature preference can be best visualized. This is e.g. routinely done in tube pulling experiments, for the same reason (See for instance Prévost C, Zhao H, Manzi J, Lemichez E, Lappalainen P, Callan-Jones A, Bassereau P. IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun. 2015 Oct 15;6:8529. doi: 10.1038/ncomms9529).

      (7) Line 117-119: The suggestion that polymer formation between CdvA and CdvB facilitates membrane recruitment is intriguing. However, fluorescence microscopy experiments could better elucidate whether there is sequential recruitment of CdvB followed by CdvA, or whether these proteins form a heteropolymer composite for membrane binding. Can CdvB bind membranes independently, or does this require synergy between CdvA and CdvB?

      We thank the referee for prompting us to perform this experiment. As we now show in Figure 1C, CdvB indeed is able to bind the membrane independently of CdvA. Whether this happens sequentially or simultaneously is an interesting question, but one that is impossible to address with either the SMS or the flotation assay, because in both cases we can only observe the endpoint of the recruitment.

      We would also like to clarify one specific experimental detail. Perhaps unsurprisingly, the results from the flotation assay depend on the way the assay is performed. In particular, we observed that the same protein can exhibit a different binding profile depending on whether it is loaded at the top or at the bottom of the gradient. This can be seen in Author response image 6. This is counterintuitive, since once equilibrium is reached, the result should only depend on the density of the sample. We performed an overnight centrifugation (> 16 hours) on a short tube (< 3 cm tall), so equilibrium is reached (as corroborated by the fact that CdvB1 and CdvB2 can float to the top of the gradient within this timespan, as shown in Figure 2C, 2E, 2G). We ascribe the difference between top and bottom loading to the fact that, when the sample is loaded at the bottom, it has to be mixed with a concentrated sucrose solution, whereas this is not done when loading from the top.

      In literature, both loading from top and from bottom have been used:

      - Lata S, Schoehn G, Jain A, Pires R, Piehler J, Gottlinger HG, Weissenhorn W. Helical structures of ESCRT-III are disassembled by VPS4. Science. 2008 Sep 5;321(5894):1354-7. doi: 10.1126/science.1161070

      - Moriscot C, Gribaldo S, Jault JM, Krupovic M, Arnaud J, Jamin M, Schoehn G, Forterre P, Weissenhorn W, Renesto P. Crenarchaeal CdvA forms double-helical filaments containing DNA and interacts with ESCRT-III-like CdvB. PLoS One. 2011;6(7):e21921. doi: 10.1371/journal.pone.0021921.

      - Senju Y, Lappalainen P, Zhao H. Liposome Co-sedimentation and Co-flotation Assays to Study Lipid-Protein Interactions. Methods Mol Biol. 2021;2251:195-204. doi: 10.1007/978-1-0716-1142-5_14.

      In performing the flotation assay for CdvB1 and CdvB2ΔC, or when using all 4 proteins together, we loaded the sample at the bottom, and we could detect reproducible binding to liposomes (Figures 2D, 2F, 2H, 3A). However, CdvB does not bind the membrane when loaded at the bottom. Thus, for the experiments shown in Figure 1C, we loaded the proteins at the top. This experimental setup allowed us to show that CdvB indeed induces a stronger interaction between CdvA and the membrane.

      Author response image 6.

      CdvB binding to multilamellar vesicles in a flotation assay. In the left panel, the sample was loaded at the top of the sucrose gradient; in the right panel it was loaded at the bottom.

      (8) Line 165-173: The authors claim that filament curvature differs between CdvB2ΔC alone and the CdvB1:CdvB2ΔC complex. Are these differences statistically significant? What is the sample size (N)? Furthermore, how do the authors confirm interactions between these proteins in the absence of membranes based solely on EM micrographs?

      We can confirm that the filaments are composed of both proteins, because the filaments have a different curvature when both proteins are present. However, as requested by Referee 3, point (7), we removed the quantification of curvature (panel 2C). We report the N number in the text.

      (9) Line 121-123: Are the authors referring to positive or negative membrane curvatures? The cited literature suggests ESCRT-III proteins either lack curvature preferences (e.g., Snf7, CHMP4B) or prefer high positive curvature (e.g., late ESCRT-III subunits). This is confusing since the authors later test recruitment to negatively curved necks.

      We do not claim that Cdv proteins prefer positive or negative curvature, because the necks present in dumbbells have a catenoid geometry, which include both positive and negative curvature. We have now clarified this in the discussion.

      (10) Since the conclusions rely on the oligomeric state of the proteins, providing SEC-MALS spectra to show the protein oligomeric state right after the purification would strengthen the claims.

      While such SEC-MALS experiments may be interesting, their practical implementation is not possible since both first authors have left the lab for new positions.

      (11) Line 157-160: Suppl. Fig. 2 shows only a single EM micrograph of a small filament. Could the authors provide lower magnification images showing more filaments?

      As requested by Referee 3, point (7), we have toned down the importance of these short filaments.

      Also, why are the sample sizes for filament length (N=161) and width (N=129) different?

      Protein filaments formed by Cdv tend to stick to each other side by side, so that for some filaments the width could not be accurately assessed, and accordingly those were removed from the analysis.

      (12) The introduction states that CdvA binds membranes while CdvB does not. However, the results suggest CdvB facilitates membrane binding, helping CdvA attach. This discrepancy needs further explanation.

      We thank the referee for raising this point. We have now performed additional experiments (both SMS assay and flotation assays) showing that indeed CdvB from M. sedula is (unlike CdvB from Sulfolobus) able to bind the membrane on its own (Figure 1C, 1F).

      Reviewer #2 (Recommendations for the authors):

      Best practice would be to show single fluorescence channels in grayscale or inverted grayscale, retaining pseudocolouring only for the merged multichannel image.

      We decided to retain and standardize the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. We believe this improves readability, and this was also a request from Referee 3. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      It would be great to include a quantification of liposome curvature vs focal intensity of the various Cdv components - across figures.

      Quantification of liposome curvature at the neck can be done (De Franceschi et al., Nature Nanotech. 2024). In practice, however, this requires transferring the sample after preparation into a new chamber to increase the signal-to-noise ratio of the encapsulated dye, a procedure that drastically reduces the yield of dumbbells. Given the very sizeable amount of work required to obtain reliable measurements, especially considering all the proteins and protein combinations used in this study, this would represent a project in itself, which goes well beyond the scope of this manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) We would encourage the authors to consider including the length of the scale bar next to the scale bar in each image and not in the figure description. This would greatly aid in clarity and interpretation of figures.

      We have now written the length of the scale bar in the figures.

      (2) In a similar vein, could the authors consider labeling panels throughout the manuscript, stating which sample is being presented? This applies mainly to the negative stain and the dumbbell fluorescence images, as having to continuously consult the figure legend hinders clarity.

      We have now labelled the EM images as requested by the referee.

      (3) Lines 254-256: would the statement hold not only for CdvB2∆C, but for all imaged proteins? They all seem to localize to membrane necks, presumably favoring membrane binding to a specific membrane topology.

      We agree with the referee, and changed the phrasing accordingly.

      (4) CdvB2∆C construct - presumably this was a truncation of helix 5 of the ESCRT-III domain? Figure 1A shows that the ESCRT-III domain spans residues 34-170 and therefore implies that all five ESCRT-III helices (which make up the ESCRT-III domain) are present in the C-terminal truncation. Could the authors clarify?

      Indeed, the truncation was done at residue 170.

      (5) Results of the liposome flotation assays are presented inconsistently across the three figures (Figs 1C, 2DFH, and 3A). This makes it more difficult than it needs to be to interpret and compare results. Could the authors consider presenting the three gels in a more similar, standardized way across the three figures?

      To improve readability, we now standardized the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      (6) From the data presented in Fig 1EF, it cannot be concluded whether CdvB and CdvA colocalize, as only one protein is labelled. Is there a technical reason for this?

      We have now repeated the same experiment by having both proteins labelled, confirming that there is co-localization at the neck (Figure 1G).

      (7) Fig 2C: is the difference between the two samples significant?

      As requested by Referee 3, we have removed Figure 2C.

      (8) Fig 2I is missing a 'merged' panel.

      We have now added the merged panel.

      (9) The fluorescence intensity plots in Supp Figs 1C and 3C would be easier to interpret if the lipid and protein signals were plotted on the same plot (say, with normalized fluorescence intensity).

      It is not immediately obvious to us what the signal should be normalized to. What we wished to convey with these plots was that the intensity of proteins spikes at the neck region. In an attempt to improve clarity, we have now aligned the plots vertically, and highlighted the position of the neck.

      (10) CdvA should have a capital "A" in Figure 3A, panel 3.

      We have now corrected this.

      (11) The discussion doesn't comment on the need to truncate CdvB2.

      This is explained in the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kroeg et al. describe a novel method to culture human induced pluripotent stem cells (hiPSCs) in 2D to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach generates cultures with precise, reproducible dimensions and a structure organized around a single rosette, with consistent geometry, multiple neuronal and glial cell types (cellular diversity), and no necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest its potential for high-throughput screening and neurotoxicological studies.

      Strengths:

      The main advances in the paper seem to be the following. The culture developed by the authors appears to provide optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me a major strength of the paper and an important contribution to the field. The authors present solid evidence of the high cell type diversity present in their cultures. This is a major point and could therefore be better compared to the state of the art. I commend the authors for using three different iPSC lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.

      We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.

      Weaknesses:

      (1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction.

      We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al., 2022, doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or non-globular adherent organoids, such as renal (Czerniecki et al., 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al., 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript self-organize into single radial structures consisting of multiple cell layers along the z-axis, reaching ~200 µm in thickness (and therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity; they therefore meet the consensus definition of an organoid.

      (2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work.

      It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust.

      We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1C. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we have adapted the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure 2 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and have modified the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across an even broader range of parameters.

      (4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking.

      A more comprehensive characterization of the cells in the center remains a significant challenge because the high cell density hinders antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, as indicated by immunolabeling for SOX2 (Figure 2A,D; Figure S4C). Furthermore, we are currently optimizing the conditions to perform single-cell/single-nucleus RNA sequencing to further characterize the cellular composition of the organoids.

      (5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient. (a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.

      We have revised the manuscript to include a more detailed step-by-step overview of the protocol.

      (b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc.

      As suggested, we have adapted the graphical abstract to include more detail.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      (1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity.

      (2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.

      (3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.

      (4) The method has been demonstrated with multiple cell lines which is a strength.

      (5) The manuscript provides high-quality immunostaining for multiple markers.

      We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.

      Weaknesses:

      (1) A direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, i.e. what can be done with the new method that cannot be done with standard culture and vice versa, i.e. in which aspects the new method could be inferior to the standard.

      In our opinion, it would be extremely difficult to directly compare methods. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a more standardized shape confined by the geometry of a 384-well. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods, as addressed in response to comment 2 of Reviewer 1 above.

      (2) It would be important to further benchmark the throughput, i.e. what is the success rate in filling and successfully growing the organoids in an entire 384-well plate?

      Figure S1 shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we have added the number of wells that were filled per plate.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      Figure S1 provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewer's feedback and have modified the methods to provide more detail.

      Reviewer #3 (Public review):

      Summary:

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems.

      We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.

      Weaknesses:

      While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      We appreciate the feedback and have added more detail on consistency and standardization of functional outputs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points

      (1) As the preprint is officially part of the eLife review, I have to remark that the preprint made available on bioRxiv suffers from a serious compatibility or format problem: one cannot highlight sentences as in a regular PDF, and when trying to copy-paste sentences from it, jumbled characters are copied to the clipboard.

      The updated version of the paper on bioRxiv should not suffer from these compatibility issues.

      (2) Since the paper is presenting a new method it should briefly describe how each step, including the hiPSC culture was done, the reference to an earlier publication in this case is not sufficient, and this practice is generally best to avoid for methods papers.

      Each step in the culturing process has now been described in the methods.

      (3) The EB stage is insufficiently described. The "2D - 3D - 2D" transitions should be clearly explained.

      The methods section has been rewritten and expanded to include these processes in more detail.

      (4) Is there one FACS sorting step in the protocol, or multiple (e.g., an additional one at the iPSC stage)? Which markers are used for each? What is the motivation for sorting and purifying the neural progenitors? Was the culture impure? What was the purity? What cell types are expected after sorting, and what is removed?

      Only one FACS sorting step is performed, at the NPC stage. This was added as an improvement to our original neural network protocol (Günhanlar et al., 2018) to ensure consistency across different hiPSC source lines, which can yield variable amounts of frontal cortical patterned NPCs. Positive sorting for the neural lineage markers CD184 and CD24, and negative sorting for the mesenchymal/neural crest marker CD271 and the glial progenitor marker CD44, according to Yuan et al., 2011, ensures frontal-patterned cortical NPCs, as confirmed for all batches by immunohistochemistry for SOX2, Nestin and FOXG1. We have added new text to the Methods section to clarify this more explicitly.

      (5) Seeding protocol and parameters are insufficiently described, and from what I read they are poorly defined: "Specifically, the optimal seeding density was determined by visual inspection of the organoids between 28 to 42 days after seeding a range of cell densities in the 384-well plate wells." For a new method, precise, actionable instructions are needed. I may have overlooked those elsewhere, in this case, please clarify these sections.

      The Methods section was rewritten and expanded to describe the methodology in greater detail with more actionable instructions.

      (6) The timeline in Figure 1 is not clearly delineated; I found it hard to understand which figure panel corresponds to which stage (e.g. FACS sorting is not mentioned in the first part of the results but is part of Figure 1A; neural rosette formation can happen both before and after FACS sorting, so simply referring to rosettes is not clear). Later parts of the manuscript clearly introduce the terms sorting and seeding in the context of this method, and how ages (days) refer to these time points.

      Figure 1 was adapted to clarify the generation of Neural Progenitor Cells (NPCs) and subsequent seeding of NPCs to generate Adherent Cortical Organoids (ACOs).

      (7) The authors define: "cortical organized defined as a single radial structure." This is not a commonly used definition of organoids, for nomenclature, please see: doi: 10.1038/s41586-022-05219-6 (Pasca et al 2022).

      To clarify, the statement is not meant to reflect a definition of organoids in general, but rather the scoring of proper structure formation for Figure S1C. For discussion on nomenclature, see our response to point 1 of Reviewer 1 in the public review. We changed the wording to be more accurate.

      (8) In Figure S1D, the authors write: "the fraction of structurally intact cultures decreased to 50%", but looking at that graph there seems to be no notable decrease, only large variability. The authors should quantify claims of a decrease by linear regression and report an R². Variation within and across cell lines seems to be large. Also, it is unclear whether the dots correspond to the same wells/plates; in other words, is this a longitudinal experiment? What is the overall success rate? How is success determined? Are there clear criteria?

      We agree with the reviewer that the claim that the fraction of intact cultures decreases to 50% over time is an overinterpretation, given the large variability. We changed the wording in the manuscript to: “While some later batches show moderately reduced success rates compared with the earliest batches, properly formed single-structure organoids were still obtained at 40–90% success across all examined time points (Figure S1C), indicating that long-term culture is feasible albeit with variable efficiency.” The data are not longitudinal, as each dot represents an endpoint of a different batch of organoids, totaling 18 independent batches across the three lines. We have clarified this in the figure legend. Success was defined at the well level as the presence of a single, continuous radial structure occupying the well, without obvious fragmentation or fusion events, as assessed by LIVE/DEAD staining, which also confirmed viability. Wells were scored as successful only when the radial structure showed predominantly live signal with no large necrotic areas. Wells containing multiple radial structures, fused aggregates, or predominantly dead tissue were scored as unsuccessful.
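      The well-level success criterion described above can be written as an explicit rule, which may help readers apply it consistently. The following is a minimal Python sketch under stated assumptions: the class WellScore, its field names and the helpers is_successful and success_rate are hypothetical, and the boolean observations are assumed to come from visual scoring of LIVE/DEAD-stained wells. It is an illustration of the stated criteria, not the scoring procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class WellScore:
    """Hypothetical per-well observations (field names are illustrative only)."""
    n_radial_structures: int   # number of separate radial structures in the well
    fragmented: bool           # obvious fragmentation of the structure
    fused: bool                # fused aggregates / fusion events
    predominantly_live: bool   # LIVE/DEAD signal is mostly live
    large_necrotic_area: bool  # a large necrotic (dead) region is present


def is_successful(w: WellScore) -> bool:
    """Apply the well-level success rule: one intact, predominantly live radial structure."""
    return (
        w.n_radial_structures == 1
        and not w.fragmented
        and not w.fused
        and w.predominantly_live
        and not w.large_necrotic_area
    )


def success_rate(wells: list[WellScore]) -> float:
    """Fraction of scored wells in one batch that meet the success rule."""
    if not wells:
        raise ValueError("no wells scored")
    return sum(is_successful(w) for w in wells) / len(wells)
```

      Under this reading, each per-batch value returned by success_rate would correspond to one endpoint dot in Figure S1C.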

      (9) Figure S1C: the labeling of this panel should be swapped, because it is referenced after other panels in the text. The reference is confusing: "Plotting the interaction between proliferation and the amount of NPCs required to be seeded for the successful generation of adherent cortical organoids", but success is not shown in this graph at all. How is it measured?

      Figures S1C and S1D have been adapted to clarify the measure of ‘successful organoid formation’.

      (a) The description of this plot is confusing: "The doubling time of the NPCs explains more than half the variation (r2 = 0.67) of the required seeding density." What else is there? I thought that this was the formula the authors suggested to determine seeding density, but it seems not. Or is "manual inspection" the determinant, and that seems to correlate with this metric?

      Even though the rate of proliferation, measured as doubling time, is the main determinant of the seeding density, it is not the only determinant of the seeding density. For instance, intrinsic differences in differentiation potential could also play a role. Therefore, NPC lines with similar doubling times might still have slightly different optimal seeding densities. We have added clarification of this conclusion to the Results section.
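      As a reminder of what the reported value means: for an ordinary least-squares fit of seeding density on doubling time, r² is the fraction of the variance in seeding density that the linear fit accounts for. In the expression below, y_i is the optimal seeding density of NPC line i, x_i its doubling time, ŷ_i the value predicted by the fitted line and ȳ the mean seeding density (notation introduced here for illustration only):

```latex
r^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad \hat{y}_i = \beta_0 + \beta_1 x_i
```

      With r² = 0.67, the residual sum of squares is 0.33 of the total, so roughly one third of the variation in optimal seeding density is not captured by a linear dependence on doubling time; this is the room left for other factors, such as the differentiation potential of each NPC line.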

      (b) Seeding density is a key parameter in many in vitro differentiation and culture protocols. This importance however does not mean that this density is attributable to differences in cell proliferation rate. Alternatively, the amount of cells determines the amount of secreted molecules and cell-to-cell contacts.

      Here, when we refer to the cell density, we specifically refer to the cell density needed to generate the ACO. We show that the most important contributor to the variation in ACO formation is the proliferation, measured here as the doubling time. We agree that there are other factors involved such as the secreted molecules, cell-to-cell contacts as well as the ability of a given NPC line to differentiate into a post-mitotic cell.

      (c) Is it mentioned which cell line this experiment corresponds to?

      The data in Figure S1D are from the 3 reported cell lines, as well as 2 clones from a fourth hiPSC line. This is detailed in the Methods section describing the proliferation assay.

      (d) Without a more detailed explanation, seeding density and doubling time could be independent variables.

      These two variables are highly correlated as shown in Figure S1D, but it is true that there can be other variables that account for the observed variance, as discussed above in Point 9b.

      (e) In this figure the success rate is not visible at all, so I have no idea how the authors arrive at a conclusion about success rate.

      We have adapted the figure legend to indicate which cell lines the dots in Fig. S1D represent. NPC lines can vary substantially in proliferation rate. The figure shows data from NPCs of 5 clones of 4 different hiPSC lines (as indicated in the Methods) with different proliferation rates. The ACO success rate (defined in the same way as for the data shown in Fig. S1C) has also been included.

      (10) Figure 2: Clean spatial segregation seems to be a strength of the system and therefore I would recommend putting more of the relevant microscopy images to the main figure, which are now currently in Figure S4.

      We have adapted Figure 2 accordingly, and included additional representative cortical layering images in Figure S4.

      (11) The variability in interneuron content seems to be considerable, as currently presented in the figure. However, this may be due to spatial organization. I would first quantify, in consecutive rings around the centers, whether interneurons tend to be enriched toward the center or the edge of the culture. This might explain the variability currently present in Figure S5B.

      We agree that spatial organization of interneurons could, in principle, contribute to variability. In our analysis, however, images were acquired from positions selected by a random sampling grid across the entire culture, rather than from specific central or peripheral regions. Each field contained on average 130.6 ± 16.1 NeuN+ nuclei, which provided a relatively large sampling volume per position. If interneurons were strongly enriched at the center or edge, we would expect systematic differences in interneuron fraction between fields assigned to central versus peripheral grid positions. We did not observe such a pattern in our dataset, suggesting that spatial organization is not the main driver of the observed variability.

      (12) Because in previous figures it seems like there is considerable variability across individual cultures and images here are coming from separate cultures, please use different shapes of the points coming from different cultures/wells, to see if maybe there is a culture-to-culture difference that explains the variability present in the figure.

      We have added different symbols per organoid for the interneuron quantifications and moved this quantification to main Figure 2.

      (13) I believe it is currently the standard error of the mean which is displayed in the figure, which is not an appropriate representation for variability, or the reproducibility across individual data points. SEM quantifies the reproducibility of the mean, not the reproducibility of the individual data points, which matters here. Mean refers to the mean of this quantification experiment and therefore it's not a biological entity. A box plot showing the interquartile range besides the individual data points would be an accurate representation of the spread of the data.

      We agree and have adapted the data, now in Figure 5, accordingly.
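      The reviewer's distinction between SEM and the spread of individual data points follows from SEM = SD/√n: the SEM shrinks as more fields are sampled even when the spread of the individual values does not change. A minimal Python sketch using only the standard library illustrates this. The helper spread_summary and the toy values are hypothetical (not data from the study), and statistics.quantiles is used with its default "exclusive" quartile convention, which may differ slightly from the convention of a given plotting package.

```python
import statistics


def spread_summary(values: list[float]) -> dict:
    """Summarize per-field measurements: sample SD, SEM and interquartile range (IQR)."""
    n = len(values)
    sd = statistics.stdev(values)                   # sample SD (n - 1 denominator)
    sem = sd / n ** 0.5                             # SEM = SD / sqrt(n): precision of the mean only
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points ('exclusive' method)
    return {"n": n, "sd": sd, "sem": sem, "iqr": q3 - q1}


# Hypothetical toy values (not data from the study); the second list is the same
# distribution sampled four times as often.
few = [2, 4, 4, 4, 5, 5, 7, 9]
many = few * 4

print(spread_summary(few)["iqr"], spread_summary(many)["iqr"])   # IQR is 2.5 for both
print(spread_summary(few)["sem"] > spread_summary(many)["sem"])  # True: SEM shrinks with n
```

      A box plot built from the quartiles (with the individual points overlaid) therefore conveys the spread that matters for reproducibility, whereas SEM error bars mainly reflect how many fields were imaged.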

      (14) Again, in general, the main figures should contain much more of the quantification, as opposed to just raw images.

      Quantifications have been added in Figure 2 for the GAD67/NeuN for all cell lines as well as a time course quantification of GAD67/NeuN for 1 of the cell lines. In Figure 4, we have added excitatory and inhibitory synaptic quantifications.

      (15) Figure 2F-I the location of the center of the rosette should be marked with a star so that the conclusion about the direction of processes can be established.

      The suggested addition of a marker at the center of each rosette was evaluated but not implemented, because it reduced rather than improved figure clarity.

      (16) Figure 3 b and c:

      High-magnification images of single cells cannot show changes in cell-type morphology, and one cannot conclude from them that these cells are present in significant numbers across time. Zoomed-out images or quantification would be necessary for such a claim. The authors already have such images, as presented in the next panels, so quantification would be possible without new experiments.

      I am uncertain about the T3 supplement here: do these images correspond to the same conditions?

      (a) It is unclear to me why different markers are used in the different panels, namely why NG2 is not used in any of the other images.

      NG2 was used at early developmental time points to show the presence of Oligodendrocyte Precursor Cells (OPCs). At later time points, the focus switched to MBP staining to indicate more mature oligodendrocyte lineage cells. Although NG2 and MBP are not in the same panels, the staining was performed for both antibodies at the same developmental time point (Day 119) as seen in Figure 3C and 3D.

      (b) Color coding in Figure 3G is ambiguous; the use of two blues should be avoided, and the sub-subpanels should be individually labeled with the color code.

      We agree, and have now used different colors.

      (c) It is unclear whether the addition of T3 is part of the standard procedure or a side experiment to enhance the survival of oligodendrocytes. Are there no oligodendrocytes without it? How does T3 affect other cell types and the general health and differentiation of the cultures?

      Indeed, T3 is essential for oligodendrocyte formation. We did not observe obvious effects on the general health or differentiation potential of the cultures.

      (d) Is 2 ng/ml T3 present from Day 1 to the final day?

      Indeed, in the organoids cultured to study oligodendrocyte formation, T3 was added from Day 1. These details have now been clarified in the Methods and Results sections.

      (17) Figure 4:

      (a) Microscopy in this figure is high quality and very convincing about neural maturity.

      (b) The term "cluster" should be avoided. Unclear what it means here, but my best guess is "cells in a frame of view." Cluster is used with a different meaning in electrophysiology.

      This was adapted to ‘neurons in a field of view (FOV)’.

      (c) Panel J: I assume each row corresponds to a single cell? Could this be clarified? Are these selected cells from each frame, or are all active cells represented?

      Indeed, each row corresponds to a single cell, showing all active cells in the frame. This is now clarified in the legend.

      (d) How many wells do these data correspond to, and in which line were they measured?

      As reported in the legend for Figure 5, these data correspond to 2 wells at Day 61 to which we have now added calcium imaging data from 3 wells from a different batch at Day 100. We have included in the legend that these recordings were from Line 1.

      (e) In panels G to I, again, the use of the standard error of the mean is inappropriate and misleading: looking at the error bars, one would conclude that there is minimal variation, which is the exact opposite of the conclusion one would draw from the variability of the raw data points.

      As suggested, the graphs have been adapted as boxplots with interquartile ranges to highlight the distribution of data points.

      (f) It is unclear how many neurons in total, and how many actively firing neurons, are present in the videos analyzed.

      All neurons that were active in the field of view and showed at least one calcium event during the ~10 minute recording were included in the analysis. Using this method, we cannot comment on the proportion of neurons that were active from the total amount of neurons present, since the AAV virus we used does not transduce all neurons.

      (g) This figure shows the strength of the method in achieving neural maturity and function. There appears to be considerable activity in the neuronal cultures analyzed. To conclude how reliably the method leads to such mature cultures, one would need to measure at least a dozen wells (even if with a simpler, low-resolution method). Concluding reproducibility from one or two hand-picked examples is not possible.

      We agree with the reviewer that the number of wells used for calcium imaging analysis was limited. We are currently working on more advanced methods to increase the throughput of this analysis. However, we’ve now added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61.

      Methods:

      (1) Stem cell culture. The authors describe that line 3 is grown on MEFs. Is this true for the other two lines? Furthermore, were they cultured in identical conditions?

      Lines 2 and 3 were not grown on MEFs. We specifically chose different sources of NPCs to reflect the robust nature of the differentiation protocol. We have recently also adapted the protocol from Line 3 NPCs to confirm that the protocol also works starting from hiPSCs grown in feeder-free conditions in StemFlex medium, by adapting NPC differentiation according to our recent publication in Frontiers in Cellular Neuroscience (Eigenhuis et al 2023).

      (2) "NPCs were differentiated to adherent cortical organoids between passages 3 and 7 after sorting." Please clarify this sentence. I assume it refers to the first FACS sorting step of the protocol, but the section is not sufficiently detailed.

      We have adapted the methods to clarify that the FACS purification step occurs at the NPC stage.

      (3) I didn't fully understand: it seems that there are two FACS sorting steps involved, one after passage 3 and one after week 4. This should be represented in the graphical abstract of Figure 1.

      As outlined above, there is only 1 FACS sorting step at NPC stage. We have adapted this in the Methods and in the graphical abstract.

      (4) Neural differentiation: The authors write that optimal seeding density was determined by visual inspection of the organoids - this is.

      We have clarified the Methods section to better explain the process of optimizing the seeding density for each NPC line to generate the ACOs.

      (5) What does the following sentence mean: "Cells were refreshed every 2-3 days." Does it mean replacement of the complete medium? How much medium was added to the wells?

      This is a very good point that we have now clarified in the Methods, as full replenishment of media is neither feasible, nor desirable. From the total volume of 110 µl per well, 80 µl is taken out and replaced with 85 µl to compensate for evaporation.

      (6) Calcium imaging: Can the authors explain the decision to move the cultures into BrainPhys neural differentiation medium one day before imaging? In 3D organoid protocols, BrainPhys is gradually introduced to avoid culture shock (its composition is very different) and is used for multiple months to enhance neural differentiation. For recording electrophysiological activity, artificial CSF is the most common choice.

      Indeed, for whole cell recordings of 2D neural networks as performed in Günhanlar et al 2018, we used gradual transition to aCSF. For the current ACOs, we found that using BrainPhys from the start of organoid differentiation prevents structure formation, probably because of increased speed of maturation disrupting proliferation and organization of radial glia differentiation. However, by changing the media to BrainPhys just one day before recording (reflecting a gradual change as not all medium is fully replenished and easier than switching to aCSF during recording), we saw greatly improved neuronal activity.
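      The degree of carry-over implied by this partial exchange can be estimated from the volumes given in the response to point (5) above (80 µl of the 110 µl per well removed and replaced with 85 µl). The sketch below is a hypothetical back-of-the-envelope calculation, not part of the authors' protocol; it assumes a single exchange into BrainPhys, complete mixing before removal, and ignores evaporation between the exchange and recording. The function name is illustrative only.

```python
def fresh_fraction_after_exchange(total_ul: float, removed_ul: float, added_ul: float) -> float:
    """Fraction of fresh medium in a well immediately after one partial exchange.

    Hypothetical helper: assumes the well is fully mixed before removal
    and ignores evaporation.
    """
    remaining_old_ul = total_ul - removed_ul      # old medium left behind in the well
    new_total_ul = remaining_old_ul + added_ul    # volume right after refilling
    return added_ul / new_total_ul


# Volumes stated in the Methods response: 110 µl per well, 80 µl out, 85 µl in.
fresh = fresh_fraction_after_exchange(110, 80, 85)
print(f"BrainPhys fraction: {fresh:.1%}, carried-over medium: {1 - fresh:.1%}")
```

      Under these assumptions, the well contains roughly 74% BrainPhys and 26% of the previous medium at the time of recording, which is consistent with the authors' description of the switch as a gradual rather than complete change.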

      (7) Statistical analysis: As I pointed out before, the standard error of the mean is not an appropriate metric to represent the variability of the data. It is meant to represent the uncertainty of the estimated average. The following thought experiment should make this clear: I measure the expression of a gene in my system 100 times; 50 times I measure 0 and 50 times I measure 100. The average is 50, but it is of course a very bad representation of the data, because no data point has that value. Yet the standard error of the mean would be plus or minus 5.

      We have revised Figures 5C–5D to boxplots displaying the interquartile range with all individual data points overlaid, which more accurately represents the variability in the dataset.
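      The reviewer's thought experiment can be checked numerically. The following is a minimal sketch using only Python's standard library; the values are the reviewer's hypothetical numbers, not data from this study, and the choice of the "inclusive" quantile method is an assumption (plotting packages differ slightly in how they compute box-plot quartiles).

```python
import math
import statistics

# Reviewer's hypothetical data: 50 measurements of 0 and 50 measurements of 100.
values = [0.0] * 50 + [100.0] * 50

mean = statistics.mean(values)
sd = statistics.stdev(values)          # sample SD (n - 1 denominator): spread of the data
sem = sd / math.sqrt(len(values))      # SEM: uncertainty of the estimated mean only
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(f"mean = {mean:.1f}, SD = {sd:.1f}, SEM = {sem:.2f}")
print(f"Q1 = {q1:.0f}, median = {median:.0f}, Q3 = {q3:.0f}, IQR = {q3 - q1:.0f}")
```

      Running it gives a mean of 50.0 and an SEM of about 5.03, matching the reviewer's "plus or minus 5", while the quartiles are 0 and 100. The interquartile range of 100 therefore exposes the bimodal spread that a mean ± SEM bar hides, which is the motivation for the boxplots with overlaid data points.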

      Discussion

      (1) The discussion focuses on human cortical development, however, the methods presented by the authors entail dissociation and replating through multiple stages not part of brain development. I see the approach as more valuable as a possibly reliable method that generates both diverse and mature neural cultures.

      We have revised the Discussion to avoid explicitly invoking an in vitro recapitulation of human cortical development. Nevertheless, given that the NPCs from which the organoids originate exhibit frontal cortical identity, coupled with the timely emergence of cortical neuronal markers and rudimentary cortical layering, we are increasingly confident that the development of these cultures most likely mirrors that of the frontal cortex. To further substantiate this hypothesis, single-cell RNA sequencing experiments will be conducted in the future to provide additional insights.

      (2) One of the major claims of the authors is that the method is very reproducible. However, there is almost no data on reproducibility throughout the paper. Mostly single, high magnification images are presented, which therefore represent a small region of a single well of a single batch of a single cell line. Based on the data presented it is not possible to evaluate the reproducibility of the method.

      We agree that the original version did not sufficiently document reproducibility. To address this, we have refined and expanded our presentation of reproducibility data. The previous success-rate panel (original Figure S1D) has been moved and adapted as the new Figure S1C. In this updated version, each dot still represents the endpoint success rate of an independent batch, but dot size now scales with batch size (10–40 organoids), and the legend specifies the total numbers of organoids analyzed per line (line 1: n=248; line 2: n=70; line 3: n=70). Together with the distribution of success rates between ~40–90% across multiple time points and three iPSC lines, this more detailed representation allows readers to directly assess the robustness of line-to-line and batch-to-batch performance. In addition, new time course quantifications of interneuron proportion (Figure 2G,H), synaptic marker densities (Figure 4H, I), and late-stage calcium imaging (Figure 5C,D,E) further demonstrate that key structural and functional read-outs show overlapping ranges across lines and independent differentiations, reinforcing that the method yields reproducible core phenotypes despite some biological variability.

      (3) The data presented is very promising, and it suggests that the authors derived optimal conditions for neural differentiation and neural culture diversification. I am confident that the authors can show that reproducibility, at least in a practical sense (e.g. in wells that form a culture) is high.

      Overall, this is a very promising and exciting work, that I am looking forward to reading in a mature manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) A direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, i.e., what can be done with the new method that cannot be done with standard culture and vice versa, i.e., in which aspects the new method could be inferior to the standard.

      We have now more clearly elaborated the differences with other methods. As addressed in our response to point 2 of Reviewer 1 in the public reviews, there are several limitations and advantages to the adherent cortical organoids model listed as follows:

      Advantages of adherent cortical organoids:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (2) It would be important to further benchmark the throughput, i.e., what is the success rate in filling and successfully growing organoids across an entire 384-well plate?

      We have addressed this question in the current version of Fig. S1C, in which multiple batches of organoids of all three lines were scored for their success rate. The graph reflects the proportion of properly formed organoids out of approximately 400 seeded wells, scored at different timepoints, where each timepoint is a different batch. As mentioned in the response to Reviewer 1, we have also added the number of organoids seeded per line to the figure legend.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      As outlined in the response to Reviewer 1, we have clarified the Methods and Discussion sections on seeding density and proliferation rate.

      Reviewer #3 (Recommendations for the authors):

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells. Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems. While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Particularly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      (1) Considering the emergence of astrocyte markers (GFAP, S100b) and upper layer neuron marker (CUX1) around Day 60, the overall differentiation speed is significantly faster compared to other forebrain organoid protocols. Are these accelerated sequences of neurodevelopment consistent across different hiPSC lines?

      As shown in Fig. S5, astrocytes are present around Day 60 for all three lines. For comparison with other organoid protocols, an important consideration is that the timeline for these organoids starts at NPC plating, while for other protocols timing often starts from the hiPSC stage. We have clarified the timeline in the graphical abstract in Figure 1A and in the Methods.

      (2) The calcium imaging results in Figure 4G were recorded at a single time point, Day 61, a relatively early time window compared to other forebrain organoid protocols (more than 100 days, PMID: 31257131; PMID: 36120104). Are the neurons in adherent cortical organoids functionally mature enough around Day 61? How consistent is this functional activity across different cell lines and independent differentiation batches?

      As discussed above in Point 1, it is important to consider that the specified timeline starts from NPC plating. In analogy to 2D neural networks, robust neuronal activity can be observed after ~8 weeks in culture. In addition, we have now added calcium imaging data for an additional batch of organoids at Day 100 in Figure 5, which exhibit comparable levels of neuronal activity as observed on Day 61.

      (3) Along the same line, various cell types, such as oligodendrocytes and astrocytes, are believed to influence neuronal maturation. Therefore, longitudinal studies until the late stage are necessary to observe changes in electrophysiological activity based on the degree of neuronal maturation (at least two more later time points, such as 100 days and 150 days).

      As described in the previous points, we have now included a Day 100 time point in the calcium imaging data, in addition to the recordings at Day 61 (Figure 5C-E).

      (4) The authors assert that heterogeneity among organoids has been diminished using the human adherent cortical organoids protocol. However, there is inadequate quantitative data to prove the consistency of neuronal activities between different wells. Therefore, experiments quantifying the degree of heterogeneity between organoids, such as through methods like calcium imaging, are necessary to determine if neuron activity occurs consistently across each organoid well.

      We agree with the reviewer and have added several quantitative experiments: a) we’ve added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61; b) we added synapse quantification in Figure 4; and c) we added interneuron quantification in Figure 2. We are currently also pursuing high-throughput measures of activity to assess the longitudinal activity of ACOs in a larger number of wells. This way we can more definitively quantify the time-dependent variance in organoid activity.

      (5) Is this platform applicable to other functional measurements for neuronal activity, such as the MEA system? When observing the morphology of neurons formed in organoids, they appear to extend axons and dendrites in a consistent direction, suggesting a radial structure that demonstrates high reproducibility across wells. A culture system where neurons are arranged with such consistency in directionality could be highly beneficial for experiments utilizing the MEA system to assess parameters such as the speed of electrical activity transmission and stimulus-response. Therefore, there seems to be a need for a more detailed explanation of the utility of the structural characteristics of the culture system.

      The ACO platform is indeed suitable for MEA recordings. We are in the process of engineering the required geometry using HD-MEA systems through specialized inserts to generate ACOs on MEA systems.

      (6) In Figure 2E-I, the authors suggest morphological diversity of GFAP+/S100b+ astrocytes, but the imaging data presented in Figure 2F-I are based only on GFAP immunoreactivity.

      Since GFAP is also expressed in radial glial cells at this stage (Figure 2I), many fibrous astrocytes and interlaminar astrocytes are likely radial glial neural progenitor cells instead of astrocytes. It appears necessary to perform additional staining using astrocyte markers such as S100B or outer radial glia markers such as HOPX to demonstrate that the figure depicts subtype-specific morphologies of astrocytes.

      In Figure 2M, we stained for GFAP and PAX6 to mark radial glia that look different than the astrocyte morphologies we describe in Figure 2J-L. We see a large overlap in GFAP and S100B staining in Figure 2I, in which most GFAP+ cells are double positive for S100B (yellow) that is more consistent with astrocyte maturation than radial glia. Furthermore, we have not seen PAX6 staining outside the dense edges of the center of the ACO.

      (7) In Figure 4D, the axon appears to exhibit directionality. Additional explanation regarding the organization of the axon is necessary. Further research utilizing sparse staining to examine the morphology of single neurons seems warranted.

      The polarized directionality of the axons is something we indeed have also noticed. We are looking into options to further investigate this intriguing property of the ACOs.

      (8) Figure 1E-F only shows cell viability at early stages, around Day 40-50. To demonstrate the superior long-term viability of ACO culture, it appears necessary to illustrate the ratio of dead to live cells over a time course.

      Figure S1B shows LIVE/DEAD staining for ACOs of all three lines, revealing minimal DEAD staining at Day 56. A longitudinal time course experiment was not performed, however the line- and batch-specific quantifications over developmental timepoints in Figure S1C provide an indication of the robust long-term viability of the ACOs.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible temporal construals. For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and participants were instructed to consider the event from "an internal" or from "an external" perspective. The authors found distinct patterns of brain activity in the posterior parietal cortex (PPC) and anterior hippocampus for the internal and the external viewpoint. Specifically, activation in the posterior parietal cortex positively correlated with distance during the external-perspective task, but negatively during the internal-perspective task. The anterior hippocampus positively correlated with distance in both perspectives. The authors conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are supported by the parietal cortex.

      We thank the reviewer for the accurate summary of our study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and the work tackles them from the perspective of construals theory.

      We appreciate the reviewer's positive and encouraging comments.

      Weaknesses:

      Although the work uses two distinct psychological tasks, the authors do not elaborate on the cognitive operationalization the tasks entail, nor the implication of the task design for the observed neural activation.

      We thank the reviewer for bringing this issue to our attention. In the revised manuscript, we have added a paragraph to the Discussion acknowledging this potential limitation of the study. Please see our response below.

      Reviewer #1 (Recommendations for the authors):

      Overall, I thank the authors for providing clear responses and much-needed detail on their original work, which enables a better understanding of their perspectives. I still have some detailed questions about the reported work, which I provide below. It could help clarify the work for a more general audience and its replicability by the community.

      We thank the reviewer for their positive evaluation of our previous revisions.

      Main general concern:

      I have one remaining core concern, which I distill as being a very different take on the usefulness of task design with neuroimaging. This concern follows from the authors' response to my original comment, which suggested possible confounds in fMRI data analysis and interpretation, as differences in task design and behavioral outcomes were not incorporated in the analytical approach.

      The authors confirmed that "there is a substantial difference between the two tasks" but argue that these differences are not relevant, seeing that "the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component." However, the authors do perform such contrasts in their analysis (e.g., p. 10: "We first directly contrasted the activity level between external- and internal-perspective tasks in the time window of...") and build inferences about brain activation from them (e.g., p. 10: "Compared with the internal-perspective task, the external-perspective task specifically activated the...").

      To clarify, my original concern was not about comparing neural activity in response to the two tasks but about the brain activity generated by two distinct tasks, which aim to reveal fundamentally distinct neural processes. The authors' response raises several concerns about the theoretical, methodological and empirical foundation of the work that are beyond the scope of a single empirical study and too long to detail here. Cognitive neuroscience relies on tasks to infer neural processes; this is the fertile and essential ground for using behavior in neuroscience to get to a mechanistic understanding of brain functions (e.g., Krakauer et al., 2017). In short, task design is fundamental because it shapes what neural processes are being investigated. Any inferences about brain activity recorded while a participant performs a task result from manipulated variables that should be under the control of the experimenter. Acknowledging that two tasks are distinct is acknowledging that different (neural) processes may govern their resolution. My initial remark was meant to highlight that, from basic signal detection theory, a same/different task and a temporal order task may not yield the same kind of basic biases and decision-making processes; these are far below and more basic than the posited sophisticated representations herein (construals, perspective taking).

      In short, the general approach is far coarser than the level of interpretational granularity being pushed forward in the paper would suggest.

      We greatly appreciate the reviewer’s comments and agree that this is a very fair point. We acknowledge that the two tasks differ in their underlying decision-making processes. In the revised manuscript, we have added a paragraph at the end of the Discussion to explicitly acknowledge this limitation and to outline possible avenues for future research (Page 23).

      “One limitation of the present study is that the external- and internal-perspective tasks differed not only in the type of perspective-taking they were intended to elicit, but also in their underlying decision-making processes. The external-perspective task explicitly required participants to compare two events with respect to external temporal landmarks and judge whether they occurred in the same or different parts of the day (i.e., a same/different judgment), whereas the internal-perspective task explicitly required participants to project themselves into a reference event and judge whether the target event occurred in the future or the past relative to that reference (i.e., a temporal-order judgment). This task design ensured that participants adopted two distinct perspectives on the event series, but at the expense of coherence in the cognitive operations required to make the two types of judgments. One alternative approach would be to more closely align the response demands of the two tasks by drawing on McTaggart’s (1908) A-series and B-series distinction: in the external-perspective task, participants could judge whether the target event occurred before or after the reference event (i.e., a before/after judgment), whereas in the internal-perspective task they could judge whether the target event occurred in the past or future relative to the reference event (i.e., a past/future judgment). Although such a design would improve coherence in the underlying decision-making processes (i.e., both are temporal-order judgments), it would reduce experimental control over the perspective-taking manipulation. For example, before/after judgments could still be made from an internal perspective. Future studies are therefore needed to determine whether findings obtained from these two task designs converge.”

      Additional clarifications:

      Intro/theory

      In this revised MS, the authors provided some clarifications of their theoretical perspective in the introduction. From my standpoint, the motivation remains insufficiently precise for a scientific report. Some theoretical aspects, such as construals or perspective taking remain evasive in relation to ego and allocentric representations. A couple of paragraphs dedicated to explaining what the authors mean precisely when using these terms would greatly help to situate the validity of the working hypothesis. In the absence of clear definitions, it remains difficult to evaluate what is being tested. For instance, what do the authors mean by "time construal"? How is a time construal the same or not as a "temporal distance" or a "temporal sequence"? This would greatly help the readership.

      Additionally, some assertions are not clearly identified or fairly attributed. For instance, the assertion that EST provides a means to spatialize time is the authors' point of view or interpretation of this work, not an original proposition of the theory. Another example is McTaggart's metaphysics of time series (in the ontology of time in physics) being "echoed" in linguistics; this has effectively been proposed and popularized by L. Boroditsky. The prospective and retrospective views of time should not be attributed to Tsao et al. but to Hicks or Block in the 1970s, who studied the psychology of time in humans.

      We sincerely thank the reviewer for this criticism, which prompted us to clarify the relevant concepts in our manuscript. In the revised version, we made the following three main changes to the Introduction.

      In the second paragraph of the Introduction (page 3), we clarify that event segmentation theory is independent of, but related to, the spatial construal of time hypothesis. We also clarify what we mean by time construals and explain that the two temporal components—duration and sequence—can be represented within such time construals, rather than constituting time construals themselves. These revisions were intended to prevent potential misunderstandings for the reader. In addition, we incorporated Boroditsky’s contributions relevant to this framework:

      “One solution, which might be unique to humans, is to conceptualize time in terms of space (i.e., the spatial construal of time; e.g., Clark, 1973; Traugott, 1978; Lakoff & Johnson, 1980). Within this framework, time is usually first segmented into events—the basic temporal entities that observers conceive as having a beginning and an end (Zacks & Tversky, 2001). These temporal entities are then ordered in space, such that events occurring at different times can be maintained in working memory, allowing them to be flexibly accessed from different perspectives and easily referenced during communication (e.g., Casasanto & Boroditsky, 2008; Núñez & Cooperrider, 2013; Bender & Beller, 2014; Abrahamse et al., 2014; Figure 1A). The two core temporal components—duration and sequence—can be readily represented in such time construals.”

      In the third paragraph of the Introduction (pages 3-4), we acknowledge the contributions of earlier behavioral studies on prospective and retrospective timing by citing the work suggested by the reviewer (Block & Zakay, 1997), which indicates that two distinct cognitive systems underlie timing processes. These behavioral findings converge with the conclusions of more recent neuroimaging studies:

      “Unlike prospective timing tracking the continuous passage of time, durations in time construals are event-based (Sinha & Gärdenfors, 2014): the interval boundaries are constituted by events, and the event durations reflect their span (Figure 1A). Accumulating evidence suggests that distinct cognitive systems underlie these two types of duration (e.g., Block & Zakay, 1997). The motor and attentional system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the episodic memory system—particularly the hippocampus—is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184-197.

      In the fifth paragraph of the Introduction (page 5), we added a sentence to clarify the relationship between allocentric and egocentric reference frames and perspective taking:

      “However, the neural mechanisms that enable the brain to generate distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which posits the existence of stable allocentric representations that are independent of viewpoint, from which variable egocentric representations corresponding to different perspectives can be generated.”

      Methods:

      While more detail is provided in the Methods, some additional detail would be helpful to enable the replication of this work. For instance,

      - The table reports a sequence of phrases with assigned durations. Are the event phrases actual sentences given to participants? If so, how were participants made aware of the duration of the events, seeing that these sentence parts do not provide time information?

      We apologize that we did not make this clear. The full text used during the reading phase of learning has already been provided in Figure 1—source data 1, which includes the information about event durations. In the revised manuscript, we now explicitly refer to this information in the Methods section (page 38): In the reading phase, participants read a narrative describing the whole ritual on a computer screen twice (Figure 1—source data 1).

      - One of my original questions was about the narrative. In the Methods section, the authors state that participants read a text. Providing the full text would be helpful, also as a sanity check for sequentiality.

      As clarified in the previous response, the texts are provided in Figure 1—source data 1, which illustrates the texts for both even- and odd-numbered participants.

      - In the imagination phase, the authors introduce proportionality between imagination and experience (p. 37). What scale was used? What motivated it?

      We thank the reviewer for bringing this issue to our attention. In this study, participants did not directly experience the events; instead, they learned the event information through narrative reading or imagination to ensure experimental control and efficiency. As clarified in the Methods section, the ratio between imagination duration and actual event duration was 30 seconds to 1 hour. In the revised manuscript, we have further explained our motivation for this design choice (page 39):

      Here, we let participants learn the event information through narrative reading or imagination. Compared to learning through actual experience, this approach prioritizes experimental control and efficiency. The timing of the events is compressed, akin to the process of retrospectively recalling our experiences, in which we mentally traverse events without requiring the actual time they originally took. However, future studies may be needed to investigate whether the encoding of events from first- and second-hand experience differs.

      Results:

      - p. 10: the interpretation of the data on chunking and boundary effects should be properly referenced to e.g. Davachi's published work.

      We thank the reviewer for highlighting Davachi’s important work on event boundaries. We have appropriately cited these studies in the revised manuscript (page 10), as reflected in the following passage: This pattern can be interpreted as a categorical effect: sequential distances within the same part of the day were perceived as shorter (i.e., a chunking effect), whereas distances spanning different parts of the day were perceived as longer (i.e., a boundary effect). Similar boundary- or chunking-related effects on event cognition have been reported in previous studies (e.g., Ezzyat & Davachi, 2011; DuBrow & Davachi, 2013; Radvansky & Zacks, 2017).

      Ezzyat, Y., & Davachi, L. (2011). What constitutes an episode in episodic memory? Psychological Science, 22(2), 243-252.

      DuBrow, S., & Davachi, L. (2013). The influence of context boundaries on memory for the sequential order of events. Journal of Experimental Psychology: General, 142(4), 1277.

      Radvansky, G. A., & Zacks, J. M. (2017). Event boundaries in memory and cognition. Current Opinion in Behavioral Sciences, 17, 133-140.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Suggestions:

      The authors have done a commendable job addressing my previous comments. In particular, the additional analyses elucidating the potential contribution of boundary effects to the behavioural data, the impact of incorporating RT into the fMRI GLMs, and the differential contributions of RT and sequential distance to neural activity (i.e., in PPC) are valuable and strengthen the authors' interpretation of their findings.

      My one remaining suggestion pertains to the potential contribution of boundary effects. While the new analyses suggest that the RT findings are driven by sequential distance and duration independent of a boundary effect (i.e., Same vs. Different factor), I'm wondering whether the same applies to the neural findings? In other words, have the authors run a GLM in which the Same vs. Different factor is incorporated alongside distance and duration?

      We thank the reviewer for their positive evaluation of our previous revisions and are pleased that the additional analyses adequately address the boundary effects in the behavioral data and the RT effects in the neural data.

      With respect to boundary effects in the neural data, we followed the reviewer’s suggestion and constructed a more complex GLM that incorporated the Same/Different part of the day as an additional regressor modulating the target events. Importantly, the same PPC region continued to show an interaction effect between Task Type and Sequential Distance. We have added this important control analysis to our revised manuscript (Pages 13–14):

      “To further assess whether the observed PPC reactivation can be attributed to boundary or chunking effects introduced by the Parts of the Day, as well as other behavioral outputs, we performed an additional control analysis. Using a more complex first-level model, we included two extra regressors modulating the target events in both internal- and external-perspective tasks, alongside Sequential Distance and Duration: (1) Same/Different parts of the day (coded as 1/−1) and (2) Future/Past (coded as 1/−1). Even with these additional controls, the same PPC region remained the strongest area across the entire brain, showing an interaction effect between Task Type and Sequential Distance, although the cluster size was slightly reduced (voxel-level p < 0.001; cluster-level FWE-corrected p = 0.054).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their enthusiasm and insightful suggestions. Our responses to specific concerns and questions are detailed below.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors use flow cytometry and scRNA-seq to identify and characterize the defect in gdT17 cell development in HEB f/f, Vav-iCre (HEB cKO) and Id3 germline-deficient mice. HEB cKO mice showed defects in the gdT17 program at an early stage, and failed to properly upregulate expression of Id3 along with other genes downstream of TCR signaling. Id3KO mice showed a later defect in maturation. Together, the results indicate that HEB and Id3 act sequentially during gdT17 development. The authors further showed that HEB and TCR signaling synergize to upregulate Id3 expression in the Scid-adh DN3-like T cell line. Analysis of previously published ChIP-seq data revealed binding of HEB (and Egr2) at overlapping regulatory regions near Id3 in DN3 cells.

      The study provides insight into mechanisms by which HEB and Id3 act to mediate gdT17 specification and maturation. The work is well performed and clearly presented. We only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav-Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides important mechanistic insight into how the E/Id-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3: first as an HEB-dependent maturation factor for gdT17 cells, and second as a suppressor of diversion to the ab lineages.

      Weaknesses:

      (1) The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors.

      We agree that this is a valid concern and continue to work on confirming the mechanism from several other angles. Validating HEB/E2A and Egr2 co-binding in gdT17 cell progenitors by ChIP-qPCR would be a very precise and definitive experiment, but it would be very challenging to perform, in part because of the low numbers of gdT17 precursors in the fetal thymus (note the y-axis scales in Fig. 1F, J). As a complementary approach, we have analyzed additional ChIP-seq data for HEB/E2A binding in Rag2<sup>-/-</sup> DN3 cells that were retrovirally transduced with the KN6 gdTCR and cultured for 4 days with stroma expressing the weak KN6 ligand T10. This analysis revealed that HEB/E2A binding at those sites persisted after weak gdTCR signaling, strengthening the likelihood that concurrent binding of HEB/E2A and Egr2 occurs during this developmental transition. HEB/E2A binding was slightly dampened in Rag2<sup>-/-</sup> DN3 + gdTCR cells relative to Rag2<sup>-/-</sup> DN3 cells, consistent with the induction of Id3 and subsequent Id3-mediated disruption of E protein binding. We also located HEB/E2A and Egr binding sites in close proximity within the two regions that shared peaks between the HEB/E2A and Egr2 analyses (HE1 and HE2), in line with the potential participation of these two transcription factors in an enhanceosome complex.

      Furthermore, we examined the chromatin landscape of the Id3 locus by sorting WT DN3 and DN4 cells, as well as Rag2<sup>-/-</sup> DN3 cells to provide a genuine pre-selection context, and performing ATAC-seq (Figure 7–suppl 7A). Given the known ability of E2A and HEB to induce chromatin remodeling, we also examined accessibility in DN3 and DN4 cells from HEB cKO mice. Alignment of ATAC-seq and ChIP-seq peaks in the Id3 locus revealed accessibility of HE1 and HE2 in Rag2<sup>-/-</sup>, WT DN3, and WT DN4 cells. However, accessibility of HE1 and HE2 was dampened in HEB cKO cells, especially at the DN3 stage, suggesting that HEB may be involved in remodeling the Id3 locus, resulting in a poised state that enables TCR-dependent transcription factors to induce Id3 proportionally to TCR signal strength. These data are now presented as a new “Figure 7 – figure supplement 1” with corresponding Results, Discussion, and Methods updates.

      Our next story will be focused on a finer dissection of the Id3 cis-regulatory elements and their combinatorial regulation by HEB/E2A and other transcription factors, and how they relate to specific signaling pathways. For this study, we will modify the language regarding Egr2 to reflect the open questions that still remain to be addressed.

      (2) E2A expression is not affected in HEB-deficient cells, raising the question of partial compensation, a point that should be specifically discussed.

      This confounding factor is always an issue with E proteins. We have now added a section to the discussion that highlights previous literature and relates it to our findings.

      (3) All experiments are done at E18, when fetal gdT17 development predominates. The discussion could address whether these mechanisms extend to neonatal or adult gdT17 subsets.

      In our 2017 paper (PMID 29222418) we showed that HEB cKO mice have defects in the production of functional gdT17 cells in fetal and neonatal thymus and in the adult periphery (in lungs and spleen). While the adult thymus does not support the development of fully functional innate gd T cells, it does contain gdTCR+ cells that have activated the Sox-Maf-Rorc network (Yang 2023, PMID 37815917). It will be very interesting to assess the impact of HEB loss on these cells, and we are actively pursuing this goal. For now, we will add a paragraph to the discussion addressing what we know from previous work and what is yet to be learned.

      Reviewer #3 (Public review):

      Summary:

      The authors of this manuscript have addressed a key question in T cell development: how early thymic gd T cell subsets are specified, and which elements govern the specification of gd T17 cells versus other gd T cell subsets or ab T cells. They show that the transcriptional regulator HEB/Tcf12 plays a critical role in specifying the gd T17 lineage and, intriguingly, that it upregulates the inhibitor Id3, which is later required for further gd T17 maturation.

      Strengths:

      The conclusions drawn by the authors are amply supported by a detailed analysis of various stages of T cell maturation in WT and KO mouse strains at the single cell level, both phenotypically, by flow cytometry for various diagnostic surface markers, and transcriptionally, by single cell sequencing. Their conclusions are balanced and well supported by the data and citations of previous literature.

      Weaknesses:

      I actually found this work to be quite comprehensive. I have a few suggestions for additional analyses the authors could explore that are unrelated to the predominant conclusions of the manuscript, but I failed to find major flaws in the current work.

      I note that HEB is expressed in many hematopoietic lineages from the earliest progenitors and throughout T cell development. It is also noteworthy that abortive gamma and delta TCR rearrangements have been observed in early NK cells and ILCs, suggesting that, particularly in early thymic development, specification of these lineages may have lower fidelity. It might prove interesting to see whether their single-cell sequencing or flow data reveal changes in the frequency of these other T-cell-related lineages. Is it possible that HEB is playing a role not only in the fidelity of gdT17 cell specification, but also perhaps in the separation of T cells from NK cells and ILCs or the frequency of DN1, DN2, and DN3 cells? Perhaps their single-cell sequencing data or flow analyses could examine the frequency of these cells? That minor caveat aside, I find this to be an extremely exciting body of work.

      Excellent question; the short answer is yes: loss of HEB renders the cells more open to divergence to non-T lineages, even at the DN3 stage. Although our datasets did not reveal those cells, we have examined this question previously. In our 2011 paper (Braunstein, 2011, PMID 21189289), we identified “DN1-like” cells arising from HEB-/- DN3 cells in OP9-DL1 co-cultures. These cells responded to IL-15 and IL-7 by differentiating into cytotoxic NK-like cells. We did not detect TCRb rearrangements but did not look for gdTCR rearrangements. Subsequently, multiple papers from other labs showed that ILC2 were greatly expanded in the thymus of Id-overexpression transgenic mice and HEB/E2A-double deficient mice (Miyazaki, 2017, PMID 28514688; Miyazaki, 2025, PMID 39904558; Berrett, 2019, PMID 31852728; Qian, 2019, PMID 30898894; Peng, 2020, PMID 32817168). The ILCs in these mice had TCRg rearrangements, consistent with a shared origin with WT thymus-derived ILCs. In unpublished data from our lab, we found an increase in the numbers of ILC2 but not ILC3 in HEB cKO fetal thymic organ cultures. We did not follow up on this work further because the topic was being heavily pursued in other labs, but we remain very interested in this branchpoint and will mention the literature in the discussion.

      Joint recommendations for the authors:

      (1) Experimental validation (for mechanistic clarity)

      The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors to substantiate the proposed cooperative mechanism.

      See above; new experiments with ATAC-seq and additional ChIP-seq analysis.

      (2) Figures

      Potential inconsistencies in Figure 1H: In the legend to Figure 1H, Vg1-Vg5- cells are considered Vg6+ cells. Flow plots show a reduced Vg1-Vg5- population in HEB cKO mice, but the accompanying bar plot shows an increased frequency of Vg6+ cells.

      Vg6 cells are actually considered to be Vg4-Vg5-Vg1- cells (not Vg4- Vg1- cells, which is important in the fetal context). The flow plot shows the percentage of Vg6 cells out of the Vg1-Vg4- population, whereas the bar plot shows the percentage of Vg6 cells out of all gdTCR+ cells. The ratio of Vg6 to Vg5 cells decreases within the Vg1-Vg4- population, whereas the overall percentage and number of Vg6 cells among all gd T cells are increased in HEB cKO mice. We have now explained this more clearly in the text and the figure legend.

      Clarify which cells produce IL-17A in Figure 1L.

      This plot is gated on all gd T cells stimulated with PMA/ionomycin; this has been added to the results and figure legend.

      In Supplementary Figure 2, legend, do the authors mean that TRGV4 was depleted? The authors write TRDV4. Please check.

      Thank you for catching this mistake, we have corrected it.

      In Figure 7, the authors showed Id3 mRNA expression. Can the expression of Id2 be included?

      That is a really interesting question, and we will follow up on it in future studies.

      If Id1 or Id4 are relevant for any of these studies, can their expression be shown in Supplementary Figure 3A? If these are minimally expressed or not expressed, this could be mentioned.

      Id1 and Id4 were not detectable in our studies; this is now stated in the Results section describing the expression of E proteins and Id proteins.

      (3) Discussion

      Discuss possible redundancy between HEB and E2A, as E2A expression appears unaffected in HEB-deficient cells.

      See above

      Address whether the mechanisms identified at E18 (embryonic stage) also apply to neonatal or adult γδT17 subsets.

      See above

      Expand on how HEB function may relate to other hematopoietic or early lymphoid lineages (NK/ILC, DN1-DN3 stages), based on reviewer curiosity.

      See above

      (4) Methods and terminology

      Define the terms γδTe1 and γδTe2 (e.g., early effector subsets).

      This has been defined more clearly in several sections of the text.

      Add details to the scRNA-seq methods section (average number of cells analyzed and sequencing depth per cell).

      These details have been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have now performed new experiments, which are included in the manuscript. Our new results show that monocyte-derived dendritic cells primed in vivo during P. chabaudi infection, or in vitro with TNF, express high levels of GLUT-1 (Figures 4M, 5D, 6L). Furthermore, our new data show that mice treated with 2-DG (an inhibitor of glycolysis) are more susceptible to infection (Figures 6N, O). In addition, new results on glucose uptake by muscle and adipose tissues were added to the manuscript. Finally, figure legends were revised, densitometric analysis was performed, and other issues were addressed in the text.

      Please see below a point-by-point reply to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kely C. Matteucci et al. titled "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF-1α axis plays a key role in host resistance to Plasmodium infection" describes that TNF induces HIF-1α stabilization that increases GLUT1 expression as well as glycolytic metabolism in monocytic and splenic CD11b+ cells in P. chabaudi infected mice. Also, TNF signaling plays a crucial role in host energy metabolism, controlling parasitemia, and regulating the clinical symptoms in experimental malaria.

      This paper involves an incredible amount of work, and the authors have done an exciting study addressing the TNF-iNOS-HIF-1α axis as a critical role in host immune defense during Plasmodium infection.

      Reviewer #2 (Public Review):

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via a yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation, including cytokine secretion, and for parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model.

      Weaknesses:

      The main conclusion of this work - that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection" is unsubstantiated. The authors show that TNFa induces GLUT1 in monocytes, but never show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection (nor the hypoglycemia phenotype they describe).

      We respectfully disagree with the Reviewer. A series of experiments shows that TNFR KO (Figures 1, 2, 4), HIF1a KO (Figure 5) and iNOS KO (Figure 6) mice have a partially impaired inflammatory response and control of parasitemia (Figures 1E, 5G and 6B).

      To further address the issue raised by the reviewer, we performed two sets of experiments. First, we show, in vitro, the impact of TNF stimulation on GLUT1 expression and glucose uptake (Figure 4M, 5D, 6L). Our results show that GLUT1 is increased after 18 hours of TNF (100 ng/mL) stimulation in MODCs from WT mice but not from iNOS KO, HIF1a KO and TNFR KO mice. Similar results were obtained with monocytic cells derived from infected mice (Figure 4L, 5C, 6K). These results support the discussion by demonstrating that TNF stimulation influences GLUT1 expression in monocytic cells. This aligns with the proposed mechanism that TNF signaling regulates HIF-1α stabilization and glycolytic metabolism via RNI. The absence of GLUT1 upregulation and glucose uptake in TNFR KO, iNOS KO and HIF-1α KO mice further reinforces the role of RNI in promoting HIF-1α stabilization, as suggested in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      None of the figure legends states precisely whether the data are expressed as means ± standard errors of the mean (SEM) or SD. Figure 1D shows no SD for the data from the uninfected group. We strongly suggest making all figure legends more precise, giving more details, including an explanation of all symbols, non-standard abbreviations, error bars (standard deviation or standard error), experimental and biological replicates, the number of animals, and the number of independent experiments represented.

      We apologize for the lack of detail in the figure legends. As requested, we now indicate whether we used SEM or SD, the number of mice per group, and the number of replicate experiments. We also clarified the groups that are being compared and the statistical significance indicated by the symbols. We also standardized the symbols to asterisks only, with the number of asterisks indicating the significance level.

      Figure 1. The figure legend has no information about the organ in which TNF mRNA was measured (Figure 1D). Also, regarding the TNF data, Figures 1C and 1D show that the circulating levels of TNF and the expression of TNF mRNA in the liver peaked at the same time point, and after 6h, there is no difference between infected and uninfected mice. It would be expected that TNF mRNA expression would be detected earlier than the protein, assuming that the primary source of TNF is the liver. Is there another organ that could be the main source of blood TNF? Did the authors have a chance to measure blood TNF levels during infection (0-8 dpi), besides the measurements at different times on day 8 only?

      We included in the legend of Figure 1D that mRNA was extracted from liver.

      The liver and spleen are the main reservoirs of infected erythrocytes and the main sources of cytokines during infection with the erythrocytic stage of malaria. The results presented in Figures 1C and 1D are from in vivo experiments, not from a controlled cellular experiment in vitro, so we cannot draw conclusions about the exact timing and synchrony of TNF mRNA and protein production. We have previously published that during P. chabaudi infection, the peaks of TNF mRNA expression and of circulating TNF protein occur between midnight and 6 am (Hirako et al., 2018). Hence, those results are consistent with the results described here. In addition, this earlier study shows that the TNF patterns at days 6 and 8 post-infection are similar. Furthermore, in other studies, we reported that the peak of TNF production occurs between days 6 and 10 post P. chabaudi infection (Franklin et al, PNAS, 2009; Franklin et al, Microbes and Infection, 2007). This is now clarified in the text (page 05, line 132):

      “As previously demonstrated, the circulating levels of TNF and expression of TNF mRNA in the liver peaked at 6 am (end of the dark cycle) at 8 dpi (Figure 1C and 1D), and they have been reported to peak between days 6 and 10 post-infection, with a consistent pattern observed on days 6 and 8.”

      Figure 2. "We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels, and Figures 2E-H)." Interestingly, the respiratory exchange rate seems higher in uninfected TNFR<sup>-/-</sup> mice than in uninfected C57BL/6 mice. Is there any explanation for this change in respiratory exchange rate in the absence of infection in those animals?

      At the moment, we have not investigated the basis of this difference between uninfected WT and TNFR KO mice, which goes beyond the scope of this research. This is indeed an interesting observation that should be pursued in the future by our group and others. We now mention this difference when describing the results (page 06, line 155):

      “We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels and Figures 2E-H), with a slightly higher respiratory exchange rate in uninfected TNFR<sup>-/-</sup> mice. In contrast, all the evaluated parameters were decreased in infected C57BL/6 mice compared to their naïve counterparts during the light and dark cycles. When we analyzed only infected mice, the alterations in all parameters were milder in TNFR<sup>-/-</sup> compared to C57BL/6 mice (Figures 2A-D bottom panels and 2E-H).”

      Figure 3. To give an idea of the main population of non-parenchymal cells, it would be helpful to briefly clarify how non-parenchymal cells were isolated from the liver of infected or uninfected mice.

      This is described in detail in the Materials and Methods (page 19, line 566).

      Figures 3B, 3C, 3D, 3G, 4K, 5A and 5B: Semi-quantitative data from densitometric analysis of the western blots should be included in all figures.

      Thank you for the suggestion. We have now included the densitometric analysis for all western blot results in a Supplementary Figure.

      Figure 4. The authors describe, "We observed that except for Hexokinase-3, the expression of mRNAs of glycolytic enzymes (Hexokinase-1, PFKP, and PKM) was increased in C57BL/6 but not TNFR-/- 8dpi." It is sometimes hard to understand which groups are being compared. Please be precise in describing the statistical analysis between groups. It seems that those genes were increased in infected C57BL/6 mice compared with uninfected mice, but not in TNFR-/- mice at 8 dpi. Moreover, although the authors include the statistical symbols "ι, ιι, ιιι" in other legends, these symbols are not explained in the legend of Figure 4.

      As mentioned above, we have improved all figure legends and, where necessary, the description of the results in the main text.

      Figure 5. The authors describe, "We found that GLUT1 protein and glycolysis (ECAR) was impaired, respectively, in monocytic cells and splenic CD11b+ cells from infected, as compared to uninfected HIF-1aΔLyz2 mice (Figures 5C-5E)." GLUT-1 expression was reduced in both cell types compared with HIF-1afl/fl mice, but it was far from abolished. A robust amount of GLUT-1 expression remains, significantly higher than in cells from uninfected mice.

      We have revised our statement to "partially impaired", indicating that other host or parasite components may also influence GLUT-1 expression. In fact, we have recently shown that IFNγ also plays an important role in regulating GLUT1 expression in MO-DCs, and this reference is cited in the text (page 10, line 291):

      “We found that glycolysis (ECAR) and GLUT1 expression were impaired, though partially, in monocytic and splenic CD11b+ cells from infected HIF-1aΔLyz2 mice (Figures 5C-5E) compared to infected WT mice. The level of GLUT1 expression that is still maintained is likely due to other host or parasite factors, such as IFN-γ (Ramalho 2024).”

      Figure 6. More information about the number of replicates in Figure 6A is needed. For example, there are only two replicate dots for CD11b+ splenic cells from C57BL/6 mice stimulated with or without LPS (purple bars). It is essential to state precisely the number of experimental and biological replicates in each experiment and the statistical analysis that has been applied, including for this group. Furthermore, the authors conclude, "...these data demonstrated that RNI induces HIF-1α expression...." This conclusion needs a more careful description, since no data show decreased stabilization of HIF-1α in monocytic cells or splenic CD11b+ cells from infected iNOS-/- mice by western blotting, as shown in Figure 5A.

      As mentioned above, the number of replicates for each experiment is now included in the figure legends.

      Minor Points.

      Figure 3. "Hepatocytes have an important role in glucose uptake from the circulation, and they do this primarily through GLUT2 (38), whose mRNA expression was downregulated (Figure 3A) and protein expression unchanged in response to Pc infection (Figure 4K)." I suggest moving Figure 4K to Figure 3 to make the data description easier to follow.

      We thank the reviewer for the suggestion. However, we chose to keep Figure 4K in Figure 4, as this panel includes data from TNF receptor deficient mice, and the analysis of TNF knockout models is first introduced and discussed in Figure 4. For clarity and consistency, we therefore maintained this panel within Figure 4.

      Line 433. Replace "iNOS" with "iNOS-/- mice".

      "iNOS" has been replaced with "iNOS-/- mice".

      Reviewer #2 (Recommendations For The Authors):

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via a yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation, including cytokine secretion, and parasite control.

      The main goal of this work is to study the interplay of TNF, HIF1a and iNOS in pathogenesis in an experimental model of malaria. Dissecting the molecular mechanism by which TNF induces reactive nitrogen species and regulates HIF1a expression is beyond the scope of our research. Nevertheless, there is an extensive literature addressing these issues. We now include in the Discussion a paragraph summarizing the main conclusions of these previously published studies (page 12, line 363):

      "Previous studies have shown that TNF induces the production of RNI through the upregulation of iNOS via the NF-κB pathway (63, 64). TNF-mediated iNOS expression is critical for NO production, which in turn stabilizes HIF-1α by inhibiting prolyl hydroxylases (PHDs) even under normoxic conditions (58, 59). HIF-1α then upregulates the expression of glycolytic genes, including GLUT1 (22, 62).”

      Major comments

      Issues concerning novelty

      Some of the reported observations are not novel. TNFa and TNFa signaling have been demonstrated to contribute to the release of certain cytokines and to the control of parasitemia (PMID: 10225939). TNFa has been shown to increase glucose uptake in tissues (PMID: 2589544). There is a textbook chapter on the role of iNOS during the pathogenesis of malaria, including its association with parasite control (https://link.springer.com/chapter/10.1007/0-306-46816-6_15). Furthermore, other mechanisms controlling glycemia during Plasmodium infection have been shown (PMID: 35841892). The authors should adequately discuss other papers that have reported some of their findings.

      We thank the reviewer for pointing out this earlier literature, of which we are well aware; some of these findings were already mentioned in our manuscript. As requested, we now emphasize these fundamental findings in the Discussion (page 12, line 368):

      “TNF has been described as a critical mediator in malaria, driving cytokine release and parasitemia control (PMID: 10225939). It also enhances glucose uptake in tissues, aligning with our findings of increased glycolysis in monocytes (PMID: 2589544). The role of iNOS in malaria is well documented. IFN-γ and TNF induce the production of NO, which inhibits parasite growth but can cause tissue damage and organ dysfunction, especially in severe malaria (Mordmüller et al., 2002). Recent studies also highlight the complexity of glycemia regulation during Plasmodium infection, describing its role in modulating parasite virulence and transmission (PMID:35841892). These studies demonstrate the critical function of TNF and iNOS in immune responses against Plasmodium, aligning with our findings that this axis and the associated metabolic rewiring are essential for monocyte activation and the outcome of Pc infection.”

      The authors claim that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection," and that it contributes significantly to their effector functions (particularly parasite clearance) and to the systemic drop in glycemia observed during Pc infection. Although the authors show that TNFa does result in altered metabolism and increased GLUT1 levels in a subpopulation of monocytes, the evidence that TNFa-induced glycolysis plays a key role in host resistance is correlative at best.

      This is an important question. We did show that TNFR KO mice have higher parasitemia. However, TNF is a pleiotropic cytokine with multiple roles in innate and acquired immunity. One experiment we performed that helps to address this issue is in vivo treatment with 2-deoxyglucose (2DG). We found that treatment with this glycolysis inhibitor results in increased parasitemia. These results are now included in Figure 6.

      When considering that the majority of monocytic populations are reduced in frequency and only a small subset (i.e., Monocyte-derived DCs) increase in frequency (Fig 3K) during Pc infection, this makes it very difficult to demonstrate that a cell population whose overall frequency reduces contributes significantly to the drop in glycemia during Pc infection. The authors should therefore include experiments that demonstrate that the inhibition of glycolysis induced by TNFa in monocytes is protective and/or contributes to a decrease in extracellular glucose. The authors could assess the impact of the loss of function of GLUT1 on activated monocytes and monocyte-derived DCs on glycemia upon TNFa stimulation.

      We agree. We focused on monocytes and the derived inflammatory monocytes and MO-DCs. In fact, the frequency of monocytes, including inflammatory monocytes and MO-DCs, is increased in both spleen and liver. One interesting result is that HIF1a LysM KO mice have impaired metabolism, attenuated hypoglycemia and increased parasitemia (Figure 5). Nevertheless, we agree that our current data do not prove that the drop in glycemia is due to glucose consumption by activated monocytes, or that these are the only cells with increased glucose consumption. This is now added to the Discussion (page 13, line 395):

      "Although the frequency of MO-DCs increases during infection, other cell populations may also contribute to glucose consumption. Further experiments, including the assessment of GLUT1 function in these populations, are needed to clarify their contribution to glucose consumption during infection."

      Furthermore, in the current state of the manuscript, it is unclear how activated monocyte populations take up glucose. The authors claim that glucose uptake by activated monocytes is GLUT1-dependent; however, glucose transport via GLUT1 is insulin-dependent. Since Plasmodium infection is associated with insulin resistance and almost unquantifiable levels of insulin (PMID: 35841892), and TNFa itself induces insulin resistance (PMCID: PMC43887), it is unclear how the activated monocyte population takes up glucose. If the authors consider TNFa to be sufficient for GLUT1 induction, in vitro experiments (TNFa + monocytes) could bolster this claim (and support that GLUT1 is induced by an insulin-independent mechanism).

      There is substantial evidence indicating that, in contrast to GLUT4, GLUT1 induction in mice is independent of insulin (PMID: 9801136). In our model, GLUT1 seems to be induced by the cytokines TNF and IFNγ (this study and Ramalho et al., 2024). We have now performed experiments exposing monocytes to TNF and evaluating GLUT1 expression. The results indicate that monocytes from WT mice exposed to TNF (100 ng/mL) for 18 hours exhibited a significant increase in GLUT1 expression, comparable to the increased-GLUT1 phenotype observed in infected animals. These results are now included in the manuscript.

      Text was added to the Discussion to clarify the insulin independence of GLUT1 expression (page 13, line 388):

      “GLUT1 expression is recognized as independent of insulin, in contrast to GLUT4 (PMID: 9801136). In our model, this regulation appears to be driven by pro-inflammatory cytokines, particularly TNF. Supporting this, our results show that in vitro stimulation with TNF significantly increases GLUT1 expression in monocytes, consistent with the ex vivo phenotype observed in infected animals.”

      Alternative hypothesis which might explain their phenotypes

      Figure 2 A-H: The metabolic effects of the genetic manipulations, including iNOS KO, TNFR KO, and HIF-1α∆Lyz2, could be explained by reduced disease morbidity owing to a weaker inflammatory response during infection. Under this condition, anorexia will not be as profound in the knockouts as in wild-type littermate controls, since anorexia of infection is tightly linked to the magnitude of the inflammatory response. Accordingly, infected knockout animals can keep eating, which presumably impacts glycemia, maintenance of core body temperature, and overall energetics of infected mice. The authors should exclude this possibility.

      We have considered this possibility, and the Discussion now elaborates on this alternative hypothesis. We believe that these two mechanisms are not mutually exclusive (page 16, line 474):

      “Although restored physical activity, food consumption and energy expenditure in knockout mice may contribute to the observed systemic metabolic parameters by altering energy balance, these effects are not mutually exclusive with the TNF-driven, cell-intrinsic metabolic mechanisms described here.”

      Minor comments

      The authors showed increased parasitemia upon TNFR and HIF1a depletion in the LyZ2 compartment. The same was observed upon organismal iNOS depletion. This raises the question of whether the TNF-HIF-iNOS signaling axis is adaptive or maladaptive during Pcc infection. The authors should show host survival in mice lacking TNFR and HIF1a in the LyZ2 compartment, and in mice lacking iNOS (presumably, they have these data).

      Although the various knockout mice have increased parasitemia and signs of disease, they all survive the infection. This is now stated in the figure legends.

      Are the higher tissue glucose levels specific to the liver and the spleen or this is a more general event? Have the authors looked at other organs?

      We have now added the results of glucose uptake in muscle and adipose tissue to Figure 2. The fact that glucose uptake is not increased in muscle and adipose tissue further suggests that the increased glucose uptake in this model is insulin-independent.

      Figure 1F: All core body temperatures are within the physiological range, i.e., >36 degrees C. This makes it unclear why the authors regarded this as hypothermia. The authors should present experiments demonstrating the development of hypothermia in Figure 1F, as they claim this.

      Temperature changes in mice housed in animal facilities have been discussed in the field. It is clear, however, that early in the morning (end of the active period), mice enter torpor, with lower body temperature and physical activity.

      In Figure 4, since the authors already suggested that extra-hepatic cells, and not the liver parenchyma, contribute to glucose uptake, the authors should clarify why they analyzed the whole liver in Figure 4, and not extra-hepatic cells. Furthermore, the authors should quantify the hepatic monocytic population in non-infected versus infected wild-type animals.

      We used whole liver because the number of non-parenchymal cells obtained from the liver is too limited for western blot analysis. We thought it was important to show that GLUT1 expression was decreased in the liver of TNFR KO mice. Nevertheless, TNFR expression in the different liver cell types was shown by flow cytometry. In addition, we performed western blots with cells extracted from the spleen, where lymphoid and myeloid cells are more abundant.

      Line 87: Phagocytizing parasitized what?

      This has been corrected in the manuscript.

      Line 111: Define RNI before it is used.

      Is there a gender disparity in the TNFR KO phenotype? If yes, the authors should comment about this in their discussion.

      This has been defined and addressed in the manuscript

      Line 192: Did the authors mean 3B?

      In Figure 3M, please plot monocytes from uninfected animals.

      Data from uninfected animals are now included in Figure 3M.

      Line 390: Remove the extra dash in HIF1a.

      Extra dash has been removed.

      Line 397: Define RA.

      RA is now defined.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful reading of our manuscript and for the constructive and insightful feedback. In response, we performed several new experiments and analyses that significantly strengthen the study. First, we addressed the important question of optoLARG recruitment dynamics by generating a new cell line expressing optoLARG-mScarlet3 together with paxillin-miRFP, enabling us to directly quantify the dynamics of the optogenetic activator at focal adhesions and the plasma membrane. Second, we introduced a quantitative modeling framework to analyze RhoA activity dynamics during transient optogenetic stimulation. Using the measured optoLARG kinetics as input, we fitted activation and deactivation parameters for both WT and DLC1 KO cells, revealing a loss of negative feedback regulation in the KO condition. Together, these additions clarify the temporal relationships between optogenetic activation, RhoA signaling, and biosensor responses, and provide a more rigorous, mechanistic interpretation of our data. We rewrote large parts of the discussion section to reflect this new information.

      Below, we provide detailed, point-by-point responses to all reviewer comments.

      Recruitment dynamics optoLARG

      Reviewer #1:

      Public Review:

      For the optogenetic experiments, it is not clear if we are looking at the actual RhoA dynamics of the activity or at the dynamics of the optogenetic tool itself.

      Recommendations for the authors:

      For the transient optogenetic activations at FA and PM, it would be great to have one data set where the optoLARG is fused to a fluorescent protein, for example, mCherry, while FAs would be marked with paxillin-miRFP (by transient transfection to avoid making a new stable cell line). The dynamics of the optogenetic activator should be the same (on and off rates), but it can be possible that the activator is retained at FA for example. Such an experiment would help the understanding of the differential observed dynamics, where several timescales are involved: the dynamics of the opto tool, the dynamics of RhoA itself, and the dynamics of the biosensor.

      We agree with the reviewers that this is an essential control for this manuscript, and the new cell line will be useful in future studies. We developed a new construct containing the recruitable SspB domain tagged in red (optoLARG-mScarlet3), compatible with the iLID system, together with paxillin-miRFP to locate focal adhesions. From previous experiments, we know that the anchor part of the optoLARG system is distributed evenly across the cell membrane and is not affected by cytoskeletal structures such as focal adhesions. For the recruitable part of the optoLARG system, which translocates from the cytosol to the membrane upon blue-light stimulation, we illuminated focal adhesion and non-focal adhesion regions and quantified optoLARG dynamics, using the same scripts for automated stimulation and analysis as in the rGBD recruitment experiments. These results are shown in the new Suppl. Fig. S3. We found no significant difference in recruitment dynamics between focal adhesion and non-focal adhesion regions (Fig. S3B). OptoLARG recruitment under blue-light stimulation is well fit by an inverse exponential, and dissociation after stimulation by an exponential decay, consistent with the expected iLID dynamics (Fig. S3C). This experiment is described in detail at the end of the section "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" (Lines 303-320). We then used the optoLARG dynamics as input for the models describing RhoA activity dynamics (see next comment). This should help to disentangle the measured RhoA dynamics from the dynamics of the optogenetic tool.
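      To make the "inverse-exponential rise, exponential decay" description concrete, here is a minimal sketch of how recruitment and dissociation time constants can be read off such a trace. The time constants and stimulation length are hypothetical values chosen for illustration (not the values in Fig. S3C), the data are noise-free synthetic samples, and the fit is a simple log-linear least-squares regression; for noisy measurements a nonlinear fit (for example scipy.optimize.curve_fit) would be the more usual choice.

      ```python
      import math

      # Hypothetical parameters for illustration only (not values from the study).
      AMPLITUDE, TAU_ON, TAU_OFF, T_OFF = 1.0, 5.0, 20.0, 60.0  # a.u., s, s, s

      def recruitment_trace(t, amplitude=AMPLITUDE, tau_on=TAU_ON, tau_off=TAU_OFF, t_off=T_OFF):
          """Inverse-exponential rise while the light is on; exponential decay
          from the level reached at t_off once the light is switched off."""
          if t <= t_off:
              return amplitude * (1.0 - math.exp(-t / tau_on))
          level_at_off = amplitude * (1.0 - math.exp(-t_off / tau_on))
          return level_at_off * math.exp(-(t - t_off) / tau_off)

      def log_linear_tau(times, values):
          """Fit ln(values) = c - t/tau by ordinary least squares and return tau.
          Assumes strictly positive values that decay exponentially in time."""
          logs = [math.log(v) for v in values]
          mean_t = sum(times) / len(times)
          mean_y = sum(logs) / len(logs)
          cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
          var = sum((t - mean_t) ** 2 for t in times)
          return -var / cov  # slope = cov/var, so tau = -1/slope

      # Dissociation phase: samples taken after the light is switched off.
      decay_t = [T_OFF + 1.0 * k for k in range(1, 61)]
      tau_off_est = log_linear_tau(decay_t, [recruitment_trace(t) for t in decay_t])

      # Recruitment phase: linearize via 1 - f/plateau, taking the level at T_OFF as the plateau.
      plateau = recruitment_trace(T_OFF)
      rise_t = [0.5 * k for k in range(1, 41)]
      tau_on_est = log_linear_tau(rise_t, [1.0 - recruitment_trace(t) / plateau for t in rise_t])

      print(f"tau_on ~ {tau_on_est:.2f} s, tau_off ~ {tau_off_est:.2f} s")
      ```

      The plateau approximation only holds when stimulation lasts several recruitment time constants, as assumed here (T_OFF = 12 × TAU_ON).
      
      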

      Quantitative analysis RhoA activity dynamics

      Public Review:

      There is no model to analyze transient RhoA responses, however, the quantitative nature of the data calls for it. Even a simple model with linear activation-deactivation kinetics fitted on the data would be of benefit for the conclusions on the observed rates and absolute amounts.

      Recommendations for the authors:

      [...] for the transient optogenetic experiments, it would be great to make a simple model, or at least to fit the curves with an on rate, an off rate, and a peak value. This will clarify the conclusions drawn for the experiments. For example, the authors claim that they observe an increased Rho activation rate in DLC1 KO cells (see sections "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" and "Discussion") but the rate is not well-defined. One can have two curves with the same activation rate but one that peaks higher (larger multiplicative prefactor) and it would resemble the presented data. This being said, the higher deactivation rate in DLC1 KO cells is evident from the data.

      We agree that a quantitative analysis and model would improve our understanding of the data. We fit the activation/deactivation kinetics and provide the values in the section "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" (Lines 287-299). We then modeled the RhoA activity dynamics at focal adhesions and at the plasma membrane after transient optogenetic stimulation using a system of ODEs, with the new measurements of optoLARG kinetics as activation input. The model closely fits the experimental data, with WT following classic Michaelis-Menten dynamics. Interestingly, when the DLC1-KO data are fit with the same model as for WT, the fitted parameter describing the negative feedback loop (active RhoA recruiting a GAP) is zero; in other words, the factor that deactivates RhoA is present at a constant concentration. We added a new main Figure 5 describing the models and fits, added a new Results section "Modeling indicates loss of negative RhoA autoregulation in DLC1-KO cells" (Lines 326-378), and updated the Methods and Discussion sections accordingly. We use these findings to ground more clearly the mathematical terms used to describe our results.
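      The exact ODE system is given in the revised Methods rather than here, so the following is only a minimal sketch of the kind of model described above, under assumed equations: active RhoA R is produced by optoLARG-driven GEF activity on inactive RhoA and removed by a Michaelis-Menten GAP term whose strength is a basal GAP level plus a feedback GAP pool G recruited by active RhoA. The input function, all parameter names (k_fb, gap0, k_m, k_rec, k_g) and all values are hypothetical. Setting k_fb = 0 mimics the "constant GAP" situation reported for the DLC1-KO fits.

      ```python
      import math

      def optolarg_input(t, tau_on=5.0, tau_off=20.0, t_off=10.0):
          """Hypothetical membrane optoLARG level: saturating rise under light,
          exponential decay after the light is switched off at t_off."""
          if t <= t_off:
              return 1.0 - math.exp(-t / tau_on)
          peak = 1.0 - math.exp(-t_off / tau_on)
          return peak * math.exp(-(t - t_off) / tau_off)

      def simulate(k_fb, t_end=120.0, dt=0.01, r_tot=1.0, k_gef=2.0,
                   gap0=0.5, k_m=0.2, k_rec=1.0, k_g=0.2):
          """Forward-Euler integration of a two-variable sketch (hypothetical):
          dR/dt = k_gef*L(t)*(r_tot - R) - (gap0 + k_fb*G) * R/(k_m + R)
          dG/dt = k_rec*R - k_g*G   (GAP pool recruited by active RhoA)
          With k_fb = 0 the deactivating activity is constant (gap0 only)."""
          r, g = 0.0, 0.0
          trace = []
          for i in range(int(round(t_end / dt))):
              t = i * dt
              activation = k_gef * optolarg_input(t) * (r_tot - r)
              deactivation = (gap0 + k_fb * g) * r / (k_m + r)
              dr = activation - deactivation
              dg = k_rec * r - k_g * g
              r += dt * dr
              g += dt * dg
              trace.append(r)
          return trace

      with_feedback = simulate(k_fb=2.0)
      without_feedback = simulate(k_fb=0.0)
      print(round(max(with_feedback), 3), round(max(without_feedback), 3))
      ```

      In this toy version, removing the feedback term can only raise the active-RhoA trace, because the extra GAP contribution is never negative; it is meant to show the structure of a feedback-versus-constant-GAP comparison, not to reproduce the fitted curves.
      
      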

      Error figure 6E

      Recommendations for the authors:

      The scheme presented in Figure 6E is not supported by the data and should be modified. In this scheme, the authors show a strongly delayed peak in control cells versus DLC1 KO cells, whereas in the data the peaks appear to be at similar time points. Similarly, the authors show a strongly decreased rate of activation, whereas the initial rates appear identical in the data.

      The delayed peak we illustrated is an error; we thank the reviewers for catching it. The decreased rates of activation and deactivation, although exaggerated in the scheme, are nonetheless present in the data (and are now quantified; see answer above). We updated the figure accordingly (now Fig. 7E in the manuscript).

      Clarification term "signaling flux"

      Recommendations for the authors:

      It would be nice to define more precisely several terms that are used throughout the manuscript. For example, could the authors define what they mean by "signaling flux"? Is it the temporal derivative of the Rho levels? Or the spatial derivative?

      We agree that this was not clear in the previous version of the manuscript. We refer to "signaling flux" as the continuous cycle of RhoA activation by GEFs and inactivation by GAPs, processes that persist even when bulk RhoA activity appears steady, as introduced by Miller & Bement (2009). We now explicitly define "signaling flux" in the abstract (Lines 20-24).

      See: Miller, Ann L., and William M. Bement. "Regulation of cytokinesis by Rho GTPase flux." Nature cell biology 11.1 (2009): 71-77. https://doi.org/10.1038/ncb1814

      Recommendations for the authors:

      Also (see above), it would be nice to define precisely what the rates are: the activation rate is in general the k_on of a reaction scheme, but it will differ from the observed rate given by a biosensor. For example, with a k_on and a k_off, the observed rate toward the steady state will be given by the sum of the activation and deactivation rates. In the manuscript, the authors do not distinguish between the activation rate and the rate of increase of the biosensor signal, which is confusing for the reader and for the interpretation of the data.

      We updated the Results section to make this distinction clearer (Lines 288-300) and added a note explicitly highlighting the difference between biosensor signal dynamics and the underlying RhoA activation/deactivation rates (Lines 298-300). In addition, our newly introduced model helps disentangle the combined activation/deactivation rates into distinct GEF and GAP activity parameters.
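      The reviewer's point about observed versus underlying rates follows from a short derivation. Assuming the simplest linear scheme (R is the normalized fraction of active RhoA, k_on is constant during stimulation, both steps are first order, and the biosensor reports R without lag), the rise toward steady state is governed by the sum of the two rate constants, while the plateau depends on their ratio. A faster biosensor rise alone therefore cannot identify a higher k_on.

      ```latex
      \frac{dR}{dt} = k_{\mathrm{on}}\,(1 - R) - k_{\mathrm{off}}\,R
        = k_{\mathrm{on}} - (k_{\mathrm{on}} + k_{\mathrm{off}})\,R,
      \qquad R(0) = 0

      R(t) = \frac{k_{\mathrm{on}}}{k_{\mathrm{on}} + k_{\mathrm{off}}}
             \left(1 - e^{-(k_{\mathrm{on}} + k_{\mathrm{off}})\,t}\right),
      \qquad k_{\mathrm{obs}} = k_{\mathrm{on}} + k_{\mathrm{off}}

      % After the light is switched off at t_off, k_on -> 0 and the decay reports k_off alone:
      R(t) = R(t_{\mathrm{off}})\, e^{-k_{\mathrm{off}}\,(t - t_{\mathrm{off}})}, \qquad t > t_{\mathrm{off}}
      ```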

      Improvements to figure 3

      Minor recommendation:

      In Figures 3 B and D, the stars (statistical differences) are not visible. It would be good to make them bigger or move them above the graphs.

      Thank you! We updated the graphics.

      Other changes

      An additional panel (Figure 5D) shows that paxillin intensity does not change after weak optogenetic stimulation, to better illustrate the weak stimulation regime that does not trigger FA reinforcement (in contrast to Figure 7). Additional small layout changes were made to Figure 5.

      Addition of authors who contributed to the revisions

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism, with the strength of support being mostly convincing but incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in the development of selective inhibitors for protein targets.

      We are encouraged by this favorable assessment and thank editors and reviewers for their constructive feedback and recommendations. We trust that the revisions made to the manuscript will clarify the aspects that had been perceived to be incomplete.

      Reviewer #1 (Public review):

      Summary: 

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple nonpolar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.
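      Comparing a simulated permeation rate with a measured single-channel conductance rests on simple arithmetic: the net number of ions crossing the pore per unit time gives a current, and dividing by the applied voltage gives a chord conductance. The sketch below assumes ohmic behavior over the voltage range, monovalent ions, and net crossings counted in one direction; the crossing count, simulation length and voltage are hypothetical numbers, not values from the study.

      ```python
      ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)

      def conductance_from_crossings(n_ions, sim_time_s, voltage_v, charge_per_ion=1):
          """Chord conductance g = I / V, with I = n * z * e / t (assumes a linear I-V relation)."""
          current_a = n_ions * charge_per_ion * ELEMENTARY_CHARGE / sim_time_s
          return current_a / voltage_v

      # Hypothetical example: 10 net K+ crossings in 1 microsecond at 200 mV.
      g_siemens = conductance_from_crossings(10, 1e-6, 0.200)
      print(f"{g_siemens * 1e12:.2f} pS")  # prints 8.01 pS
      ```
      
      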

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models. 

      As stated in the original manuscript, we concur that the mechanism we propose remains hypothetical until further studies of the complete conformational cycle of the channel are conducted. The recently determined structure of a Kv2.1 channel in the closed state (Mandala and MacKinnon, PNAS 2025) presents an excellent opportunity to do so. Indeed, a cursory analysis of that structure shows that a Pro-Ile-Pro motif in helix S6 marks the position of the intracellular gate, where the pore domain constricts maximally (aside from the selectivity filter). As illustrated in Fig. 5, this motif is precisely where the benzimidazole and thiazole moieties of RY785 bind in our simulations. The mechanism we outline in Fig. 7 thus seems very plausible, in our view: RY785 occludes the K+ permeation pathway before the pore domain reaches the closed conformation, explaining the observed electrophysiological effects (see Discussion). The Discussion has been revised to note the recent discovery of the aforementioned structure, its implications for the mechanism we propose, and the opportunities for further research that are now open.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore. 

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      As noted in the manuscript, we designed our simulations to explore the possibility that RY785 binds within the pore domain, because TEA and RY785 are competitive and TEA is known to bind within the pore. That RY785 did in fact spontaneously and reproducibly bind within the pore was, however, not a predetermined outcome; if the site of interaction for the inhibitor were elsewhere in the channel, the simulation would not have shown a stable associated state, which would have prompted us to examine other possible sites, including the voltage sensors. It was also not predetermined or foreseeable a priori that the mode of interaction we observed in simulation provides a straightforward rationale for the electrophysiological effects of RY785. Based on our results, therefore, we believe that RY785 binds within the pore of Kv2. As stated by the reviewer, other allosteric modulators are known to bind instead to the sensors; to our knowledge, however, there is no precedent for a small-molecule inhibitor that simultaneously acts on the sensors and the pore domain. We therefore believe that future studies should focus on corroborating or refuting the mechanism we propose, through additional experimental and computational work; if, contrary to our claim, RY785 is found not to bind to the pore domain, it would be logical to explore other possible sites of interaction, as the reviewer suggests. The Discussion has been modified to address this point.

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      To clarify this question, we have included a multiple sequence alignment as Supplementary Figure 1; the revised manuscript refers to this figure in the Discussion section. The alignment reveals that the cluster of residues forming contacts with RY785 (Val409, Pro406, Ile405, Ile401, and Val398) is indeed specific to Kv2.1. Among Kv channels, Kv3.1 and Kv4.1 exhibit the greatest similarity to Kv2.1 at these positions, but they differ in a crucial substitution: Ile405 in Kv2.1 is replaced by Val. This replacement shortens the sidechain, undoubtedly reducing the magnitude of the hydrophobic interaction between inhibitor and channel (Val is approximately 6 kcal/mol, i.e. 1,000 times, more hydrophilic than Ile). Kv5.1 differs from Kv2.1 at two positions: Pro406 is replaced by His, and Val409 by Ile. The introduction of His abolishes the hydrophobic interaction at that position, and the need for hydration likely perturbs all adjacent contacts with RY785. Lastly, Kv6-Kv10 and Cav channels feature entirely different residues at these positions. Consistent with these findings, a recent study by the Sack lab (https://elifesciences.org/articles/99410) has demonstrated that Kv5, Kv6, Kv8, and Kv9 pore subunits confer resistance to RY785, while a high-throughput electrophysiological study carried out by Merck (Herrington et al., 2011) reported that RY785 shows no significant activity against Cav channels. The sequence alignment offers a simple interpretation for these experimental observations, namely that RY785 is recognized by Kv2 channels through the abovementioned hydrophobic cluster within the pore domain.
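      Reading the residues found at specific Kv2.1 positions (e.g. Val398, Ile401, Ile405, Pro406, Val409) off a multiple sequence alignment requires mapping Kv2.1 residue numbers to alignment columns while skipping gaps. A minimal Python sketch, assuming the alignment is held as a dictionary of gapped sequences; the function names and the toy sequences are hypothetical and do not reproduce Supplementary Figure 1.

```python
from __future__ import annotations


def residue_to_column(aligned_ref: str, first_resnum: int) -> dict[int, int]:
    """Map residue numbers of the reference sequence to alignment columns.

    Gap characters ('-') occupy a column but carry no residue number,
    so the numbering only advances on non-gap characters.
    """
    mapping: dict[int, int] = {}
    resnum = first_resnum
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            mapping[resnum] = col
            resnum += 1
    return mapping


def residues_at(alignment: dict[str, str], ref_name: str,
                first_resnum: int, positions: list[int]) -> dict[str, dict[int, str]]:
    """For every aligned sequence, return the residue found at each
    position of the reference numbering (e.g. Kv2.1 numbering)."""
    cols = residue_to_column(alignment[ref_name], first_resnum)
    return {name: {pos: seq[cols[pos]] for pos in positions}
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, NOT real Kv channel data);
# the reference sequence is numbered starting at residue 404.
toy = {"ref": "GIP-V", "other": "GVHAI"}
print(residues_at(toy, "ref", 404, [405, 406]))
```

      In this toy case the second sequence carries Val and His where the reference carries Ile and Pro, which is the kind of substitution pattern the alignment is used to identify.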

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      We refer the reviewer to published work, for example Stix et al, 2023 and Tan et al, 2022. We provide additional comments below.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

      We detected no persistent specific interactions between RY785 and the permeant K+ ions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript describes atomistic molecular dynamics (MD) simulations of a voltage-gated potassium channel Kv2.1 using its cryo-EM structure in the open activated state and its inhibition by a classical non-specific cationic blocker tetraethylammonium (TEA) as well as a novel selective inhibitor RY785. Using multi-microsecond-long all-atom MD runs under an applied membrane voltage of 100 mV, the authors were able to confirm that the channel structure represents an open conducting state, with the computed single-channel conductance lower than experimental values but still within the same order of magnitude. They also determined that both TEA and RY785 bind in the channel pore between the cytoplasmic hydrophobic gate and the narrow selectivity filter (SF) region near the extracellular side. However, while TEA directly blocks knock-on K+ conduction by physically obstructing ion access to the SF, the mechanism of action of RY785 is different. It does not directly prevent K+ access to the SF but rather binds to multiple residues in the hydrophobic gate region, which effectively narrows the pore and drives the channel toward a semi-closed nonconductive conformation, which might be distinct from one with the deactivated voltage sensors and closed pore observed at hyperpolarized membrane potentials. However, additional studies beyond the scope of this work might be needed to fully establish this mechanism, as suggested by the authors.

      The manuscript is written very well and represents a significant advance in the field of ion channel research. I do not have any major issues, which need to be addressed. However, I have several suggestions.

      For the apo-channel K+ conduction MD simulation under the applied voltage, the authors seem to observe mostly a direct or Coulomb knock-on mechanism across the SF with almost no water copermeation. This is in line with computational electrophysiology studies with dual membrane setup by B. de Groot and others but in disagreement with multiple previous studies by B. Roux and others also using applied electric field and CHARMM force fields as in the present study. I wonder why the outcomes are so different. Is it related to the Kv2.1 channel itself, a relatively small applied electric field used (corresponding to a membrane potential of 100 mV vs. 500-750 mV used in many previous simulations), ion force field (e.g., LJ parameters), or some other factors? Could weak dihedral restraints on the protein backbone and side chains contribute to this mechanism? I also wonder if the authors might have considered different initial SF ion configurations. Related to that, I wonder if the authors observed any SF distortions in their simulations including frequently observed backbone carbonyl flipping and/or dilation/contraction.

      We are aware of these discrepancies between published simulation studies, but cannot offer a satisfactory explanation, beyond speculation. The reviewer is correct that the mechanism of ion permeation we observe is comparable to that reported by de Groot, as we noted in Tan et al, 2022 and Stix et al, 2023. Neither in this nor in those previous studies did we observe any persistent distortions of the selectivity filter – but that outcome was expected by construction. The weak biasing potentials acting on the mainchain dihedral angles allow for local fluctuations but not a persistent deformation, relative to the conductive form determined experimentally.

      For MD simulations with the ligand present, I wonder if the authors can comment on the effect of the ligand especially RY785 on the pore size or more importantly size of the hydrophobic gate. The presence of the ligand itself would definitely result in a narrower pore, but I also wonder if this would also lead to a rearrangement of pore sidechain and/or backbone residues, which would lead to a narrower pore from a protein itself thus confirming the proposed mechanism of driving the channel towards a semi-closed state. It is easy to compute but I wonder if the presence of weak dihedral restraints may preclude this analysis.

      Yes, while the simulation design used in this study allows for local fluctuations in the mainchain structure and nearly unrestricted sidechain dynamics, changes in either the secondary or tertiary structure of the channel are strongly disfavored. This approach is thus sufficient to examine ligand binding or ion flow in the microsecond timescale but not channel gating. In the revised version of the Discussion, we outline a roadmap for future computational studies of that gating process, on the basis of the open-channel structure we used and the recently determined structure of the closed state.

      The authors state that RY785 does not block K+ ion, but it does significantly slow the rate of K+ ion access to the pore Scav site. Is this not a part of the mechanism for inhibition of the channel? The authors seem to focus on the primary mechanism of inhibition as the RY785 promoting channel closing, but would it not also reduce K+ current in the open state by slowing the rate of K+ entry into the cavity and selectivity filter? The authors should address this point in the text. I am also somewhat confused that in the MD simulations performed by the authors, there is still some K+ conduction with RY785 in the pore, which is not in 100% agreement with electrophysiology experiments. Does it mean that the channel in the simulations has not yet reached that semiclosed state or a reduced K+ conduction is not observed experimentally?

      The salient experimental observation is that RY785 abrogates K+ currents through Kv2 channels (Herrington et al, 2011; Marquis et al, 2022). In our view, that observation can be explained in one of two ways: either RY785 completely blocks the flow of K+ ions across the channel while the pore domain remains in the conductive, open state – like TEA does – or RY785 induces or facilitates the closing of the channel, thereby abrogating K+ flow. The fact that we observe K+ flow while RY785 is bound to the channel is therefore not in disagreement with the electrophysiological measurements, but it does rule out the first of those two possible interpretations of the existing experiments. As it happens, the second possible explanation, i.e. that RY785 facilitates the closing of the pore domain, also provides a rationale for another puzzling experimental observation, namely that RY785 shifts the voltage dependence of the currents produced by the voltage sensors as they reconfigure to open or close the intracellular gate.

      Also, I wonder if the authors considered that since there are 4 potential equivalent sites in the pore (although, overlapping) more than one RY785 might be needed to prevent K+ conduction, even though the experimental Hill coefficient of ~1 does not indicate cooperativity.

      Admittedly, our simulation design was based on the premise that only one RY785 molecule might be recognized within the pore. Based on the outcome of the simulations, we are confident that this assumption was valid, as the binding pose that we identified rules out multiple occupancy – which would be indeed consistent with a Hill coefficient of ~1.
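      For reference, the standard Hill equation relates fractional inhibition f to inhibitor concentration c as f = c^n / (IC50^n + c^n). With n = 1, as reported for RY785, the dose-response curve is what one inhibitor molecule binding one site per channel predicts, whereas positively cooperative binding of several molecules would steepen the curve (n > 1). A minimal Python sketch; the IC50 values used below are arbitrary placeholders, not measured values for RY785.

```python
def fractional_inhibition(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Hill equation: fraction of current inhibited at concentration conc."""
    return conc ** hill / (ic50 ** hill + conc ** hill)


def conc_for_inhibition(f: float, ic50: float, hill: float = 1.0) -> float:
    """Invert the Hill equation: concentration giving fractional inhibition f."""
    return ic50 * (f / (1.0 - f)) ** (1.0 / hill)


# Concentration range needed to go from 10% to 90% inhibition:
# about 81-fold for n = 1, but only 9-fold for n = 2.
for n in (1.0, 2.0):
    span = conc_for_inhibition(0.9, 1.0, n) / conc_for_inhibition(0.1, 1.0, n)
    print(n, round(span, 1))
```

      The width of the 10%-to-90% span is what distinguishes a Hill coefficient of ~1 from a cooperative one in practice.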

      I also wonder if the authors considered estimating ligand binding affinities and/or "on" rates from their simulations to have a more direct comparison with experiments and test the accuracy of their models. There are multiple enhanced sampling techniques allowing to do that, although it can be a study on its own.

      We thank the reviewer for this suggestion, which we will consider for future studies.

      The authors also discussed that they could not study Kv2.1 deactivation in a reasonable simulation time. Indeed it is very challenging, but they should cite previous studies on this subject, e.g. the 2012 Jensen et al paper (PMID: 22499946). There are structures of Kv channels with deactivated voltage-sensing domains (VSDs) available, e.g. of the EAG1 channel (PDB 8EP1), although they do not have a domain-swapped architecture. Structural modeling approaches, including AlphaFold, could potentially be used to obtain a Kv2.1 structure with deactivated VSDs, and targeted MD, the string method, etc. can be used to study transitions between different states with and without bound ligands.

      As noted, a structure of a Kv2 channel with a closed pore has now been determined experimentally. In the revised Discussion, we comment on what this structure tells us about the mechanism of inhibition we propose, and how it could be leveraged in future studies.

      The authors should be commended for doing a thorough QM-based force field parameterization of RY785. However, a validation of the developed force field parameters is lacking. In terms of QM validation, a gas-phase dipole moment can be compared in terms of direction and magnitude (it's normal to be overestimated to implicitly reflect solvent-induced polarization). If there are any experimental data available for this compound, they can be tested as well.

      We agree with the reviewer that forcefield validation is important, but to our knowledge no experimental data exists for RY785 to compare with, such as hydration free energies. We did however compare the gas-phase dipole moment computed with QM and with the MM forcefield we developed based on atomic charges optimized to reproduce QM interactions with water. The MM model yields a gas-phase dipole moment of 3.94 D, which is 20% greater than the QM dipole moment, or 3.23 D. That deviation is within the typical range for electroneutral molecules (Vanommeslaeghe et al, 2010), and as the reviewer notes, reflects the solvent-induced polarization implicit in the derivation of atomic charges. As shown in Author response image 1, the orientation of the dipole moment calculated with MM (right, blue arrow) is also in good agreement with that predicted with QM (left).

      Author response image 1.
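      The comparison above amounts to computing the point-charge dipole μ = Σ qᵢ rᵢ from the MM partial charges and comparing its magnitude and direction with the QM dipole. A minimal Python sketch; the function names and toy charges are illustrative, and the actual RY785 geometry and charges are not reproduced here. With the reported values, (3.94 - 3.23)/3.23 ≈ 0.22, i.e. roughly the 20% overestimate stated.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e·Å expressed in debye


def dipole_moment(charges, coords):
    """Dipole vector and magnitude (debye) of point charges (e) at positions (Å).

    For an electroneutral molecule the result does not depend on the origin.
    """
    vec = [sum(q * r[k] for q, r in zip(charges, coords)) * E_ANGSTROM_TO_DEBYE
           for k in range(3)]
    return vec, math.sqrt(sum(c * c for c in vec))


def angle_between(u, v):
    """Angle in degrees between two dipole vectors (orientation check)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def relative_deviation(mm, qm):
    """Fractional deviation of the MM magnitude from the QM reference."""
    return (mm - qm) / qm


print(round(relative_deviation(3.94, 3.23), 3))  # 0.22
```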

      (1) p. 3 "the last two helices in each subunit" -> "the last two transmembrane helices in each subunit".

      Thanks. Corrected.

      (2) p. 5 "and therefore do not cause large density variations e.g. 100-fold or greater.". I would be more specific here and indicate what are the actual variations in density or free energy encountered and how they are compared e.g. with thermal fluctuations (~kT).

      Thanks. The exact variations in K+ density had been included in the original manuscript, in Fig. 2C, but we failed to refer to this figure at this point in the description of the results. The ion density is plotted in a log scale to facilitate conversion to free-energy units. Corrected.
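      Because G(z) = -kBT ln[ρ(z)/ρref], a log-scale density plot can be read directly as a free-energy profile: a 100-fold density variation corresponds to ln 100 ≈ 4.6 kBT, or about 2.7 kcal/mol near room temperature. A minimal Python sketch; the reference density and the 298.15 K temperature are assumptions for illustration, not values taken from the simulations.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872  # Boltzmann constant, kcal/(mol·K)


def free_energy_profile(density, rho_ref, temperature=298.15):
    """Convert a density profile to free energy, G = -kT ln(rho / rho_ref).

    Returns (profile in kT units, profile in kcal/mol); empty bins map to +inf.
    """
    kt = KB_KCAL_PER_MOL_K * temperature
    g_kt = [-math.log(rho / rho_ref) if rho > 0 else math.inf for rho in density]
    return g_kt, [g * kt for g in g_kt]


# A 100-fold density enhancement relative to the reference density
g_kt, g_kcal = free_energy_profile([100.0], 1.0)
print(round(g_kt[0], 2), round(g_kcal[0], 2))  # -4.61 -2.73
```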

      (3) p. 6 Figure 1 caption "and along the perpendicular to the membrane" -> "perpendicular to the membrane normal"?. "The channel is an assembly of four distinct subunits (in colors);" -> "The channel is an assembly of four identical subunits (distinct by colors);". I would use the same protein coloring method in panels B and C as was used in panel A.

      Thanks. Corrected as needed.

      (4) p. 6 Figure 2 In panel B I would appreciate a representative complete ion permeation event trace. In panel C caption I would indicate corresponding sites "S0-S4, Scav" for each residue mentioned. I also would not use gray color for site names in the figure.

      We appreciate the suggestion, but believe the figure is clear as is. Panel B is meant to focus on the mechanism of knock-on. Panel A includes numerous complete permeation events.

      (5) p. 7 Figure 3 caption. Please indicate which atoms of residues T373 and P406 were used to define SF and gate positions. Chemical structures of both TEA and RY785 would be useful. In panels C and F channel interacting residues (if any) would be helpful to show.

      The revised caption clarifies that the positions of T373 and P406 are represented by their carbon-alpha atoms. A close-up view of the structures of TEA and RY785 is included in the Supplementary Information section.

      (6) p. 8. Figure 4 caption. Please indicate if N atoms were used for density maps in panels B and C, and which value of the density was used to show meshes. In panel A please indicate what are the units of the density shown by color maps.

      The caption has been revised to clarify these questions.

      (7) p. 9 "inside the protein" -> "inside the channel pore".

      Thanks. Corrected.

      (8) p. 10 "which lines the cavity" -> "which lines the water-filled cavity"

      We appreciate the suggestion but believe the wording is clear as is.

      (9) p.10 Fig. 5. It would be helpful to distinguish residues from different chains e.g. by different colors rather than using different colors for different residues. The S atom in RY785 is hard to recognize due to the yellow color used for C atoms. Figure 5B is very confusing. It is not clear what this plot represents. For instance, what does it mean that Pro405 has ~10 contacts in 20% of simulation snapshots? Does it mean 10 C..C/S interactions within 4.5 Å? I am not sure what the value of this is. I think a bar or radar chart plot showing % of contacts with one, two, or more residues of each type would be more helpful.

      Thanks. The revised caption ought to clarify how to interpret the plot.
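      One common way to build a plot like Fig. 5B is to count, frame by frame, the atom pairs between the inhibitor and a given residue that lie closer than a cutoff, and then histogram those counts over the trajectory. The revised caption defines the actual procedure; the sketch below only illustrates the reviewer's reading (a 4.5 Å cutoff), with hypothetical function names and toy coordinates.

```python
import math
from collections import Counter


def contacts_per_frame(ligand_frames, residue_frames, cutoff=4.5):
    """Number of ligand-residue atom pairs closer than cutoff (Å) in each frame.

    Each frame is a list of (x, y, z) tuples; hydrogens assumed already removed.
    """
    counts = []
    for lig, res in zip(ligand_frames, residue_frames):
        counts.append(sum(1 for a in lig for b in res if math.dist(a, b) < cutoff))
    return counts


def contact_distribution(counts):
    """Fraction of frames showing each contact count (histogram over frames)."""
    total = len(counts)
    return {n: c / total for n, c in sorted(Counter(counts).items())}


def occupancy(counts):
    """Fraction of frames with at least one contact."""
    return sum(1 for n in counts if n > 0) / len(counts)
```

      Under this reading, "~10 contacts in 20% of snapshots" would mean that 20% of the frames show about ten atom pairs within the cutoff.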

      (10) p. 12 "Due to its 2-fold molecular symmetry". TEA has a tetrahedral point group or Td symmetry. It has several two-fold rotational axes though. 

      Thanks. Corrected.

      (11) p. 12 "it prevents K+ ions in the cytoplasmic space from destabilizing the K+ ions that reside in the selectivity filter" I am not sure if this statement is entirely accurate as there might be destabilization of a multi-ion SF configuration, not ions per se.

      We believe this statement is clear as is.

      (12) p. 13 Fig. 7 caption "includes non-conductive or transiently inactivated states" - I am not sure what "transiently inactivated state" is as inactivation is a specific term used in ion channel research and it does not seem to be explicitly considered in this study.

      A reference has been included in the caption for readers interested in the process of inactivation.

      (13) p. 14 "the net charge of these constructs is thus zero". This would depend on the number of basic and acidic residues in the protein. 

      Yes, it does – and as a result the construct we model has a net zero charge.

      (14) p. 14 I wonder if the protein was constrained or heavily restrained during MARTINI membrane building and equilibration procedure. Otherwise, C-alpha mapping would be problematic and clashes with lipid membrane atoms might take place as well.

      It was indeed. When a protein is simulated using the MARTINI coarse-grained forcefield, its fold must be preserved through a network of strong ‘virtual’ bonds between adjacent carbon-alpha atoms. This is standard practice, so we do not believe it requires further explanation.
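      Such networks are commonly built as an elastic network: harmonic springs between backbone beads whose distance in the reference structure falls within a cutoff window, which preserves the fold while letting the beads fluctuate. A minimal Python sketch of that idea; the 0.5-0.9 nm window and the 500 kJ/mol/nm² force constant are illustrative assumptions rather than parameters reported here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def build_elastic_network(coords, lower=0.5, upper=0.9, k=500.0):
    """Springs between bead pairs whose reference distance lies in [lower, upper].

    coords: backbone bead positions (nm) of the reference structure.
    Returns (i, j, r0, k) tuples; k in kJ/mol/nm^2 (illustrative value).
    """
    bonds = []
    for i, j in combinations(range(len(coords)), 2):
        r0 = math.dist(coords[i], coords[j])
        if lower <= r0 <= upper:
            bonds.append((i, j, r0, k))
    return bonds


def elastic_energy(coords, bonds):
    """Harmonic restraint energy 0.5*k*(r - r0)^2 summed over the network."""
    return sum(0.5 * k * (math.dist(coords[i], coords[j]) - r0) ** 2
               for i, j, r0, k in bonds)


# Toy chain of four beads 0.38 nm apart: only the (0, 2) and (1, 3) pairs
# fall inside the window, and displacing bead 2 by 0.1 nm costs 2.5 kJ/mol.
ref = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0), (1.14, 0.0, 0.0)]
bonds = build_elastic_network(ref)
moved = list(ref)
moved[2] = (0.86, 0.0, 0.0)
print(len(bonds), round(elastic_energy(moved, bonds), 3))
```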

      (15) p. 15 PME - please spell out and provide reference.

      Corrected.

      (16) p. 15 "with a smooth switching function" - is it a special or standard switching function? Also, was it used for energy or forces? 

      The switching function smoothly brings both forces and energies to zero at the cutoff distance. We refer the reviewer to the NAMD manual for further details.
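      For reference, the CHARMM-style switching function described in the NAMD documentation multiplies the van der Waals energy between the switching distance r_s and the cutoff r_c by S(r) = (r_c² - r²)²(r_c² + 2r² - 3r_s²)/(r_c² - r_s²)³. S equals 1 at r_s and 0 at r_c, with zero slope at both ends, so energies and forces vanish smoothly. A minimal Python sketch; the 10 and 12 Å distances are assumed for illustration and are not stated in this response.

```python
def charmm_switch(r, r_switch=10.0, r_cut=12.0):
    """CHARMM-style switching factor S(r): 1 below r_switch, 0 beyond r_cut."""
    if r <= r_switch:
        return 1.0
    if r >= r_cut:
        return 0.0
    rc2, rs2, r2 = r_cut ** 2, r_switch ** 2, r * r
    return (rc2 - r2) ** 2 * (rc2 + 2.0 * r2 - 3.0 * rs2) / (rc2 - rs2) ** 3


def lj_switched(r, epsilon, rmin, r_switch=10.0, r_cut=12.0):
    """CHARMM-form Lennard-Jones energy eps*[(Rmin/r)^12 - 2(Rmin/r)^6], switched."""
    x = (rmin / r) ** 6
    return epsilon * (x * x - 2.0 * x) * charmm_switch(r, r_switch, r_cut)


# The energy tapers from its full value at 10 Å to exactly zero at 12 Å
for r in (9.0, 10.0, 11.0, 12.0):
    print(r, charmm_switch(r), lj_switched(r, 0.1, 4.0))
```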

      (17) p. 15 'k = 1 kBT.' Please confirm that there is a factor of "1" there, which can actually be skipped if this is the case.

      The value k = 1 kBT is correct.
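      To give a sense of how weak a bias of k = 1 kBT is, the sketch below assumes a periodic form U(θ) = k[1 - cos(θ - θ₀)] on each biased dihedral. This functional form is an assumption made for illustration only; the authors refer to Stix et al, 2023 and Tan et al, 2022 for the actual implementation. Under this assumption, a 30° excursion from the reference angle costs about 0.13 kBT, well below thermal energy, and the maximum penalty is 2 kBT.

```python
import math


def dihedral_bias(theta_deg, theta0_deg, k_kt=1.0):
    """Hypothetical periodic bias U = k[1 - cos(theta - theta0)], in kBT units."""
    delta = math.radians(theta_deg - theta0_deg)
    return k_kt * (1.0 - math.cos(delta))


def boltzmann_weight(energy_kt):
    """Relative population exp(-U/kBT) of a biased configuration."""
    return math.exp(-energy_kt)


# Reference angle of -60 degrees: no excursion, a 30 degree excursion,
# and a full 180 degree reversal
for theta in (-60.0, -30.0, 120.0):
    u = dihedral_bias(theta, -60.0)
    print(theta, round(u, 3), round(boltzmann_weight(u), 3))
```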

      (18) p. 15. Please cite PMID: 22001851 for the transmembrane electric field application technique.

      Corrected.

      (19) p. 15 "and CHARMM36m" -> "and CHARMM36m force field". 

      Corrected.

      (20) p. 16 "the four proteins subunits" -> "the four protein subunits". 

      Corrected.

      (21) p. 16. Please provide the reference for CGenFF. It's reference 49. 

      Corrected.

      Supporting Information (SI): CGenFF is misspelled in multiple figure captions in the SI. All potential energy scans indicate "angle", but some are bond angles while others are dihedral angles. Using subscripts for atom numbers is confusing and does not match the numbering scheme used in Fig. S1. So, please use the same style of numbering throughout, e.g. C46-C42-N43 (without subscripts). Please label the X and Y axes in Figures S2-S19 and S21. In Figure S22 please perform a linear regression analysis and/or compute Pearson correlation coefficients and indicate trend lines. For Table S1, it would be good to compute RMS or mean unsigned errors to get an idea about accuracy. Also, please indicate if reference QM values were scaled by 1.16 for energies or offset for distances.

      The Supplementary Information has been corrected. We thank the reviewer for their detailed feedback. 
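      The statistics requested for Figure S22 and Table S1 (mean unsigned error, RMS error, Pearson correlation and a least-squares trend line) can be computed as in the sketch below. The optional qm_scale parameter mirrors the 1.16 scaling of QM interaction energies mentioned by the reviewer; the function name and inputs are hypothetical, and no values from the Supplementary Information are reproduced.

```python
import math


def compare_mm_qm(mm, qm, qm_scale=1.0):
    """MUE, RMSE, Pearson r and least-squares line of MM values vs scaled QM values."""
    ref = [qm_scale * q for q in qm]
    n = len(mm)
    diffs = [m - r for m, r in zip(mm, ref)]
    mue = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_m, mean_r = sum(mm) / n, sum(ref) / n
    sxy = sum((r - mean_r) * (m - mean_m) for m, r in zip(mm, ref))
    sxx = sum((r - mean_r) ** 2 for r in ref)
    syy = sum((m - mean_m) ** 2 for m in mm)
    pearson = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # fit of MM (y) against QM (x)
    intercept = mean_m - slope * mean_r
    return {"mue": mue, "rmse": rmse, "r": pearson,
            "slope": slope, "intercept": intercept}


print(compare_mm_qm([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```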

      Reviewer #3 (Recommendations for the authors):

      (1) The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Molecular docking and/or MD simulations could quickly test this hypothesis. If this hypothesis is not true, a comprehensive search can exclude such a possibility, which can also confirm the long-range allosteric coupling between RY785 binding in the central pore and voltage-sensing domain dynamics. 

      Please see our response above.

      (2) The authors describe RY785 as a selective inhibitor of Kv2 channels and characterize its binding residues through MD simulations. To support this claim, Figure 5 needs to include a multiple sequence alignment with other Kv channels. This would help demonstrate whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      Please see our response above.

      (3) The study applies a biasing potential to φ, ψ, and χ1 dihedral angles. Please clarify:

      (a) Is this potential solely to prevent selectivity filter collapse/degradation, as mentioned in a previous D. E. Shaw Research publication (Jensen et al., 2012)?

      Yes, that is correct.

      (b) If it applies to all amino acids, can this potential prevent other changes, such as in the voltage-sensing domain?

      Yes, that is correct.

      (c) What specific "large-scale structural changes" does this potential preclude? 

      For example, it would preclude the spontaneous degradation of the secondary or tertiary structure of the protein. We have revised the Methods section to make these points clearer. 

      (d) Given that such biasing potentials on backbone dihedral angles can decrease conformational flexibility, and considering that Kv channel permeability/conductivity could be highly sensitive to filter flexibility, what insights can you provide about the impact of the force constant k on channel conductivity?

      In previous studies based on an identical methodology (Stix et al, 2023; Tan et al, 2022), we have observed good agreement between calculated and experimental conductance values – at least as good as can be hoped for, when all approximations are considered. Based on the data presented in those studies, we have no reason to believe our methodology inhibits the permeability of the channel, which is logical as the local structural fluctuations required for K+ flow across the selectivity filter are not impaired, by definition. On the contrary, the fact that these weak biasing potentials make the conductive form of the filter the most favorable state in simulation enables a clear-cut analysis of conductance under plausible simulation conditions, both in terms of applied voltage and K+ concentration. We refer the reviewer to the abovementioned studies for further details and a discussion of this subject.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing. Given the compact nature of the central cavity when RY785 is bound, it would be valuable to investigate whether polar groups of RY785 (e.g., nitrogens from the amide, benzimidazole, and thiazole moieties) always interact with K+ ions. Characterizing these interactions could inform the design of similar compounds with differential modulation effects.

      We examined this possibility and detected no convincing interaction patterns between RY785 and K+ ions – logically, inhibitor and ions are in close proximity while residing concurrently within the pore, but we detected no evidence of specific interactions.

      Minor points:

      It is strongly recommended that the refined force field parameters for RY785 be shared as a separate supplementary file in CHARMM force field format. This addition would be valuable for the scientific community, allowing other researchers to use or compare these parameters in future studies.

      We agree entirely. Upon publication of the VOR for this article the forcefield parameters for RY785 will be made freely available for download at https://github.com/Faraldo-Gomez-Lab-atNIH/Download.

      The study uses a KCl concentration of 300 mM, which exceeds typical intracellular K+ levels. While this may be intentional to enhance K+ permeation probability, a brief justification for this choice should be included in the Methods section.

      Yes, what motivated this choice in this and in our previous studies of K+ channels was the expectation of a greater number of permeation events for a given simulation length, and therefore greater confidence (i.e. statistical significance) in the observed ion conductance, or in the degree to which it might be inhibited by a blocker. It is worth noting that 300 mM KCl, while atypical in the intracellular environment, is often used in electrophysiological studies. The Methods section has been amended to clarify this point.
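      The link between the number of permeation events and confidence in the conductance can be made explicit: with N K+ ions crossing in time t at voltage V, the conductance is g = N·e/(t·V), and if events are treated as independent (Poisson) counts, its relative uncertainty is about 1/√N. A minimal Python sketch under those assumptions; the event count and simulation length in the example are hypothetical, while 100 mV matches the voltage used in the simulations.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C; each K+ ion carries +1 e


def conductance_from_events(n_events, sim_time_s, voltage_v):
    """Single-channel conductance (pS) and its Poisson standard error (pS)."""
    current_a = n_events * ELEMENTARY_CHARGE / sim_time_s
    g_ps = current_a / voltage_v * 1e12
    return g_ps, g_ps / math.sqrt(n_events)


# Hypothetical: 100 permeation events in 1 microsecond at 100 mV
g, err = conductance_from_events(100, 1e-6, 0.100)
print(f"{g:.1f} +/- {err:.1f} pS")  # 160.2 +/- 16.0 pS
```

      Quadrupling the number of events, whether through longer simulations or a higher permeation rate, halves the relative error from 10% to 5%.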

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, in this study, the authors have used a transposon mutagenesis-based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters and facilitate genetic screening for this phenotype, they employed a condition previously reported to increase persister frequency, namely nutrient starvation. The M. abs transposon library was grown in nutrient-rich or nutrient-depleted conditions and exposed to TIG/LZD for 6 days, after which Tn-seq was carried out to identify genes involved in spontaneous (nutrient-rich) or starvation-induced persistence. About 60% of the persistence hits were required in both conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage and proteostasis stress. The authors then validated the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and testing the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect compared with wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when testing the role of katG in other clinical strains of Mab, the phenotype was observed in only one of the clinical strains, demonstrating that alternative anti-oxidative stress defense mechanisms might operate in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic-mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings with regard to genetic analysis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. Also, the attempts to validate their observations in clinical isolates are appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      We appreciate the reviewer’s comments, but we note that 4 out of 5 genes displayed phenotypes concordant with findings of the Tn-Seq data, with katG and pafA, as well as MAB_1456c (during starvation only) and blaR (in rich media only) having decreased survival as shown in Figure 3A-D. We do agree that some of the phenotypes were more modest in a single-mutant context than in the pooled Tn-Seq screen. In addition, several mutants that had modest changes in survival also showed profound defects in resuming growth after removal of antibiotics, with the pafA mutants particularly impaired (Figure 3 - figure supplement 1).

      While the role of KatG mediated detoxification of ROS and involvement of ROS in antibiotic killing was well demonstrated, the lack of replication of this phenotype in some of the clinical isolates limits the significance of these findings.

      While the role of katG varied among strains, the antibiotic-induced accumulation of ROS was seen in all three strains (Figure 6A). This suggests that in some strains other ROS-detoxification pathways are able to compensate for the loss of katG.

      (Figure 2—figure supplements 1–3)

      Figure 1—figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies.

      Moreover, the work still has some technical issues to fix regarding description of the methods, supplementary material, and reference formatting.

      See detailed responses below.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      Text modified to add more figure references.

      (1) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      We appreciate the suggestion and have added a line to the abstract.

      (2) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      We have added mention of this in the discussion (lines 354-356).

      (3) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

      We respond to the specific details below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus.

      They also used Tn-Seq to identify genes involved in persistence. They identified a role for the catalase-peroxidase KatG in preventing death from the translation inhibitors tigecycline and linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.
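      For readers less familiar with Tn-Seq, one common way to score a screen like this is a per-gene log2 ratio of depth-normalized insertion reads in treated versus control libraries; genes whose insertion mutants drop out under treatment score strongly negative. The sketch below uses hypothetical gene names and counts and is not presented as the analysis used in the study, which may rely on dedicated tools such as TRANSIT.

```python
import math

# Hypothetical insertion read counts per gene (illustrative only, not study data).
untreated = {"geneA": 850, "geneB": 400, "geneC": 1200}
treated = {"geneA": 60, "geneB": 390, "geneC": 1150}


def normalize(counts):
    """Scale counts to reads per million so libraries of different depth are comparable."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(treated_counts, control_counts, pseudocount=1.0):
    """Per-gene log2(treated / control) of normalized insertion reads.

    Strongly negative values flag genes whose disruption reduces survival under treatment.
    """
    t = normalize(treated_counts)
    c = normalize(control_counts)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


scores = log2_fold_change(treated, untreated)
for gene, lfc in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{gene}\t{lfc:+.2f}")
```

      In this toy example the hypothetical geneA is strongly depleted after treatment, which is the pattern expected for a gene (such as one encoding a detoxifying enzyme) needed to survive the drug.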

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are strongly encouraged to check the references. There appears to be a systematic error in the reference citations. I started to list them, but there were too many.

      For example, Ln 51: Ref #11 cited, should be #10. Ln 59: #18 is wrongly cited; should be -. Ln 104: Ref #27 wrongly cited.

      Ref #26 and #28 identical.

      Even in the Discussion section, many references are mis-cited.

      We very much appreciate the reviewer catching this issue with the import of our references and we have corrected this.

      Reviewer #2 (Recommendations for the authors):

      Below I have listed comments on specific issues that I hope are useful during revision.

      Line 21 population is singular

      Text modified

      Line 21 comma after antibiotic (subordinate clause)

      Text modified

      Line 25 is how singular?

      Text modified

      Impression of abstract: the work seems to confirm and therefore generalize concepts derived from studies with E. coli. If the authors agree, such a statement would be appropriate as a final sentence. I would also look for novel features to stress in the abstract.

      Line 41 this challenge is vague

      Text modified

      Line 43 comma such as (also comma at the end of the parenthetical statement). This type of comma error is common throughout the manuscript and slows reading.

      Text modified

      Line 60 paradoxically. Is this the best concept? Or is it the natural effect of evolution (assuming that mycobacteria or their ancestors were exposed to environmental antibiotics)?

      It is certainly problematic for clearing infection.

      Text not modified.

      Line 63 highlighted uncertainties ... meaning is unclear especially since you may have changed what "model" is referring to.

      Text modified

      Line 66 models.... Do you really mean systems? Models of what?

      This refers to mechanistic models. Text not modified.

      Line 67 arrest cell division. This is written as if it were true. Does the evidence point specifically to cell division or perhaps more accurately suppression of metabolism (see Ye et al 2025 mBio).

      Both have been postulated as important. Text modified to add the concept of metabolism.

      ... targeted by antibiotics non-essential... Do you think that antibiotics work by inactivating essential targets? That seems overly simplistic, as lethal action is more likely the metabolic response to the damage caused. By the end of the paragraph you come around to this view, but you have already misdirected the reader. The reader is not sure what to believe.

      Line 70 note that there are many inhibitors of transcription and translation that only block growth; they do not rapidly kill cells.

      There can be both direct, and indirect secondary killing mechanisms. We devote a significant portion of the Discussion section to this topic.

      Line 71 debate. There was indeed a debate, but reference 22 is not a valid citation for this. I think you mislead the reader by not accurately describing the debate. It was basically about the inability of Kim Lewis and James Imlay to reproduce the work of ref. 22. A great deal of prior work and then subsequent work showed that the challenge to ref. 22 lacked substance.

      (1) Text modified to fix an error in the citation number related to direct β-lactam-mediated lysis.

      (2) We agree that there is a great deal of data supporting antibiotic-induced ROS as important for bactericidal activity in many circumstances and do not argue otherwise. This sentence points out that over the years the paradigm for how antibiotics kill bacteria has evolved.

      Line 80. It seems you are starting a new topic here. What about beginning a new paragraph?

      The paragraph introduces mycobacteria of which Mabs is one. Text not modified.

      Line 85 delete the comma: it implies a compound sentence that is not delivered.

      Text modified.

      Line 109 screen singular

      Text modified.

      Line 156 these conditions is imprecise and vague

      Conditions were described in paragraph above in the manuscript. Text not modified.

      Fig 2 it would be helpful to more clearly define the meaning of the coordinates

      Text modified.

      Line 230 and throughout please indicate the location of the data being cited for rapid reader reference

      Text modified.

      Lines 315-323 You could use this paragraph as the first of the Discussion. Some readers prefer to read the Discussion before the results. For them, a summary at the beginning of the Discussion is useful.

      Text modified.

      Line 328 without underlying mechanism... for E. coli refer to Zeng PNAS 2022. Depending on when the final version of this paper happens, there should be a figure in a Zhao Zhu mLife paper on purA that will have been published. Since it is not yet available, it cannot be cited.

      We agree that the Zeng et al study is interesting and have added this reference to our discussion. However, these findings related to broad Crp-regulated tolerance actually underscore the point that we are making: that there are multiple factors (Crp, RelA, Lon, TisB, MazE, others) that mediate antibiotic tolerance.

      Line 339 where are the data?

      These data are in Figure 5, panels C, D. We have clarified the text to indicate that only a single agent from each of these classes was tested.

      Line 346 here you are summarizing evidence for ROS in killing mycobacteria. You should include the moxifloxacin study by Shee et al 2022 AAC.

      Reference added.

      Line 348 refer to James Collins' work with E. coli in which his lab examined agents with a variety of mechanisms. There seems to be a fundamental difference between E. coli and mycobacteria with respect to rifampicin, a strictly static agent in E. coli but clearly lethal in mycobacteria. Note that chloramphenicol is static in E. coli and blocks ROS production. What does it do in mycobacteria? A brief discussion of this difference might be relevant at line 362

      Text modified.

      Lines 364-368 Here the idea might be simply that there are two modes of killing, one that is a direct extension of class-specific damage (chromosome fragmentation with fluoroquinolones, for example, or cell lysis by beta-lactams) and a second that is a metabolic response to the antibiotic damage (ROS accumulation). The second type is not class specific. Within this context, the mycobacterial killing by rifampicin might be a class-specific extension of inhibition of transcription that does not occur in E. coli.

      Agreed, text modified to include this.

      Line 400 The Key Resource table is not of publication quality. Precision and repeatability can be improved by spelling out the name of the vendor and its location (City, Country). In the present case, use of BD is lab jargon.

      We appreciate the reviewer’s precision. However, this is actually not lab jargon. Becton, Dickinson and Company now refers to itself as BD (see https://www.bd.com/en-us), and the American Type Culture Collection now refers to itself as ATCC (see https://www.atcc.org/about-us/who-we-are).

      Line 639 It would be good to have experienced colleagues critically review the manuscript, especially for English usage. Listing those persons here adds to the credibility of the work

      Text not changed.

      References: please refer to the journal style. Here you use italic for titles and scientific names, thereby obscuring the scientific names. Normally article titles are not italic and scientific names are ALWAYS italic unless prohibited by journal style.

      Our reference format is concordant with eLife submission guidelines, and all references are reformatted by the journal at the time of final publication (see https://elifesciences.org/insideelife/a43f95ca/elife-references-yes-we-take-any-format-no-we-re-not-rekeying).

      Supplemental Material: Please refer to journal style. Normally this is a stand-alone document that includes a title page and carefully crafted figure legends. Supplemental figures would be numbered as 1, 2, ... A professional appearing Supplemental Material section shows author publication experience not obvious in other parts of the paper. The text indicated MIC determinations. I would like to see a table of MIC values.

      (1) MIC table added as Supplemental Table 5.

      (2) The Supplemental figures are submitted and named in accordance with eLife instructions. Please note that for eLife, there is not a stand-alone supplementary figure section with a title page as you are requesting, but instead the figure supplements for each figure are provided as online files linked to each figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      We appreciate the Reviewer’s consideration of the strengths of our study supporting the identification of adult endothelial to hematopoietic transition (EHT) in the mouse bone marrow.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

      Recognizing the importance of the weaknesses pointed by the Reviewer, we provide below our response to the thoughtful recommendations rendered.

      Reviewer #1 (Recommendations for the authors):

      The main model is to label cells using Cdh5 (VE-cadherin) CreERT2 genetic tracing. Cdh5 is a typical marker of endothelial cells. The data shows that, when treating adults with tamoxifen, the model labels PBMCs after ~10 days, and the labeling kinetics plateau by day 14... The authors reach the main conclusion: that adult ECs are making hematopoietic cells.

      We agree that the main tool used in this study is to label endothelial cells (ECs) using Cdh5 (VE-Cadherin) CreERT2 genetic tracing in mice. Indeed, Cdh5 is recognized as a good marker of ECs. As a minor point, we wish to clarify that the results from treating adult Cdh5-CreERT2 mice with tamoxifen (Figure 1F) show that the ZsGreen labeling kinetics plateau by day 28 (not by day 14).

      Important controls should be shown to rule out alternative possibilities: namely, that the CreERT2 reporter is being sparsely expressed in HSPCs. Many markers, specific as they may seem to be, can show expression in non-specific lineages - particularly in the cases of BAC and PAC transgenic models, in which the transgene can be present in multiple tandem copies and subject to genome location-specific effects. As the authors remind readers, the Cdh5 gene is partly transcribed (though at low levels) in HSPCs, and even more clearly expressed in specific subpopulations such as CLPs, DCs, pDCs, B cells, etc. Some options would be to: i) check if the Cdh5-CreERT2 transgene (not endogenous Cdh5, but the BAC/PAC transgene) is expressed in LSKs (at least by qPCR), ii) verify if any CreERT2 protein levels are present in LSKs (e.g., by western blot), and iii) check if tamoxifen is labeling any HSPCs freshly after induction (e.g., flow cytometry data of ZsGreen LSKs at 24-48h post tamoxifen injection).

      We fully agree with the Reviewer that many markers, allegedly specific to a certain cell type, can show expression in other cell lineages. We also agree that excluding sparse or ectopic CreERT2 expression in hematopoietic stem and progenitor cells (HSPCs) is essential for interpreting lineage-tracing results. As suggested by the Reviewer, we have now examined if the Cdh5-CreERT2 transgene is expressed in bone marrow LSKs. To this end, we analyzed the Polylox single-cell RNAseq dataset presented in this study, containing ZsGreen<sup>+</sup> ECs and enriched ZsGreen<sup>+</sup> LSKs. As shown in the revised Figure S4D, CreERT2 transcripts were detected exclusively in Cdh5-expressing endothelial populations and were absent from Ptprc/CD45-expressing hematopoietic cells, except for plasmacytoid dendritic cells (pDCs; Figure S4E). These results are consistent with the RNAseq data from adult mouse bone marrow[1] showing that the Cdh5 gene is not expressed in HSPCs, CLPs, DCs, or B cells. Rather, among hematopoietic CD45<sup>+</sup> cells, Cdh5 is only expressed in a small subset of plasmacytoid dendritic cells (pDCs), which are terminally differentiated cells. These published results are described in the text.

      To further support this conclusion, we provide additional single-cell RNAseq analyses from our unpublished dataset of LSKs isolated from Cdh5-CreERT2/ZsGreen mice and not enriched for ZsGreen expression. These new analyses were performed after integrating the single-cell data from ECs and ZsGreen<sup>+</sup> hematopoietic cells from the Polylox dataset (current study). As shown in Author response images 1 and 2, CreERT2 expression closely matches the expression patterns of Cdh5, Pecam1, and Emcn and is not detected in Ptprc/CD45-expressing hematopoietic cells.
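      A minimal sketch of the per-cell-type detection check described above, assuming a simple table of UMI counts per cell. The cell types, counts, and the `detection_rate` helper are hypothetical; a real analysis would work from the full count matrix (for example in Scanpy or Seurat) rather than a hand-built list.

```python
from collections import defaultdict

# Hypothetical per-cell records: (cell_type, CreERT2 UMIs, Cdh5 UMIs, Ptprc UMIs).
# Illustrative only; these are not values from the study's datasets.
cells = [
    ("EC", 3, 5, 0),
    ("EC", 1, 2, 0),
    ("EC", 0, 4, 0),
    ("LSK", 0, 0, 6),
    ("LSK", 0, 0, 9),
    ("pDC", 1, 1, 7),
]

CREERT2, CDH5, PTPRC = 1, 2, 3  # column indices into each record


def detection_rate(records, gene_index):
    """Fraction of cells per cell type with at least one UMI for the gene at gene_index."""
    detected = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        cell_type = rec[0]
        total[cell_type] += 1
        if rec[gene_index] > 0:
            detected[cell_type] += 1
    return {ct: detected[ct] / total[ct] for ct in total}


rates = detection_rate(cells, CREERT2)
print(rates)
```

      Comparing the CreERT2 detection rate with those of Cdh5 and Ptprc (same helper, different column index) is one simple way to check that transgene expression tracks the endothelial marker rather than the pan-hematopoietic one.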

      Author response image 1.

      Expression of CreERT2, Cdh5, Ptprc and ZsGreen in BM cell populations enriched with ECs and hematopoietic cells. The single-cell RNAseq data were derived from ZsGreen-enriched BM ECs and ZsGreen-enriched BM hematopoietic cells from the Polylox lineage-tracing experiments (data shown in Fig. 5; 37,667 ECs and 48,065 BM hematopoietic cells) and from LSKs (23,017 cells) independently isolated from tamoxifen-treated Cdh5-CreERT2/ZsGreen mice without ZsGreen enrichment (unpublished data).

      Author response image 2.

      Expression of CreERT2, Cdh5, Ptprc, Pecam1, Emcn, ZsGreen1, Col1a2, Cd19, Cd3e, Itgam (CD11b), Ly6a (Sca-1), Kit(cKit), Cd34, Cd48, Slamf1 (CD150), and Siglech in enriched BM ECs and LSKs from Cdh5-CreERT2/ZsGreen mice treated with tamoxifen 4 weeks prior to harvest (same cell source as indicated in Author response image 1).

      Additionally, we functionally tested whether hematopoietic progenitors could acquire ZsGreen labeling following tamoxifen administration using transplantation assays (Figure 4A-D). ZsGreen<sup>-</sup> LSKs (purity 99%), sorted from Cdh5-CreERT2/ZsGreen donors that had never been exposed to tamoxifen to exclude background Cre leakiness, were transplanted into lethally irradiated wild-type recipients. After stable hematopoietic reconstitution, recipients were treated with tamoxifen. If transplanted HSPCs or their progeny expressed CreERT2, tamoxifen administration would be expected to induce ZsGreen labeling. However, no ZsGreen<sup>+</sup> hematopoietic cells were detected in these recipients, demonstrating that hematopoietic progenitors from Cdh5-CreERT2/ZsGreen mice and their descendants do not undergo tamoxifen-induced recombination.

      Together, the single-cell transcriptional and transplantation data demonstrate that CreERT2 expression and tamoxifen-induced recombination are restricted to Cdh5-expressing ECs (except for pDCs). These findings support the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise from adult bone marrow ECs rather than from contaminating hematopoietic progenitors.

      One important missing experiment is to trace how ECs actually do this hematopoietic conversion: meaning, which populations of HSPCs are being produced by adult ECs in the first instance? LT-HSCs? ST-HSCs? MPPs? GMPs? All of the above? What are the kinetics? Differentiation is likely to follow a hierarchical path, but this is unclear at the moment.

      We agree that defining the earliest EC-derived hematopoietic cell progenitors and the kinetics by which these progenitors appear (LT-HSC vs ST-HSC/MPP vs lineage-restricted progenitors) would provide important insights into adult EHT.

      In the current genetic labeling system, a rigorous kinetic analysis of the hematopoietic cells first generated by ECs in vivo is not straightforward. Specifically, the low-level baseline ZsGreen<sup>+</sup> reporter fluorescence in hematopoietic cells (dependent on EHT occurring prenatally, perinatally, or in young mice, or on other causes; Figure 1A-D and Figure S1D-I) impairs the identification of newly generated ZsGreen<sup>+</sup> progenitors at early time points and makes it difficult to distinguish them from baseline fluorescence. A potential solution might be to introduce serial harvests across multiple time points in large mouse cohorts to capture rare transitional events with statistical significance.

      We wish to emphasize that the primary objective of this study was to establish whether adult bone marrow ECs have a hemogenic potential. Our data demonstrate adult EC-derived hematopoietic cell output that includes progenitor-containing fractions and multilineage mature progeny, under both steady-state conditions. We acknowledge that the current work does not resolve the order and kinetics of hematopoietic cell emergence following EHT. Therefore, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      One warning sign is how rare the reported phenomenon is. Even when labeling almost 90% of the BM ECs, these make at most ~3% of blood (less than 1% in the transplants in Figure 4F, less than 0.5% in the col1a2 tracing in Figure 7). This means this is a very rare and/or transient phenomenon... The most major warning sign is the fast kinetics of labeling and the fast plateau. We know that: a) differentiation typically follows some hierarchy, b) in situ dynamics of blood production are slow (work by Rodewald and Höfer). Considering how fast these populations need to be replaced to reach a steady state so rapidly (as reported here, 2-4 weeks), the presumably specialized ECs would need to be steadily dividing and producing hematopoietic cells at a fast pace (as a side prediction, the adult "EHT" cluster would likely be highly Mki67+). More importantly, the ZsGreen LSKs produced by the ECs would have to undergo VERY rapid differentiation (much faster than normal LSKs) or otherwise, if 3% of them are produced by a top compartment (the BM ECs) every 4 weeks, then the labeled population would continue to grow with time. The authors could try to challenge this by testing if the ZsGreen LSKs undergo much faster differentiation kinetics or lower self-renewal (which does not seem to be the case, at least in their own transplantation data). We believe a more likely explanation is that the label is being acquired more or less non-specifically, directly across a bunch of HSPC populations.

      The Reviewer correctly notes that the population of hemogenic ECs in the adult mouse bone marrow is small and that the output of hematopoietic cells from these hemogenic ECs accounts for at most 3% of blood cells. We agree that delineating the kinetics by which hematopoietic cells are generated from adult ECs is important, as this information would provide important insights into adult EHT.

      Nonetheless, we believe that the rapid appearance and early plateau of labeled blood cells in our experiments may not derive from a sustained, high-rate generation of labeled blood cells from self-renewing top-tier hematopoietic cell compartments, such as LT-HSCs. Rather, our data are more consistent with a predominantly lineage-restricted and biased hematopoietic progenitor cell population being the source of labeled blood cells. Supporting this interpretation, longitudinal analysis of peripheral blood shows that EGFP<sup>+</sup> PBMCs are consistently enriched in myeloid cells, whereas EGFP<sup>-</sup> PBMCs are predominantly B cells (Figure 4G and H). This myeloid lineage skewing is stable over time and contrasts with what would be expected if labeling were acquired broadly and nonspecifically across the hematopoietic hierarchy. Therefore, our results are more consistent with myeloid-biased progenitors being among the first populations that EHT generates.
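      Both the kinetic concern and this response turn on how quickly a labeled fraction approaches its plateau. A textbook one-compartment model makes this concrete: if a compartment turns over at a first-order rate δ and a constant fraction s of its input is labeled, the labeled fraction is f(t) = s(1 − e^(−δt)), so the time needed to reach a given share of the plateau depends only on δ. The sketch below assumes this simplified model and uses hypothetical turnover rates; it is not a fit to the study's data.

```python
import math


def labeled_fraction(t_days: float, source_fraction: float, turnover_per_day: float) -> float:
    """Labeled fraction of a compartment fed by a source with a constant labeled fraction.

    Assumes a single well-mixed compartment with first-order turnover and no label
    at t = 0: f(t) = s * (1 - exp(-delta * t)).
    """
    return source_fraction * (1.0 - math.exp(-turnover_per_day * t_days))


def days_to_reach(fraction_of_plateau: float, turnover_per_day: float) -> float:
    """Time for f(t) to reach the given fraction of its plateau value s."""
    return -math.log(1.0 - fraction_of_plateau) / turnover_per_day


# Hypothetical parameters for illustration only (not values from the study).
SOURCE_LABELED = 0.03  # labeled fraction of the inflow, s

for turnover in (0.1, 0.01):  # per day: a fast-turnover versus a slow-turnover compartment
    t95 = days_to_reach(0.95, turnover)
    f28 = labeled_fraction(28, SOURCE_LABELED, turnover)
    print(f"turnover {turnover}/day: 95% of plateau after {t95:.1f} days, f(28 d) = {f28:.4f}")
```

      Under this model, a plateau within about four weeks implies that the labeled cells belong to a compartment turning over on the order of 0.1 per day, which fits short-lived progeny of lineage-restricted progenitors better than slowly cycling stem cells.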

      We acknowledge that our studies do not identify the earliest endothelial-derived hematopoietic cells produced in vivo and do not define their differentiation kinetics. Rigorously addressing these questions would require temporally resolved lineage tracing with sufficiently powered cohorts at early time points to distinguish newly labeled cells statistically from the baseline reporter background. These important experiments were beyond the scope of the present study. As noted above, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      Transplant experiments in Figure 4 do offer a crucial experiment in support of the main conclusion of the manuscript. These experiments show that transplanted LSKs bearing the Cdh5-CreERT2 and ZsGreen reporter cannot acquire the tamoxifen-induced label post-transplantation - suggesting that the label is coming from ECs. However, it is also possible that the LSK Cdh5-CreERT expression is partly during the transplantation process... Indeed, we know through the aging data that the labeling is less active in aged mice. In any case, this would be verified by qPCR/western-blot (comparing native vs post-transplant LSKs).

      We agree with the Reviewer that the experiments in Figure 4A-D “offer a crucial experiment in support of the main conclusion of the manuscript.” The results of this experiment show that ZsGreen-negative LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not acquire tamoxifen-induced ZsGreen fluorescence post transplantation, supporting the endothelial cell origin of blood ZsGreen<sup>+</sup> cells.

      The Reviewer raises the possibility “that the LSK Cdh5-CreERT expression is partly during the transplantation process...”, and that this Cdh5-CreERT expression may occur slowly, as learned “through the aging data that the labeling is less active in aged mice.” As we show in Figure 3F, tamoxifen administration induced a similar percentage of ZsGreen<sup>+</sup> ECs in the bone marrow of Cdh5-Cre<sup>ERT2</sup>(BAC)/ZsGreen mice, whether tamoxifen was administered to 6-week-old, 16-week-old, 26-week-old, or 36-week-old mice. Similar results with Cdh5-CreERT2 (BAC) mice are reported in the literature[2]. Since the mice transplanted with ZsGreen<sup>-</sup> LSKs were followed for 25 weeks after tamoxifen administration, we believe that the results in Figure 4A-D address the concern raised by the Reviewer.

      Supporting the conclusion that LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not express the Cdh5-CreERT2 transgene in a native (non-transplant) setting, we now provide transcriptomic data from Cdh5-CreERT2/ZsGreen mice (not transplanted) showing that CreERT2 expression closely tracks with expression of canonical endothelial markers (Cdh5, Pecam1, Emcn) and is not detectable in Ptprc/CD45-expressing hematopoietic cells (Author response images 1 and 2). These data were obtained from non-transplanted mice treated with tamoxifen at ~12 weeks of age and analyzed four weeks later. Together, these results indicate that CreERT2 expression is endothelial-restricted in Cdh5-CreERT2-ZsGreen reporter mice.

      Figure 5 presents PolyLox experiments to challenge whether adult ECs produce hematopoietic cells through in situ barcoding. Several important details of the experiment are missing in the main text (how many cells were labeled, at which time point, how long after induction were the cells sampled, how many bones/BM-cells were used for the sample preparation, what was the sampling rate per population after sorting, how many total barcodes were detected per population, how many were discarded/kept, what was the clone-size/abundance per compartment). As presented, the authors imply that 31 out of ~200 EC barcodes are shared with hematopoietic cells... This would suggest that ~15% of endothelial cells are producing hematopoietic cells at steady state. This does not align well with the rarity of the behavior and the steady state kinetics (unless any BM EC could stochastically produce hematopoietic cells every couple of weeks, or if the clonality of the BM EC compartment would be drastically reduced during the pulse-chase overlap with mesenchymal cells. Important controls are missing, such as what would be the overlap with a population that is known to be phylogenetically unrelated (e.g., how many of these barcodes would be found by random chance at this same Pgen cut-off in a second induced mouse). Also, the Pgen value could be plotted directly to see whether the clones with more overlapping populations/cells (3HG, 127, 125, CBA) also have a higher Pgen. We posit that there are large numbers of hematopoietic clones that contribute to adult hematopoiesis (anywhere from 2,000-20,000 clones would be producing granulocytes after 16 weeks post chase), and it would be easy to find clones that overlap with granulocytes (the most abundant and easily sampled population) - HSPCs would be the more stringent metric.

      We thank the Reviewer for highlighting the need for a more detailed description of the Polylox experiments. To address this deficiency, we have compiled a document (Additional Supplementary Information file) containing all the specifics of the Polylox experimental and analytical parameters in one location. This includes: (i) the number of cells analyzed per population, (ii) the time points of induction and sample collection, (iii) the number of bones and total bone marrow cells used for preparation, (iv) the sampling rate following cell sorting, (v) the total number of detected barcodes per population, (vi) barcode filtering criteria and numbers retained or discarded, and (vii) clone-size and barcode number across cell compartments. We have updated the manuscript to refer readers to this Supplementary file.

      The Reviewer concluded from our results (Figure 5, Figure S5) that 31 out of ~200 endothelial cell (EC) barcodes were shared with hematopoietic cells (HCs), implying that ~15% of ECs produce hematopoietic cell progeny at steady state. This interpretation is inconsistent with our data showing the rare nature of adult EHT and would require either that a large fraction of bone marrow ECs can generate hematopoietic cells within short time windows, or that ECs clonally expand rapidly during the pulse-chase period, as noted by the Reviewer. The explanation for this apparent discrepancy is technical. Briefly, the ~200 EC barcodes recovered do not represent all barcoded ECs. During Polylox barcode library construction, a mandatory size-selection step is applied prior to PacBio sequencing, retaining fragments that are approximately 800–1500 bp in length, whereas the full Polylox cassette spans ~2800 bp. This is mainly because the PacBio sequencer requires that the library be either 800–1500 bp or over 2500 bp for optimal sequencing results. As described in the original Polylox publication[3,4], this size selection eliminates most (approximately 75%) longer barcodes, together with ~85% of the shorter barcodes. Thus, ECs harboring very long or short recombined barcodes are under-represented or excluded from sequencing. As a result, the 22 true barcodes linking ECs and HCs recovered from sequencing do not indicate that ~10–15% of ECs generate hematopoietic progeny. Rather, these barcodes represent a highly selected subset of ECs with barcode configurations compatible with library recovery and sequencing. The observed EC–HC barcode sharing thus reflects qualitative lineage connectivity, not the quantitative frequency of endothelial-derived hematopoiesis at steady state.

      The Reviewer correctly notes that true Polylox barcodes are shared by ECs and mesenchymal-type cells and asks that we examine whether this overlap could occur by chance alone. The Polylox filtering threshold (pGen < 1 × 10<sup>-6</sup>), which we have revised for stringency (from pGen < 1 × 10<sup>-4</sup>, without altering the essential results; new Figure S4 and revised Figure 5C-F), renders such overlap exceedingly unlikely. At this threshold, the expected number of random recombination events among 4,069 barcoded cells is approximately 0.004. Consequently, among the 87 mesenchymal cells identified here, fewer than 0.4 cells would be expected to share a barcode with another cell by chance alone. Thus, the probability of recovering identical barcodes across unrelated lineages due to random recombination is vanishingly small, and the observed EC–mesenchymal barcode sharing substantially exceeds random expectation.
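      The chance-overlap figures quoted above can be reproduced with simple arithmetic. This sketch assumes, as the quoted numbers appear to, that the expected number of chance events is the cell count multiplied by the pGen threshold, and that the per-dataset figure is then scaled by the number of mesenchymal cells; the variable and function names are illustrative.

```python
# Back-of-envelope check of the chance-overlap figures quoted in the response.
# Assumption: expected chance events = number of cells x pGen threshold.

PGEN_THRESHOLD = 1e-6      # revised Polylox filter (pGen < 1 x 10^-6)
BARCODED_CELLS = 4069      # barcoded cells in the dataset
MESENCHYMAL_CELLS = 87     # mesenchymal cells identified


def expected_chance_events(n_cells: int, pgen: float) -> float:
    """Expected number of barcodes at or below `pgen` arising by independent recombination."""
    return n_cells * pgen


per_dataset = expected_chance_events(BARCODED_CELLS, PGEN_THRESHOLD)
per_mesenchymal = MESENCHYMAL_CELLS * per_dataset

print(f"{per_dataset:.4f}")      # 0.0041
print(f"{per_mesenchymal:.2f}")  # 0.35
```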

      Related to this observation, the Reviewer correctly notes that the endothelial and mesenchymal cell lineages are phylogenetically unrelated. However, endothelial-to-mesenchymal cell transition (EndMT), the process by which normal ECs completely or partially lose their endothelial identity and acquire expression of mesenchymal markers, is a well-established process that occurs physiologically and in disease states (Simons M, Curr Opin Physiol 2023). In the bone marrow, EndMT has been documented in patients with myelofibrosis, where it affects the bone marrow microvasculature (Erba BG et al., Am J Pathol 2017). Single-cell RNA-seq of non-hematopoietic bone marrow cells has shown the existence of a rare population of ECs that co-expresses endothelial cell markers (Cdh5, Kdr, Emcn and others) and mesenchymal cell markers, as shown in Figure 6E and F.

      We fully agree with the Reviewer that, given the large number of hematopoietic clones contributing to adult hematopoiesis (particularly granulocyte-producing clones), it may be relatively easy to detect barcode overlap with abundant mature populations, whereas overlap with HSPCs would represent a more stringent and informative metric of lineage relationships. The Polylox results presented here show the sharing of true barcodes between individual ECs and HSPCs.

      Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of endothelial cells in the adult mouse bone marrow. Using the in vivo inducible Cdh5-CreERT2 system, they characterized a rare subset of endothelial cells expressing hematopoietic markers that were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable, contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU), and were recruited to the peritoneal cavity upon thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells at single-cell resolution, suggesting predominantly HSPC-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, consistent with their high Runx1 expression.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      We thank the Reviewer for the supportive comments about our study.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen+ ECs generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that hematopoietic cells might arise from cells that contaminated the culture at the time of sorting. The sorting purity and a time-course analysis of the ex vivo culture should be shown to exclude this possibility.

      We agree that FACS sorting can never achieve 100% cell purity and that sorting purity is critical for interpreting the ex vivo culture experiments presented in our study. As requested by the Reviewer, we have now documented the purity of the sorted endothelial cell (EC) population used in the ex vivo culture experiments. The post-sort purity of CD45<sup>-</sup>VE-cadherin<sup>+</sup>ZsGreen<sup>+</sup> ECs was 96.5%; these data are now shown in the revised Figure 2B (Post Sort Purity panel). This purity level is comparable to that of the sorted ECs shown in Figure S2I (94.5%).

      While we agree that a detailed time-course analysis of hematopoietic cell output from EC cultures could further strengthen the conclusion that bone marrow ECs can produce hematopoietic cells ex vivo, we wish to call attention to an additional critical control in the experiment shown in Figure 2B-D. In this experiment, we co-cultured CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells from Cdh5-CreERT2/ZsGreen mice, rather than ECs, and examined whether these hematopoietic cells could produce ZsGreen<sup>+</sup> cell progeny after 8 weeks of culture under the same conditions used in EC co-cultures (conditions not designed to support hematopoietic cells long-term). Unlike ECs, the CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells did not generate ZsGreen<sup>+</sup> hematopoietic cells by the end of the 8-week culture, indicating that the culture conditions are not permissive for the maintenance, proliferation and differentiation of hematopoietic cells. This provides strong evidence that even if a few hematopoietic cells contaminated the sorted ECs, they would not contribute to the EC-derived production of hematopoietic cells at the 8-week time-point. We have revised the Results text describing Figure 2B-D.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed, as the chimerism kinetics would allow us to infer the type of repopulation (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      The original manuscript reported survival and engraftment up to 12 weeks post transplantation. The recipient mice have now been monitored for up to 10 months post transplantation. These extended survival and engraftment data are now included in the revised Figure 2I and J replacing the previous 10-week analyses.

      We agree with the Reviewer that the time-course kinetics of donor cell repopulation would help define adult endothelial-to-hematopoietic transition (EHT) and the hematopoietic cell types produced by adult EHT. We did not perform serial time-course sampling of peripheral blood apart from the 10-week and 10-month time-points. Given that the recipient mice were lethally irradiated and had increased susceptibility to infection, we sought to minimize repeated interventions that could compromise animal health and survival. We therefore prioritized long-term survival and endpoint analysis over repeated longitudinal sampling. Nonetheless, the long-term survival (10 months) and multilineage hematopoietic cell reconstitution after lethal irradiation provide functional evidence that adult EHT produced at least some LT-HSCs.

      We acknowledge that phenotypic assessment of bone marrow LT-HSC chimerism and/or secondary transplantation would further strengthen the manuscript. We have clarified these limitations in the revised manuscript under “Limitations of the study”.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      We agree with the Reviewer that, in most cases, a larger number of experimental data points helps strengthen the conclusions, and that having additional mice transplanted with ZsGreen-enriched LSKs would be desirable. However, we do not believe that additional mice transplanted with ZsGreen<sup>+</sup> LSKs would strengthen the conclusions drawn from the experimental results shown in Figure 4D, in which we used 6 mice transplanted with ZsGreen-depleted (ZsGreen<sup>-</sup>) LSKs and 2 mice transplanted with ZsGreen-enriched (ZsGreen<sup>+</sup>) LSKs. The independence of adult EHT from “pre-existing hematopoietic cell progenitors” rests on the following experimental results and the conclusions drawn from them.

      First, ZsGreen<sup>-</sup> LSKs (purity 99%) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 6). These ZsGreen<sup>-</sup> LSKs robustly reconstituted hematopoiesis, demonstrating successful engraftment. Importantly, tamoxifen administration to the recipients of ZsGreen<sup>-</sup> LSKs produced no detectable ZsGreen<sup>+</sup> cells in the blood for up to 6 months post transplantation (Figure 4D, blue line encompassing the results of the 6 mice). This result demonstrates that the transplanted ZsGreen<sup>-</sup> hematopoietic progenitors and their progeny do not acquire ZsGreen labeling in vivo following tamoxifen treatment, indicating that they lack the Cre-recombinase. This result is consistent with the endothelial specificity of Cdh5 expression.

      Second, ZsGreen<sup>+</sup> LSKs (accounting for ~50% of the LSKs) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 2). This arm of the experiment was performed in part as a technical control to confirm successful engraftment and detection of ZsGreen<sup>+</sup> hematopoietic cells in the transplant setting. Importantly, after tamoxifen administration to the two recipients of ZsGreen<sup>+</sup> LSKs (Figure 4D, two green lines reflecting these two mice), the level of ZsGreen<sup>+</sup> blood cells stabilized in each mouse between weeks 10 and 24, showing an equilibrium between the proportions of ZsGreen<sup>+</sup> and ZsGreen<sup>-</sup> cells in the blood. This indicates that pre-existing ZsGreen<sup>+</sup> LSKs are not responsible for tamoxifen-induced increases in ZsGreen<sup>+</sup> hematopoietic cells in blood.

      Together, the results from this experiment demonstrate that in the setting of transplantation, tamoxifen does not induce ZsGreen labeling of ZsGreen<sup>-</sup> hematopoietic progenitors or their progeny. This result strongly supports the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise independently of pre-existing or inducible hematopoietic progenitors. We have revised the text to clarify these experiments and to present the results in a simplified manner.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of the genetically and phenotypically defined endothelial cells from several reporter mouse systems. The Polylox barcoding method used to trace the contribution of adult bone marrow endothelial cells to hematopoiesis provides strong insight into lineage contribution.

      Weaknesses:

      It is unclear what the biological significance of the blood cells de novo generated from the adult bone marrow endothelial cells is. Moreover, since the frequency is very rare (<1% bone marrow and peripheral blood CD45+), more data regarding its identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination/mosaicism of the reporter mice system used.

      We agree that the biological significance and functional roles of hematopoietic cells generated de novo from adult bone marrow ECs remain important open questions. We also agree that the output of hematopoietic cells from adult EHT is low, but rare events can be important, particularly as they pertain to stem/progenitor cell biology. Both points are described under “Limitations of the study”. The primary goal of the present study was to address whether adult bone marrow ECs can undergo EHT. We believe that the combination of various transgenic mouse lines, different Cre-ER lines, different reporters (ZsGreen and mTmG) including the single-cell barcoding reporter (PolyloxExpress), and different approaches to evaluate hematopoiesis in vivo and ex vivo makes it rather unlikely that our conclusions are driven by an artifact related to a specific leaky reporter, contamination, or problems with one of the Cre lines. The experiment in which we find no tamoxifen-induced labeling of transplanted ZsGreen<sup>-</sup> LSKs derived from Cdh5-CreERT2/ZsGreen mice strongly supports the existence of adult EHT, virtually excluding a contribution of contaminant hematopoietic cells.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a discrepancy in the proportion of peripheral blood composition between different reporters (mTmG and ZsGreen) (Figure 1G and Figure S1K), especially the contrasting B cell proportion between both models. The additional comments on this data should be mentioned.

      In the revised Results section, we now note that the mTmG and ZsGreen reporters show slightly different efficiencies or kinetics of labeling. These differences have previously been reported[5] and have been attributed to relative reporter leakiness, sensitivity to tamoxifen, or different kinetics of Cre recombination. As suggested, these comments have been added to the text following the description of Figure S2A.

      (2) Experimental methods concerning cell transplantation/transfer need more information, such as: a) using or not using rescue cells and how many cells are they if using, b) single or split dose of irradiation, c) when were cells transplanted following irradiation, etc. Otherwise, the data are uninterpretable.

      We have ensured that the Material and Methods section under “Bone marrow ablation and transplantation” contains all the information requested by the Reviewer.

      (3) Some of the grouped data haven't been statistically analyzed.

      We have reviewed all data and performed appropriate statistical analyses where comparisons were made. In the revised figures and legends, all grouped datasets now include statistical tests and p-values are indicated (added to Fig. 3H and I; Figure 4G).

      (4) Some flow cytometry plots include quantitative values and others do not. Quantitative information is absolutely needed in all flow cytometry plots.

      We have updated the flow cytometry figures to include quantitative values (percentages or absolute counts) in all relevant plots (2B (new figure, bottom left); 2C; S1G, S1H).

      (5) It is more relevant to present the Emcn/VE-Cadherin plot from gated CD45+/ZsGreen+ cells, not the CD45-/ZsGreen+ fraction (Figure 2C), as the latter are not EHT-derived offspring but rather common phenotypic endothelial cells.

      As requested, we have added the suggested flow cytometry plot. The revised Figure 2C now includes an Emcn vs. VE-Cadherin plot from the gated CD45<sup>+</sup>ZsGreen<sup>+</sup> population. This complements the existing panel, which shows that CD45<sup>-</sup>ZsGreen<sup>+</sup> cells retain endothelial cell markers after culture, whereas the CD45<sup>+</sup>ZsGreen<sup>+</sup> cells do not express endothelial markers. The figure legend has been updated to explain the new panel. We agree that this plot more directly highlights the phenotype of the presumed EHT-derived cells.

      (6) To show the effect of the ex vivo culture, the authors should present the absolute number of CD45+ZsGreen+ cells in the pre-/post-culture; otherwise, the data are uninterpretable (Figure 2D).

      Our interpretation of the Reviewer’s comment above (relative to the experiment shown in Figure 2B-D) is that the Reviewer would like us to provide the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells introduced into the co-culture (supplemented with unsorted BM cells, ZsGreen<sup>+</sup> hematopoietic cells or ZsGreen<sup>+</sup> ECs) and the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. Currently, the results in Figure 2D show the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. The average input of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells was 2.93 × 10<sup>6</sup> for unsorted BM cells and 1.68 × 10<sup>6</sup> for ZsGreen<sup>+</sup> hematopoietic cells, and was estimated at up to 100 for sorted ZsGreen<sup>+</sup> ECs.

      (7) It is confusing to see Figures 2F and 2G, which apparently show data from the middle of the experimental procedure (Figure 2E). These data should be clearly labelled to indicate which step of the overall experimental protocol they correspond to.

      As correctly noted by the Reviewer, Figures 2F and 2G provide data that relate to the middle of the graphical representation of the experiment shown in Figure 2E. We see how this may be confusing.

      Therefore, we have updated both the figure labeling and legend to explicitly indicate that Figure 2F and 2G provide the FACS sorting results for the cells used for transplantation. The revised legend now reads: “Representative flow cytometry plots of the non-adherent cell fraction after 8 weeks of co-culture (cells used for transplantation).”

      References

      (1) Kucinski, I., Campos, J., Barile, M., Severi, F., Bohin, N., Moreira, P.N., Allen, L., Lawson, H., Haltalli, M.L.R., Kinston, S.J., et al. (2024). A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244-259.e10. https://doi.org/10.1016/j.stem.2023.12.001.

      (2) Identification of a clonally expanding haematopoietic compartment in bone marrow. The EMBO Journal. https://link.springer.com/article/10.1038/emboj.2012.308.

      (3) Pei, W., Shang, F., Wang, X., Fanti, A.-K., Greco, A., Busch, K., Klapproth, K., Zhang, Q., Quedenau, C., Sauer, S., et al. (2020). Resolving Fates and Single-Cell Transcriptomes of Hematopoietic Stem Cell Clones by PolyloxExpress Barcoding. Cell Stem Cell 27, 383-395.e8. https://doi.org/10.1016/j.stem.2020.07.018.

      (4) Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460. https://doi.org/10.1038/nature23653.

      (5) Álvarez-Aznar, A., Martínez-Corral, I., Daubel, N., Betsholtz, C., Mäkinen, T., and Gaengel, K. (2020). Tamoxifen-independent recombination of reporter genes limits lineage tracing and mosaic analysis using CreERT2 lines. Transgenic Res 29, 53–68. https://doi.org/10.1007/s11248-019-00177-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides useful insights into addressing the question of whether the prevalence of autoimmune disease could be driven by sex differences in the T cell receptor (TCR) repertoire, correlating with higher rates of autoimmune disease in females. The authors compare male and female TCR repertoires using bulk RNA sequencing, from sorted thymocyte subpopulations in pediatric and adult human thymuses; however, the results do not provide sufficient analytical rigor and incompletely support the central claims.

      The statement in the editorial assessment that our study “does not provide sufficient analytical rigor” surprised us. TCR repertoire analysis is indeed a highly complex domain, both experimentally and computationally. We consider ourselves to be leading experts in this field and have invested a great deal of effort to ensure the rigor and reproducibility of every analytical step.

      Specifically, our group has previously benchmarked and published validated methodologies for the following areas: (i) TCR repertoire generation (Barennes et al., Nat Biotechnol 2021), (ii) repertoire analysis (Six et al., Frontiers in Immunol, 2013; Chaara et al., Frontiers in Immunol, 2018; Ritvo et al., PNAS, 2018; Mhanna et al., Diabetes, 2021; Trück et al., eLife, 2021; Quiniou et al., eLife, 2023; Mhanna et al., Cell Rep Methods, 2024; Mhanna et al., Nat Rev Primers Methods, 2024), and (iii) the curation and quality control of public TCR databases (Jouannet et al., NAR Genomics and Bioinformatics 2025). The current study applies these optimized and peer-reviewed pipelines, along with additional internal quality controls that we have implemented over the years, ensuring the highest possible analytical standards for TCR repertoire studies.

      We therefore respectfully feel that the phrase “insufficient analytical rigor” does not accurately reflect the methodological robustness of our work. This perception is also in contrast to the comment made by one of the reviewers, who explicitly noted that “overall, the methodologies appear to be sound.”

      We would therefore be grateful if, upon reviewing our detailed point-by-point responses, the editors could reconsider this statement and tone it down in the final editorial summary.

      With regard to the comment that our results “incompletely support the central claims”, we will leave it to the reader’s judgement. We believe that our work provides a robust and transparent basis for future research into TCR repertoire, autoimmunity, and women’s health.

      Reviewer 1 (Public reviews):

      Summary

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between male and female humans. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis of sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset in which total thymocytes were sorted.

      They report participant ages and sexes, but not ethnicity or race, nor do they provide information about HLA typing of individuals. Though the experiments themselves are heroic, they represent a relatively small sampling of diverse humans. They observed no differences between males and females in TCRbeta or TCRalpha usage, combinatorial diversity, CDR3 length, or amino acid usage in the CDR3aa region. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those previously identified using computational approaches to recognize cancer, bacterial, viral, or autoimmune antigens. They found very little overlap between their sequences and these annotated sequences (depending on the individual, ranging from 0.82-3.58% of sequences). Within the overlapping sequences, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this finding is generalizable to the human population.

      Strengths:

      This is a novel dataset. Overall, the methodologies appear to be sound. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females.

      We appreciate the positive feedback from the reviewer regarding these points.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. The cleaner experiment would have been to study the impact of sex in a number of inbred MHC I/II identical mouse strains or in humans with HLA-identical backgrounds.

      We respectfully disagree with the reviewer’s statement. We firmly believe that the issue we are dealing with, namely sex-based differences in thymic TCR selection relevant to autoimmunity, should be investigated more thoroughly in the general human population than in inbred mouse models.

      While inbred mouse strains, being MHC I/II identical, eliminate the complexity of MHC variation, this comes at the cost of biological relevance. Firstly, a discrepancy in TCR generation or selection may only become apparent under specific MHC contexts, which could easily be overlooked when studying a single inbred strain. Secondly, inbred strains frequently contain fixed genetic variants that may influence thymic selection or immune regulation. This could introduce confounding effects rather than reduce them, and it would not solve the generalization issue.

      We are in full agreement that an HLA-matched human cohort would reduce inter-individual variability. However, such sampling is impossible in practice, as our thymic tissues were obtained from deceased organ donors, a collection effort that was, as the reviewer rightly noted, “heroic”. Despite these inherent limitations, the patterns we observed were consistent across multiple analytical approaches, lending robustness to our findings.

      We now explicitly acknowledge this limitation in the Discussion of the revised manuscript and explain why, despite this constraint, our study provides meaningful and biologically relevant insights into human TCR selection and sex-related immune differences.

      It is unclear whether there was consensus between the three databases they used regarding the antigens recognized by the TCR sequences. Given the very low overlap between the TCR sequences identified in these databases and their dataset, and the lack of replication, they should tone down their excitement about the CD8 T cell sequences recognizing autoimmune and bacterial antigens being over-represented in females.

      The three databases used in this study - McPAS-TCR, IEDB, and VDJdb - provide complementary and partially non-overlapping specificity landscapes. McPAS-TCR is enriched for pathology-associated TCRs, while IEDB and VDJdb contain a higher proportion of viral specificities. Combining them therefore broadens the antigenic spectrum accessible for analysis and represents the most comprehensive approach currently possible to capture the diversity of TCR–antigen annotations.

      With regard to the limited overlap between our dataset and these databases, this observation should be interpreted with caution. While the overlap may appear minimal at first glance, it is a biologically significant phenomenon. The public databases collectively contain only a minute fraction of the total universe of TCR specificities, estimated at 10<sup>15</sup>–10<sup>21</sup> possible receptors in humans. In this context, the observation of any overlap at all, particularly with coherent biological patterns such as the overrepresentation of autoimmune- and bacterial-associated TCRs in females, is noteworthy.

      We have included a short clarification in the Discussion of the revised manuscript to make this point explicit and to further temper the language describing this finding.

      The dataset could be valuable to the community.

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; we therefore withdraw any statement to the contrary in the original submission.

      Reviewer #1 (Recommendations for the authors):

      I would just recommend toning down the excitement about autoimmune TCRs being overrepresented in females. Then the conclusions will be in alignment with their results.

      We thank the reviewer for this constructive recommendation. We would like to express our full support for the editorial transparency policies of eLife, which allow readers to access both the reviewers’ comments and our detailed responses, enabling them to form their own informed opinions regarding our conclusions.

      Nevertheless, we have moderated some of our wording.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important, and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues that require substantial improvement. In several instances, the authors conclude that there are no sex-associated differences for specific parameters, yet inspection of the data suggests visible trends that are not properly quantified. The authors should either apply more appropriate statistical approaches to test these trends or provide stronger evidence that the observed differences are not significant. In other analyses, the authors report differences between sexes based on a pooled analysis of TCR sequences from all the donors, which could yield differences driven by one or two individual donors (e.g., carrying particular HLA variants) rather than reflect sex-related differences.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; we therefore withdraw any statement to the contrary in the original submission.

      Weaknesses:

      Major:

      The authors state that there is "no clear separation in PCA for both TRA and TRB across all subsets." However, Figure 2 shows a visible separation for DP thymocytes (especially TRA, and to a lesser degree TRB) and also for TRA of Tregs. This apparent structure should be acknowledged and discussed rather than dismissed.

      We thank the reviewer for this careful observation. Discussing apparent “trends” rather than statistically significant results is indeed a nuanced issue, as over-interpretation of visual patterns is usually discouraged. We agree that, within the specific context of TCR repertoire analyses, visual structures in multivariate projections such as PCA can provide useful contextual information.

      However, we have not identified a striking trend in our representation. We therefore chose to avoid overemphasizing these visual impressions in the text.

      Supplementary Figures 2-5 involve many comparisons, yet no correction for multiple testing appears to be applied. After appropriate correction, all the reported differences would likely lose significance. These analyses must be re-evaluated with proper multiple-testing correction, and apparent differences should be tested for reproducibility in an external dataset (for example, the pediatric thymus and peripheral blood repertoires later used for motif validation).

      As is standard in exploratory immunogenomic studies, including TCR repertoire analyses, our objective was to uncover broad biological patterns rather than to establish definitive statistical associations. In analyses that are discovery-oriented, correction for multiple testing, while essential in confirmatory contexts, is not mandatory and may even obscure meaningful trends by inflating type II error rates. Our objective was therefore to highlight consistent directional patterns across analytical layers, to guide future confirmatory work rather than to make categorical claims.
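      For readers who wish to apply the correction requested by the reviewer, one widely used option is the Benjamini–Hochberg false discovery rate procedure. The reviewer did not name a specific method, so this choice is an assumption; the function name is illustrative and the p-values below are hypothetical placeholders, not values from this study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical uncorrected p-values
    q = benjamini_hochberg(raw)
    print([round(v, 4) for v in q])  # expected: [0.04, 0.0533, 0.0533, 0.2]
    print([v < 0.05 for v in q])     # which comparisons survive at FDR 5%
```

      The step-up structure is what makes the procedure less conservative than a Bonferroni correction, which would instead multiply every p-value by the number of tests.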

      We also note that this comment somewhat contrasts with the earlier suggestion to discuss trends that are not statistically significant.

      With regard to the proposal to verify our observations using an external dataset, we are in full agreement that independent confirmation would be beneficial. However, as reviewer 1 rightly emphasized, the generation of such datasets from sorted human thymocyte subsets is “heroic” and has rarely, if ever, been achieved. We are aware of no existing dataset that provides comparable material or analytical depth.

      The available single-cell thymic dataset (Park et al., Science 2020) includes only a few hundred sequences per donor, far fewer than in our study. This limited dataset is not adequate for cross-validation or for representing the full complexity of thymic TCR repertoires.

      The pediatric thymus dataset includes only three female subjects, so it lacks the statistical power needed to evaluate sex-related differences in V/J usage.

      Finally, the peripheral blood dataset is not appropriate for validating thymic generation or selection processes, as it reflects post-thymic selection and antigen-driven remodeling, making it impossible to distinguish peripheral effects from thymic influences.

      For these reasons, none of the currently available datasets provides a sufficiently clean or powerful framework to test the reproducibility of subtle sex-associated effects on thymic TCR repertoires. Nevertheless, we fully agree that confirmation in an independent and larger cohort will be an important next step to refine these exploratory findings and assess their generalizability to a broader human population.

      Supplementary Figure 6 suggests that women consistently show higher Rényi entropies across all subsets. Although individual p-values are borderline, the consistent direction of change is notable. The authors should apply an integrated statistical test across subsets (for example, a mixed-effects model) to determine whether there is an overall significant trend toward higher diversity in females.

      We agree that Rényi entropies tend to show a consistent direction of change across subsets, with slightly higher values observed in females. In this section, our objective was to provide a descriptive overview of diversity patterns for each thymic subset. This is because these subsets are biologically distinct and therefore require individual analysis, as we previously demonstrated using the same dataset (Isacchini et al., PRX Life 2024). Therefore, while a mixed-effects approach could in principle be applied to test for an overall trend, such an analysis would rely on the assumption of a common sex effect across heterogeneous cell types.

      It is important to note that the complete dataset has now been made publicly available, enabling interested researchers to perform additional integrative or model-based analyses to further explore these diversity trends.
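      For readers less familiar with this diversity measure, the Rényi entropy of order α for clone frequencies p_i is H_α = log(Σ p_i^α) / (1 − α). The Shannon entropy is the α → 1 limit, and −log(max p_i) is the α → ∞ limit. The sketch below shows how such a profile can be computed for one repertoire. It assumes clone sizes are counted per unique CDR3 and uses natural logarithms; the orders and log base used in the manuscript are not restated here.

```python
import math
from collections import Counter


def renyi_entropy(counts, alpha):
    """Renyi entropy (natural log) of a clone-size distribution.

    counts: clone sizes (e.g. number of cells per unique CDR3).
    alpha == 1 is treated as the Shannon limit, alpha == inf as min-entropy.
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if alpha == 1:
        return -sum(pi * math.log(pi) for pi in p)
    if math.isinf(alpha):
        return -math.log(max(p))
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)


# toy repertoire: clone sizes derived from a list of CDR3 strings
cdr3s = ["CASSLGQAYEQYF"] * 4 + ["CASSPGTGELFF"] * 2 + ["CASRDRGNQPQHF"] * 2
sizes = list(Counter(cdr3s).values())
for a in (0, 1, 2, math.inf):
    print(a, round(renyi_entropy(sizes, a), 3))
```

Because H_α is non-increasing in α, low orders emphasize richness and high orders emphasize dominance by the largest clones. Reading the whole profile is what makes a consistent shift across orders informative.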

      Figures 4B and S8 clearly indicate enrichment of hydrophobic residues in female CDR3s for both TRA and TRB (excluding alanine, which is not strongly hydrophobic). Because CDR3 hydrophobicity has been linked to increased cross-reactivity and self-reactivity (see, e.g., Stadinski et al., Nat Immunol 2016), this observation is biologically meaningful and consistent with higher autoimmune susceptibility in females.

      We thank the reviewer for this insightful comment.

      As correctly noted, increased hydrophobicity at specific CDR3β positions has been linked to enhanced cross-reactivity and self-reactivity, as described by Stadinski et al. (Nat Immunol 2016), and we reference this work in the manuscript.

      In our analysis corresponding to Figure 4B (TRB), hydrophobicity was quantified at the sequence level by computing, for each unique CDR3β sequence, the overall proportion of hydrophobic amino acids across the CDR3 loop. This approach aligns with that of Lagattuta et al. (Nat Immunol 2022), whose code we adapted to accommodate longer CDR3s. This global hydrophobicity metric captures overall composition, but, by its construction, does not account for positional context, the key mechanism implicated by Stadinski et al.
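      A minimal sketch of this sequence-level metric is given below. It is not the adapted Lagattuta et al. code. It assumes the IMGT hydrophobic hydropathy class with alanine removed (C, I, L, M, F, W, V), scores each unique sequence as given without trimming anchor residues, and averages the scores per donor. The helper names are hypothetical.

```python
from statistics import mean

# Assumption: IMGT hydrophobic hydropathy class with alanine removed.
HYDROPHOBIC = frozenset("CILMFWV")


def hydrophobic_fraction(cdr3, hydrophobic=HYDROPHOBIC):
    """Proportion of residues in one CDR3 amino acid sequence that are hydrophobic."""
    return sum(aa in hydrophobic for aa in cdr3) / len(cdr3)


def donor_mean_hydrophobicity(cdr3s, hydrophobic=HYDROPHOBIC):
    """Mean hydrophobic proportion over a donor's unique CDR3 sequences."""
    return mean(hydrophobic_fraction(s, hydrophobic) for s in set(cdr3s))


# hypothetical donor repertoire (duplicates are collapsed to unique sequences)
print(donor_mean_hydrophobicity(["CASSLF", "CASSLF", "CWF"]))  # → 0.75
```

Because each sequence contributes a single global proportion, two sequences with the same residues in different orders receive the same score. This is the loss of positional context noted above.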

      The results shown in our original Figure 4C were obtained through a position-based amino acid analysis. For each CDR3β sequence, we extracted the amino acid at every IMGT-defined CDR3 position (p104–p118) and quantified, at each position, the percentage of unique sequences containing each amino acid. Positions p109 and p110 correspond to the p6–p7 sites highlighted by Stadinski et al. as functionally relevant for self-reactivity. This analysis evaluates positional composition independently of clonotype frequency, focusing specifically on hydrophobic amino acid classes.
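      The positional analysis can be sketched as follows. The sketch assumes junctions of at most 15 residues (p104–p118, no insertion positions). It also assumes the IMGT rule that shorter loops lose positions from the top of the loop in the order 111, 112, 110, 113, 109, 114, and so on. The gap order, and the choice to compute the share only over sequences that have the queried position, are our assumptions for illustration. Both should be checked against the IMGT documentation and the Methods.

```python
# Assumption: IMGT gap order for junctions shorter than 15 residues
# (positions removed from the top of the CDR3 loop first).
GAP_ORDER = [111, 112, 110, 113, 109, 114, 108, 115, 107, 116, 106, 117, 105]
HYDROPHOBIC = frozenset("CILMFWV")  # IMGT hydrophobic class without alanine (assumption)


def imgt_positions(junction):
    """Map a junction (C104 ... F/W118, 2-15 residues) to IMGT positions."""
    n = len(junction)
    if not 2 <= n <= 15:
        raise ValueError("sketch only handles junctions of 2-15 residues")
    removed = set(GAP_ORDER[: 15 - n])
    positions = [p for p in range(104, 119) if p not in removed]
    return dict(zip(positions, junction))


def positional_hydrophobic_share(cdr3s, position):
    """Share of a donor's unique junctions carrying a hydrophobic residue at
    `position`, among those junctions that have that position."""
    residues = [m[position] for m in map(imgt_positions, set(cdr3s)) if position in m]
    if not residues:
        return float("nan")
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)


print(imgt_positions("CASSLGQAYEQYF")[109])
print(positional_hydrophobic_share(["CASSLGQAYEQYF", "CASSLIQAYEQYF"], 109))
```

Computing one value per donor, rather than pooling sequences, keeps the comparison between females and males at the level of individuals.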

      Following the reviewer's recommendation, alanine (which is only weakly hydrophobic) has been excluded from the hydrophobic residue set in the revised manuscript. With this refined definition, we observe a significant enrichment of hydrophobic amino acids at p109 in CD8 T cell repertoires from females, with similar but non-significant trends at p109 in DP and CD4 Teff cells and at p110 in CD8 cells (see new Figure 4C).

      As outlined in the revised Methods, Results, and Discussion sections, Figure 4C focuses exclusively on positional hydrophobic amino acid usage. This was previously implicit, although it was noted in the legend and visually represented in the plots.

      The majority of "hundreds of sex-specific motifs" are probably donor-specific motifs confounded by HLA restriction. This interpretation is supported by the failure to validate motifs in external datasets (pediatric thymus, peripheral blood). The authors should restrict analysis to public motifs (shared across multiple donors) and report the number of donors contributing to each motif.

      We fully agree that donor-specific and HLA-restricted motifs represent a major potential confounder in repertoire-level comparisons. To minimize this potential bias, our analysis was explicitly restricted to public motifs, as clearly stated in the Materials and Methods section:

      “Additional filters were applied so that: (i) a motif includes public CDR3aa sequences (shared by at least two individuals); (ii) a significant enrichment is detected (Fisher’s exact test, p < 0.01); and (iii) a usage difference between groups of at least twofold (Wilcoxon test, p < 0.05).”

      Accordingly, every motif reported in the manuscript is supported by at least two independent donors, ensuring that no motif reflects an individual- or HLA-specific effect (see Supplementary Figures 10-13 [previously Supplementary Figure 9]). We have now added a more explicit mention of the number of donors contributing to each motif in the figure legend and have clarified this point in the revised Methods and Results sections to make this criterion more visible to readers.
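      The two quoted filters that concern donor sharing and enrichment can be sketched as below. The code also returns the number of donors carrying each public sequence. This is an illustration under stated assumptions, not our pipeline. Motif construction and the twofold/Wilcoxon usage filter (iii) are omitted. The 2×2 table layout (group of interest vs comparison group, motif vs non-motif sequences) is also assumed.

```python
from collections import Counter
from math import comb


def public_sequences(repertoires, min_donors=2):
    """Return {cdr3: number_of_donors} for CDR3s seen in >= min_donors donors.

    repertoires: donor_id -> iterable of CDR3 amino acid sequences.
    """
    donor_count = Counter()
    for seqs in repertoires.values():
        donor_count.update(set(seqs))  # each donor counts once per sequence
    return {s: k for s, k in donor_count.items() if k >= min_donors}


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment of `a` in [[a, b], [c, d]].

    Rows: group of interest vs comparison group; columns: sequences matching
    the motif vs not matching. Computed as a hypergeometric upper tail.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1
    tail = sum(comb(col1, x) * comb(col2, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


reps = {"d1": ["CASSA", "CASSB"], "d2": ["CASSA"], "d3": ["CASSC", "CASSA", "CASSC"]}
print(public_sequences(reps))
print(fisher_greater(3, 0, 0, 3))
```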

      When comparing TCRs to VDJdb or other databases, it is critical to consider HLA restriction. Only database matches corresponding to epitopes that can be presented by the donor's HLA should be counted. The authors must either perform HLA typing or explicitly discuss this limitation and how it affects their conclusions.

      We respectfully disagree with the assertion that HLA typing is necessary for the type of comparative analysis we have conducted. While it is true that HLA molecules present peptides to TCRs and thereby contribute to the tripartite interaction determining T cell activation, extensive evidence indicates that the CDR3 region, particularly CDR3β, is the dominant determinant of antigen specificity. This finding is supported by structural and computational studies (Madi et al., eLife, 2017; Huang et al., Nat. Biotech., 2020; Mayer-Blackwell et al., Methods Mol. Biol., 2022) showing that CDR3β residues are responsible for the majority of peptide contacts, whereas CDR1 and CDR2 primarily interact with the MHC framework.

      As emphasized in several recent benchmarking studies (e.g., Springer et al., Front Immunol, 2021), CDR3β sequence composition alone captures most of the information required for specificity inference. Consequently, widely used and validated computational tools such as GIANA (Zhang et al. Nat. Commun. 2021), iSMART (Zhang et al. Clin. Cancer Res. 2020), and ATM-TCR (Cai et al. Front. Immunol. 2022) rely exclusively on CDR3β amino acid sequences and still achieve high predictive performance.

      Our analysis aligns with this well-established paradigm. While we agree that integrating donor HLA typing would refine epitope-level annotation and reduce potential noise, the absence of HLA data does not invalidate the comparative framework we used, which focuses on relative representation of annotated specificities across groups rather than on individual TCR–HLA–peptide triads.

      Although the age distributions of male and female donors are similar, the key question is whether HLA alleles are similarly distributed. If women in the cohort happen to carry autoimmune-associated alleles more often, this alone could explain observed repertoire differences. HLA typing and HLA comparison between sexes are therefore essential.

      To address the issue of any potential differences in HLA background, we examined the subset of adult donors for whom HLA typing information was available (HLA-A, HLA-B, HLA-DR, and HLA-DQB; n = 16). Within this subset, the distribution of HLA alleles was relatively balanced between males and females (as illustrated by the heatmap showing HLA class II expression patterns and HLA class I family grouping in Author response image 1). This analysis suggests that the sex-associated differences in the repertoire observed in our study are unlikely to be driven solely by unequal representation of autoimmune-associated HLA alleles.

      We acknowledge, however, that complete HLA information was not available for all donors, which remains a limitation of the dataset.

      Author response image 1.

      In some analyses (e.g., Figures 8C-D) data are shown per donor, while others (e.g., Fig. 8A-B) pool all sequences. This inconsistency is concerning. The apparent enrichment of autoimmune or bacterial specificities in females could be driven by one or two donors with particular HLAs. All analyses should display donor-level values, not pooled data.

      While Figures 8A–B present pooled data to summarize global trends, the corresponding donor-level analyses were provided in Supplementary Figures 15B and 16 (previously Supplementary Figures 11B and 12), in which each point represents an individual donor. Importantly, these donor-resolved plots do not reveal any sample-specific driver: the patterns observed in the pooled data remain consistent across donors, and no single individual accounts for the apparent enrichments. In the revised manuscript, readers are now directed to these supplementary figures.

      The reported enrichment of matches to certain specificities relative to the database composition is conceptually problematic. Because the reference database has an arbitrary distribution of epitopes, enrichment relative to it lacks biological meaning. HLA distribution in the studied patients and HLA restrictions of antigens in the database could be completely different, which could alone explain enrichment and depletions for particular specificities. Moreover, differences in Pgen distributions across epitopes can produce apparent enrichment artifacts. Exact matches typically correspond to high-Pgen "public" sequences; thus, the enrichment analysis may simply reflect variation in Pgen of specific TCRs (i.e., fraction of high-Pgen TCRs) across epitopes rather than true selection. Consequently, statements such as "We observed a significant enrichment of unique TRB CDR3aa sequences specific to self-antigens" should be removed.

      We respectfully disagree with the conclusion that our enrichment analysis lacks biological meaning. Our approach directly compares the same set of observed TCR sequences between males and females. Any bias related to generation probability (Pgen) therefore affects both groups equally, so variation in Pgen across epitopes cannot explain the sex-specific differences we observe.

      We do agree, however, that the composition of the reference databases may influence apparent enrichment patterns, as these resources contain uneven distributions of epitope categories and often incomplete information regarding HLA restriction. This limitation is inherent to all database-based annotation approaches, and it is now explicitly acknowledged in the revised Discussion.

      The overrepresentation of self-specific TCRs in females is the manuscript's most interesting finding, yet it is not described in detail. The authors should list the corresponding self-antigens, indicate which autoimmune diseases they relate to, and show per-donor distributions of these matches.

      We thank the reviewer for this constructive suggestion.

      As recommended, we have expanded the description of the self-specific TCRs identified in our dataset and now provide this information in Supplementary Table 2 of the revised manuscript. Specifically, the table lists the corresponding self-antigens and the autoimmune diseases with which they are associated. In our curated database, these annotations primarily correspond to celiac disease and type 1 diabetes, which were the two autoimmune contexts explicitly defined in the manually curated reference datasets.

      For the “cancer” specificity group, we have clarified that antigen assignments were established based on (i) annotations available in the original databases (IEDB, VDJdb, McPAS-TCR) and (ii) cross-referencing with additional resources, including the Human Protein Atlas, the Cancer Antigenic Peptide Database (de Duve Institute), and the Cancer Antigen Atlas (Yi et al., iScience 2021), to ensure consistency in the classification of cancer and neoantigen specificities. Please refer to the Materials and Methods section for a full description of the procedure for this specific assignment.

      Donor-level distributions of these self-specific matches are now shown in Supplementary Figures 15B and 16 (previously Supplemental Figures 11B and 12), allowing direct visualization of inter-donor variability. Importantly, these plots confirm that the observed enrichment in females is not driven by a single individual, further supporting the robustness of the finding.

      The concept of poly-specificity is controversial. The authors should clearly explain how polyspecific TCRs were defined in this study and highlight that the experimental evidence supporting true polyspecificity is very limited (e.g., just a single TCR from Figure 5 from Quiniou et al.).

      We certainly agree (and regret) that the concept of TCR polyspecificity remains a subject of debate and is often underappreciated in the field of immunology. As Don Mason famously discussed in his seminal essay “A very high cross-reactivity is an essential feature of the TCR” (doi: 10.1016/S0167-5699(98)01299-7) published over 25 years ago, both theoretical and experimental evidence indicates that each TCR can, in principle, recognize millions of distinct peptides, albeit with variable avidity.

      Although this principle is widely accepted, it is frequently overlooked in the field of experimental immunology. In this area, anything that deviates from strict monospecificity is often disregarded as noise.

      In our own analyses of large-scale TCR repertoires, we have repeatedly observed that many CDR3 sequences are annotated with multiple specificities across different databases, often corresponding to peptides from unrelated organisms. As demonstrated in Quiniou et al. (eLife 2023), such polyreactive TCRs exhibit distinctive features, including biased physicochemical composition, and tend to be enriched in various biological contexts. In our preliminary study of such TCRs, which have the capacity to be specific for multiple viral and self epitopes, we hypothesized that they may serve as a first line of defense against pathogens and also be involved in triggering autoimmunity. We therefore consider it important to report this phenomenon rather than omit it, especially given its potential relevance to both protective immunity and autoimmunity.

      In the present study, polyspecific TCRs were defined operationally as TRB CDR3aa sequences associated with a minimum of two distinct specificity groups, corresponding either to different microbial species or to multiple antigen categories within the curated database. Therefore, our definition captures broader antigenic groupings rather than epitope-level binding events.
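      Operationally, this definition amounts to grouping annotations by CDR3 and counting distinct specificity groups. The following is a minimal sketch; the function name and the toy annotations are hypothetical.

```python
from collections import defaultdict


def polyspecific_cdr3s(annotations, min_groups=2):
    """Map each TRB CDR3aa linked to >= min_groups distinct specificity groups
    to the set of those groups.

    annotations: iterable of (trb_cdr3aa, specificity_group) pairs, e.g. one
    pair per database entry after reliability filtering.
    """
    groups = defaultdict(set)
    for cdr3, group in annotations:
        groups[cdr3].add(group)
    return {cdr3: g for cdr3, g in groups.items() if len(g) >= min_groups}


# hypothetical annotations (toy sequences, not database entries)
toy = [("CASSLAF", "CMV"), ("CASSLAF", "EBV"),
       ("CASSGGF", "Influenza A"), ("CASSGGF", "Influenza A")]
print(polyspecific_cdr3s(toy))
```

Repeated entries for the same group are collapsed by the set, so a sequence reported many times against one organism is not counted as polyspecific.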

      We fully acknowledge that direct experimental evidence for true molecular-level polyspecificity remains limited. Indeed, as the reviewer notes, only a single TCR with multi-epitope reactivity has been rigorously demonstrated to date (Quiniou et al., 2023). Consequently, our analysis does not make claims about structural promiscuity; instead, it uses database-annotated cross-reactivity as a proxy to explore broader repertoire-level patterns.

      This definition has been clarified in the Methods section, and the Discussion has been expanded to explicitly address these conceptual and methodological nuances.

      Minor:

      Clarify why the Pgen model was used only for DP and CD8 subsets and not for others.

      As noted, computing Pgen values involves two steps: (i) training a generative model of V(D)J recombination using IGoR, and (ii) estimating generation probabilities with OLGA based on that model. Both steps require a significant amount of computing power, especially when applied to large repertoires across multiple subsets. For this reason, we focused the analysis on DP thymocytes, which represent the repertoire prior to thymic selection, and CD8 T cells after CD8 selection.

      The Methods section should define what a "high sequence reliability score" is and describe precisely how the "harmonized" database was constructed.

      Briefly, the annotated database used in this study was constructed in accordance with the procedure established in our previously published work (Jouannet et al., NAR Genomics and Bioinformatics, 2025). It integrates three publicly available resources (IEDB, VDJdb, and McPAS-TCR), collected as of October 2023, which were merged into a single harmonized compendium and extensively standardized. When entries shared identical information across databases (same V–CDR3–J for both TRA and TRB, same epitope, organism, PubMed ID, and cell subset), only one representative was kept; discrepant or incomplete entries were retained to preserve information. We then assigned a sequence reliability score, the Verified Score (VS), following the verification strategy used by IEDB. The scale ranges from 0 to 2 and reflects the concordance between calculated and curated TRA/TRB CDR3 sequences (2 = both TRA and TRB verified, 1.1 = only TRA verified, 1.2 = only TRB verified, 0 = no verified chain). A second score, the Antigen Identification Score (AIS), ranks antigen-identification methods on a scale of 0 to 5 according to the strength of the experimental evidence they provide.

      In the present study, “high reliability” refers to sequences with a verified TRB CDR3aa chain (VS ≥ 1.2) and an AIS score corresponding to in vitro T cell stimulation with a pathogen, protein or peptide, or to pMHC X-mer sorting (> 3.2, excluding categories 4.1 and 4.2), ensuring that downstream analyses were performed on a rigorously curated and biologically trustworthy dataset. The Methods section now explicitly details these criteria.
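      The deduplication and reliability filtering described above can be sketched as follows. The field names are hypothetical. Incomplete entries are interpreted here as records with any missing key field. The AIS values used in the usage check are placeholders rather than the actual category codes.

```python
# Hypothetical field names for the deduplication key.
KEY_FIELDS = ("tra_v", "tra_cdr3", "tra_j", "trb_v", "trb_cdr3", "trb_j",
              "epitope", "organism", "pubmed_id", "cell_subset")


def harmonize(records):
    """Keep one copy of each fully identical entry; keep incomplete entries as-is."""
    seen, kept = set(), []
    for rec in records:
        key = tuple(rec.get(f) for f in KEY_FIELDS)
        if None in key:            # incomplete entry: retained without merging
            kept.append(rec)
        elif key not in seen:      # first complete copy is the representative
            seen.add(key)
            kept.append(rec)
    return kept


def high_reliability(records):
    """Verified TRB chain (VS 1.2 or 2) and AIS > 3.2, excluding 4.1 and 4.2."""
    return [r for r in records
            if r["vs"] >= 1.2 and r["ais"] > 3.2 and r["ais"] not in (4.1, 4.2)]


base = dict(tra_v="TRAV1", tra_cdr3="CAVF", tra_j="TRAJ1", trb_v="TRBV1",
            trb_cdr3="CASSF", trb_j="TRBJ1", epitope="X", organism="Y",
            pubmed_id="1", cell_subset="CD8", vs=2, ais=3.3)
print(len(harmonize([dict(base), dict(base), dict(base, tra_cdr3=None)])))
```

Note that VS codes are categorical labels; the numeric `>= 1.2` comparison works only because 1.2 and 2 are the two codes that include a verified TRB chain.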

      The statement "we generated 20,000 permuted mixed-sex groups" is unclear. It is not evident how this permutation corrects for individual variation or sex bias. A more appropriate approach would be to train the Pgen model separately for each individual's nonproductive sequences (if the number of sequences is large enough).

      The objective of this analysis was to determine whether the enrichment of TRBV06-5 in females was due to random grouping of individuals or whether it was attributable to sex itself. To do so, we generated all possible perfectly mixed groups of donors (i.e., groups containing an equal number of male and female donors) for the concerned thymocyte subset, and then performed 20,000 random pairwise comparisons between such mixed groups. For each comparison, we tested the TRBV06-5 usage between the two mixed groups. This procedure directly evaluates whether group composition (independent of sex) could spuriously generate differences in TRBV usage. Notably, none of these 20,000 comparisons between the two mixed groups yielded a statistically significant difference in TRBV06-5 usage. In contrast, when comparing the true male and female groups, a significant difference was identified. This demonstrates that the signal we observe is not driven by random donor grouping or individual-level variation, but is specifically associated with sex. It is important to note that this analysis, which is designed to exclude spurious group effects, is rarely performed in published repertoire studies, yet it provides an important internal control for robustness.
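      The logic of this control can be sketched as below. The sketch assumes an even number of donors of each sex and assigns half of each sex to each of two complementary groups. It uses the absolute difference in mean usage as a stand-in for the per-comparison statistical test. All donor values are hypothetical. The sketch reports the share of mixed-group comparisons at least as extreme as the observed female-versus-male difference.

```python
import itertools
import random
from statistics import mean


def balanced_splits(males, females):
    """All splits into two groups that each receive half of the male and half
    of the female donors (assumes an even count of each sex)."""
    for m_a in itertools.combinations(males, len(males) // 2):
        m_b = [d for d in males if d not in m_a]
        for f_a in itertools.combinations(females, len(females) // 2):
            f_b = [d for d in females if d not in f_a]
            yield list(m_a) + list(f_a), m_b + f_b


def mixed_group_null(usage, sex, n_iter=20_000, seed=0):
    """Compare the female-vs-male usage difference with differences between
    randomly drawn balanced mixed-sex groups.

    usage: donor_id -> usage of one V gene; sex: donor_id -> "F" or "M".
    Returns (observed difference, share of mixed comparisons >= observed).
    """
    males = [d for d in usage if sex[d] == "M"]
    females = [d for d in usage if sex[d] == "F"]
    splits = list(balanced_splits(males, females))
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        group_a, group_b = rng.choice(splits)
        null.append(abs(mean(usage[d] for d in group_a)
                        - mean(usage[d] for d in group_b)))
    observed = abs(mean(usage[d] for d in females) - mean(usage[d] for d in males))
    return observed, sum(x >= observed for x in null) / n_iter


# hypothetical donors: females use the gene more than males
usage = {"F1": 0.05, "F2": 0.05, "F3": 0.05, "F4": 0.05,
         "M1": 0.02, "M2": 0.02, "M3": 0.02, "M4": 0.02}
sex = {d: d[0] for d in usage}
print(mixed_group_null(usage, sex, n_iter=2_000))
```

Enumerating all splits is cheap for cohorts of this size (ten donors of each sex give 63,504 splits). For much larger cohorts, random splits would be drawn directly instead.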

      Reviewer #2 (Recommendations for the authors):

      (1) Data availability "upon request" is unacceptable. All raw and processed data, as well as scripts used for analysis and figure generation, must be publicly deposited before publication.

      We would like to clarify that our intention has always been to make this dataset publicly available. It was a mistake to suggest otherwise in the original submission.

      (2) At the beginning of the Results section, include a brief description of the dataset: number of donors, sex ratio, age range, number of samples per subset, and sorting strategy. Although Figure 1 shows this, the information should also be mentioned in the main text.

      In line with the recommendation, we have now added a summary of the cohort characteristics at the beginning of the Results section. This includes the number of donors, sex ratio, age range, number of samples per subset, and the sorting strategy used. While this information was already included in Figure 1, we concur that including it directly in the main text enhances readability.

      (3) Report the number of cells and unique clonotypes analyzed per individual. Rank-frequency plots (in log-log coordinates) would be helpful.

      We have now added, for each donor and each subset, the number of cells, and additionally for each chain, the number of total and unique clonotypes analyzed. This information is provided in the revised manuscript in a new supplementary table (Supplemental Table 1).

      Rank-frequency plots in log-log coordinates have also been integrated into the revised manuscript as Supplementary Figure 2.
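      For completeness, the points of such a rank-frequency plot can be computed as below. This minimal sketch assumes that a clonotype is a unique CDR3 amino acid string (the study may also use V and J genes). It returns log10-transformed coordinates that can be passed to any plotting library.

```python
import math
from collections import Counter


def rank_frequency(cdr3s):
    """(log10 rank, log10 frequency) points for a clonotype rank-frequency plot."""
    sizes = sorted(Counter(cdr3s).values(), reverse=True)
    total = sum(sizes)
    return [(math.log10(rank), math.log10(size / total))
            for rank, size in enumerate(sizes, start=1)]


print(rank_frequency(["CASSA", "CASSA", "CASSB", "CASSC"]))
```

A roughly straight line in these coordinates indicates a power-law-like clone-size distribution, and the largest clones appear at the left of the plot.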

      (4) For analysis in Figure 4B, the total fraction of hydrophobic amino acids should be calculated for each patient separately, and values for men and women should be compared (analogously to Figure 4C, but for the whole CDR3 and excluding alanine).

      Please note that the TRB CDR3aa composition in Figure 4B has already been quantified at the individual level. For each unique TRB CDR3aa sequence, we computed the proportion of each of the 20 amino acids across the CDR3β loop, then summarized these values per donor (mean per individual). The log2 fold change displayed in Figure 4B (and supplemental Figure 9 for TRA) is calculated from the median donor-level values for females versus males, rather than from pooled CDR3s. It is intended as a descriptive, “global” view of amino acid usage within the central CDR3 region. Hydrophobicity was not used directly in the computation; it is indicated only by bar color, based on the Kyte–Doolittle-derived IMGT classification.
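      The donor-level computation behind these log2 fold changes can be sketched as follows. For simplicity, the sketch uses the full CDR3 string rather than the central region and reports undefined ratios as NaN. Both choices, and the function names, are assumptions for illustration.

```python
import math
from statistics import mean, median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def donor_composition(cdr3s):
    """Per-donor usage: mean over unique CDR3s of each residue's proportion."""
    unique = set(cdr3s)
    return {aa: mean(s.count(aa) / len(s) for s in unique) for aa in AMINO_ACIDS}


def log2_fold_change(female_repertoires, male_repertoires):
    """log2(median female / median male) of donor-level proportions per residue.

    Each argument is a list of repertoires (one list of CDR3 strings per donor).
    Ratios involving a zero median are returned as NaN.
    """
    fem = [donor_composition(r) for r in female_repertoires]
    mal = [donor_composition(r) for r in male_repertoires]
    result = {}
    for aa in AMINO_ACIDS:
        f = median(c[aa] for c in fem)
        m = median(c[aa] for c in mal)
        result[aa] = math.log2(f / m) if f > 0 and m > 0 else float("nan")
    return result


fc = log2_fold_change([["CWF"], ["CWF"]], [["CAF"], ["CAF"]])
print(fc["C"], fc["W"])
```

Taking the median across donors, rather than pooling sequences, prevents a single large repertoire from dominating the comparison.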

      As the mechanistic link between hydrophobicity and self-reactivity described by Stadinski et al. is explicitly position-dependent, we consider positional analyses to be the most appropriate method for formally interrogating this hypothesis, as we did in Figure 4C. Here, our primary focus was on the position-specific usage of hydrophobic amino acids at IMGT positions p109-p110. These positions correspond to the central p6-p7 positions described by Stadinski et al. For each individual, we computed the proportion of unique TRB CDR3aa sequences carrying a hydrophobic amino acid at a given position.

      Accordingly, in the revised manuscript we refined Figure 4C by excluding alanine because of its weak hydrophobicity, as recommended by the reviewer. This positional composition analysis now reveals a statistically significant increase in hydrophobic usage at p109 in female CD8 repertoires, with similar, though non-significant, trends at p109 in DP and CD4 Teff cells and at p110 in CD8 cells. Figure 4B is therefore retained as an exploratory overview of amino acid usage along the CDR3 loop, while Figure 4C addresses the more specific question of hydrophobicity and potential cross-reactivity.

      The Methods section has been expanded to provide clearer descriptions of these computations, and the Results and Discussion sections corresponding to Figures 4B-C (and supplemental Figure 9) have been revised to make the rationale, implementation, and interpretation of these hydrophobicity analyses more explicit.

      (5) Figure 6 shows a trend toward higher clustering of Treg TCRs in males, which could relate to the lower incidence of autoimmunity in men. The authors could test whether specific Treg clusters are male-specific and shared among male donors.

      As shown in Figure 6, a clear trend towards higher similarity among Treg CDR3aa sequences in males is evident, as indicated by the proportion of sequences included in clusters and in the overall similarity density. However, identifying “male-specific clusters” shared across donors is not straightforward in our analytical framework.

      In our approach, for each cell subset, CDR3aa sequences were downsampled 100 times to the smallest sample size, and clustering was repeated at each iteration. Cluster identities are therefore not consistent across iterations: they depend on the specific subset of sequences selected at each downsampling step, as well as on their underlying Pgen distribution. Because cluster composition reflects stochastic resampling rather than biological structure, it is not possible to reliably assess whether specific clusters are systematically “male-shared”, and a comparison of cluster identities across donors would not produce interpretable results.
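      The resampling scheme can be sketched as below. The clustering criterion used here is a hypothetical stand-in for the actual clustering method: a sequence counts as clustered if another sequence in the same downsample differs from it by a single substitution. The point of the sketch is that clusters are rebuilt from a different random subset at every iteration. Only summary statistics, such as the clustered fraction, are therefore comparable across iterations.

```python
import random
from statistics import mean


def hamming1(a, b):
    """True if two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def clustered_fraction(seqs):
    """Fraction of sequences with at least one neighbour at Hamming distance 1."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(any(hamming1(s, t) for j, t in enumerate(seqs) if j != i)
               for i, s in enumerate(seqs))
    return hits / len(seqs)


def downsampled_clustering(groups, n_iter=100, seed=0):
    """Mean clustered fraction per group after repeatedly downsampling each
    group's unique CDR3s to the size of the smallest group."""
    rng = random.Random(seed)
    pools = {g: sorted(set(s)) for g, s in groups.items()}
    size = min(len(p) for p in pools.values())
    return {g: mean(clustered_fraction(rng.sample(p, size)) for _ in range(n_iter))
            for g, p in pools.items()}


groups = {"male": ["CASSA", "CASSB", "CASSC", "CASSD"],
          "female": ["CASSA", "CWWWW", "CYYYY", "CKKKK"]}
print(downsampled_clustering(groups))
```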

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive feedback, which has helped us prepare a substantially improved manuscript. In response to concerns about the conceptual distinction between prediction and stimulus dependency, we have fundamentally restructured the paper around the notion of passive control systems. This involved rewriting the Abstract, Introduction, and large portions of the Results (~60% of text revised).

      Key changes:

      - New analyses on Goldstein et al. (2022) data. We demonstrate that our findings—including the insufficiency of proposed corrections—generalise to the original dataset (Figures S2B, S3B, S5C, S6B).

      - Clarified novel contribution. We now make explicit that prior control analyses (residualisation, bigram removal) do not address the concern, because hallmarks persist in passive systems that cannot predict.

      - Proposed criterion for future work. Pre-onset neural encoding can only count as evidence for prediction if it exceeds a passive baseline (e.g., acoustics).

      We believe the revision offers a clearer, more rigorous contribution and provides a constructive framework for evaluating claims of neural prediction.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      We thank the reviewer for this comment, which highlights that the previous version wasn’t sufficiently clear. Conceptually, the difference is critical: it is the difference between passively encoding or representing the stimulus (like e.g., a spectrogram of the stimulus would), and actively generating predictions.

      We have substantially changed the framing of the paper to put the notion of control systems centre-stage. One such control system is the speech acoustics: they encode the stimulus (and thus its dependencies) but cannot predict. When we observe the "hallmarks of prediction" in acoustics, this demonstrates the hallmarks can arise without any prediction.

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      Different weights are estimated per time point in the time-resolved regression. This allows the model to learn how the response to words unfolds, but also to learn different stimulus dependencies at each timepoint. Fitting on every second word would reduce but not eliminate the problem. Our control system approach provides a more principled test. We have clarified the mechanism in the Introduction (lines 82-90), explaining how correlations between neighbouring words allow the regression model to predict prior neural activity without assuming pre-activation.
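      A toy simulation makes this mechanism concrete. The sketch below assumes a single scalar word feature with first-order dependency between neighbouring words (coefficient rho). It also assumes a passive signal that responds only to the word currently being heard, so the signal cannot predict. The signal in the window before word t is then the response to word t−1. Correlating it with word t's feature stands in for pre-onset encoding performance at one lag. This is a simplification of the time-resolved regression, not the analysis in the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pre_onset_encoding(rho, n_words=5000, noise=0.5, seed=1):
    """Correlation between each word's feature and a passive signal measured in
    the window before that word's onset."""
    rng = random.Random(seed)
    # word features with AR(1) dependency between neighbours, unit variance
    x = [rng.gauss(0, 1)]
    for _ in range(n_words - 1):
        x.append(rho * x[-1] + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    # passive response to the current word only (feature + noise)
    post = [xi + noise * rng.gauss(0, 1) for xi in x]
    # pre-onset window of word t = post-onset window of word t-1
    pre = post[:-1]
    return pearson(x[1:], pre)


print(round(pre_onset_encoding(rho=0.6), 2))  # dependent stimulus
print(round(pre_onset_encoding(rho=0.0), 2))  # independent stimulus
```

With dependent words, the passive signal shows clear "pre-onset encoding" (about rho / sqrt(1 + noise²) in expectation). With independent words, the correlation is close to zero, even though the signal is identical in both cases.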

      Reviewer #2 (Public Review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe our paper does punch the hole that can be punched, which is a hole in the method. Our control demonstrates that adjusting the features (X matrix) cannot address dependencies that persist in the signal itself (Y matrix). Because the hallmarks emerge in a system that cannot predict (even after linearly removing the previous stimulus), attributing pre-onset encoding performance to neural prediction (rather than to stimulus structure) is fundamentally ambiguous, and different approaches (e.g. variance partitioning) would suffer from the same ambiguity. We have reframed the manuscript to make this argument more clearly.

      Reviewer #3 (Public Review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the autocorrelation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      We’d like to thank the reviewer for their comments on our preprint.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      We thank the reviewer for this suggestion. The Goldstein dataset was not publicly available when we conducted this research. However, we have now applied our control analyses to their stimulus material, and found that the exact same problem applies to their dataset, too.

      We have added analyses of the Goldstein et al. (2022) podcast stimulus throughout the paper. Results are shown in Figures S2B, S3B, S5C, and S6B. Critically, we observe the same pattern: both hallmarks emerge in the acoustic control system, and residualisation fails to eliminate them. This demonstrates that our findings generalise to the very dataset used to establish pre-onset encoding as evidence for neural prediction.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      We thank the reviewer for raising this point, as it reveals we failed to convey a central argument in the previous version. Goldstein et al.'s control analysis did not address the concern. We show that even after the control analyses that Goldstein et al. perform (removing bigrams, regressing out embedding dependencies) the "hallmarks of prediction" still emerge when applying the analysis to a passive control system that by definition does not predict: the speech acoustics. We now also show this in their data.

      To better convey this critical point, we have reorganised the manuscript around the concept of "passive control systems". We now first establish that the hallmarks appear in acoustics (Figure 3), then show that residualisation fails to remove them (Figure 4). This makes explicit that any claim about "controlling for dependencies" must be validated against a system that cannot predict.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for this question, and we agree that the question whether pre-activation occurs is an important and interesting one. However, we ask a different question in our study: Our goal is not to definitively establish whether the brain predicts during language processing; it is to scrutinise what counts as evidence for prediction, and to correct for some highly influential claims made in the literature. The reviewer asks whether pre-activation remains "after accounting for these confounds." But the point we are trying to make is that in this analytical framework, one cannot analytically account for these confounds: corrections to the X matrix leave dependencies in the data itself intact, as the acoustic control demonstrates.

      We do offer recommendations for future work. The passive control systems approach can serve as a benchmark: pre-onset neural encoding (or decoding) can only count as evidence for prediction if it exceeds what is observed in a passive control system like acoustics (which is not what we observe). Additionally, the field could move toward less naturalistic stimuli with tighter experimental controls, reducing the correlations that make this attribution so difficult. Developing a new definitive test is beyond the scope of our paper, but we believe applying this benchmark is a necessary first step.
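
      One way the benchmark criterion could be operationalised is sketched below. This is an illustrative assumption, not a procedure described in the manuscript. Given pre-onset encoding scores for the brain and for a passive control on matched cross-validation splits, a paired bootstrap asks whether the brain reliably exceeds the control. The sketch presumes the two scores are on a comparable scale. That is a strong assumption, because neural recordings are far noisier than the stimulus acoustics.

```python
import numpy as np


def exceeds_passive_baseline(brain_scores, control_scores, n_boot=10_000, seed=0):
    """Paired bootstrap over matched CV splits: does the brain beat the passive control?

    brain_scores, control_scores: pre-onset encoding scores (e.g. Pearson r), one per split.
    Returns (mean difference, 5th percentile of bootstrap means, verdict), where the
    verdict is True only if the one-sided 95% lower bound lies above zero.
    """
    diffs = np.asarray(brain_scores, float) - np.asarray(control_scores, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))  # resample splits
    boot_means = diffs[idx].mean(axis=1)
    lower = np.percentile(boot_means, 5)
    return diffs.mean(), lower, bool(lower > 0)
```

      The per-split scores would come from running the identical encoding pipeline on the brain data and on the control signal. Cross-validation splits are not fully independent, so with few splits this interval should be read as a rough guide rather than a formal test.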

      To make this clearer, we have rewritten the Discussion to explicitly state this criterion (lines 331-340) and to outline these recommendations for future work (lines 337-340). We have also added a paragraph extending our argument to decoding approaches (lines 343-354), noting that the same ambiguity applies regardless of analytical direction.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      As per my "Weakness" point, I would appreciate engagement with the conceptual point related to the difference between prediction and stimulus correlations. Most importantly, I hope the authors will spell out more explicitly which predictions their proposal makes, and how exactly those would be present in an encoding model.

      Our proposal makes a clear prediction: if pre-onset encoding can be explained by stimulus dependencies (essentially a confound in the analysis), the same hallmarks should emerge in passive control systems that encode the stimulus but do not predict. We test this with word embeddings and speech acoustics, and both show the hallmarks despite performing no prediction.

      Reviewer #2 (Recommendations for Authors):

      I greatly enjoyed reading the paper and only have minor quibbles. The work is overdue and will no doubt be a valuable addition to the literature to push back on over-hyped claims about the implications of pre-word predictivity in neural response. I have few issues with the methods that the paper uses, they seem sensible and in line with previous work that has investigated these questions, and I did not find typos.

      One point I would like to raise is whether or not there is a more effective solution to resolving the issues behind residualization that the paper demonstrates. The authors show that removing next-word information does not effectively resolve the problem that local relationships in the stimulus dataset pose. The challenge to me here seems to be that it is difficult to get a model to "not learn" a relationship that is learnable. I wonder if a better solution to this is to not try to get a model to exclude a set of information but instead to do some sort of variance partitioning where you train a model to predict the next-word representation from the current-word representation (as in the self-predictivity analysis) and then build an encoding model out of the predicted representation. Then, compare the pre-word-onset encoding performance of the prediction with the pre-word-onset encoding performance of the original representation. If the performance of the two models roughly matches, that would be strong evidence that most of what these models are capturing before word onset is just explainable by the stimulus dependencies, no?

      We would like to thank the reviewer for their kind words and positive appraisal!

      The proposed analysis asks whether a linear proxy representation, w_hat_t (predicted linearly from w_{t-1}), yields pre-onset predictivity comparable to that of the actual w_t vector; if so, this would support the view that the effect can be explained by stimulus dependencies. While this is an interesting alternative analysis, we would be cautious about the inverse conclusion: that if w_t outperforms the linear proxy w_hat_t, the residual variance must reflect true neural prediction.

      Our caution stems from the control system results. We show that even when we remove the "predictable" shared variance (which is similar to computing the difference between w_t and w_hat_t), the unique information still yields pre-onset predictivity, albeit reduced, in the passive acoustics, which by definition cannot predict. Therefore, instead of developing ever-more-clever ways to "correct" for the problem by adjusting the X matrix, we focus on showing that the problem lies in the stimulus itself. For the revision, we focused on reframing the problem, and we hope we have punched a fuller hole in the logic by breaking down the fundamental issue more clearly and showing that it applies to the stimulus material of Goldstein et al. (2022) as well.

      Additionally, I would say that I was a bit confused about what was going on in the methods figures, to the point where I do not see the value in having them, but thankfully, the text was clear enough to resolve that confusion.

      We are sorry that the methods illustrations were not helpful. In presentations, we have found the illustrations generally helpful for conveying the analysis, e.g. the idea of keeping the analysis identical but simply replacing the brain data with either word vectors (current Figure 2) or acoustics (current Figure 3). In the revision, we have reorganised the schematics slightly: we introduce the acoustics as a control system earlier, and we introduce residualisation and its insufficiency separately (Figure 4). We hope this helps.

      Reviewer #3 (Recommendations for Authors):

      (1) My major concern is the extent to which this study offers new insights beyond what was already demonstrated in Goldstein's work. First, the embedding dependency highlighted by the authors seems somewhat expected, given how these embeddings are constructed: GloVe embeddings are based on word co-occurrence statistics, and GPT embeddings are combinations of embeddings of preceding words. More importantly, Goldstein et al. addressed this issue by regressing out neighboring word embeddings. This control was effective, as also confirmed by the current manuscript, and their main results remain. Therefore, the embedding dependency appears to have been properly accounted for in the earlier study.

      Building on the previous point, I appreciate the analysis of dependencies across representational domains, which I see as the main novel contribution of this manuscript. I would encourage the authors to explore this aspect more deeply. If I understand correctly, stimulus dependencies may persist even after regressing out neighboring word embeddings due to two potential factors:

      (a) Temporal dependencies in embeddings: since the regression of neighbor words is performed at the word level rather than over time, temporal dependency may remain.

      (b) Cross-feature dependencies - specifically, correlations between embeddings and acoustic features.

      Regarding the first factor, it is not entirely clear to me whether this is a real problem—i.e., whether word-level regression fails to remove temporal dependencies. A simulation could help clarify this and support the argument. While it's not essential, it would be valuable if the authors could propose a method to address this issue, or at least outline it as a direction for future work.

      For the second point, it would be helpful for the authors to explicitly explain the potential relationship between word embeddings and acoustic features. Additionally, while correlations between features are a common problem in speech research, they are typically addressed by regressing out acoustic features early in the analysis (Gwilliams et al., 2022). It would strengthen the current findings if the authors could test whether the self-predictability persists even after controlling for neighboring embeddings and acoustic features.

      We appreciate the extensive and detailed engagement with our work, which has been very useful in highlighting key unclarities and gaps we had to address.

      We do believe our study goes well beyond what was shown by Goldstein et al., by identifying a fundamental limitation in their analysis and showing that their purported control analyses do not in fact control for the problem. We address the reviewer's sub-questions in turn.

      (i) Why this offers crucial insights beyond Goldstein et al.

      While Goldstein et al. indeed addressed embedding dependencies via residualization (or in their case projection), their conclusion relied on the assumption that any neural encoding surviving this "fix" must reflect genuine predictive pre-activation. Our study invalidates this assumption. By applying the residualization fix, we show that the "hallmarks of prediction" persist just as robustly in a passive control system that cannot predict (the speech acoustics) as in the neural data. (We also show this for bigram removal.)

      This provides a key new insight: persistent pre-onset predictivity after “correction” is not evidence that the dependency issue was solved. Instead, because the same effect persists in a system that cannot predict (acoustics), the persistence of the hallmarks cannot be attributed to prediction. It demonstrates that the standard "fix" is mathematically insufficient to remove the confound, rendering the original evidence for neural prediction fundamentally ambiguous.

      (ii) Why do dependencies/hallmarks persist after residualization?

      Residualization successfully removes the linear dependency between the current embedding (w_t) and the previous embedding (w_{t-1}) within the feature space. However, it does not (and cannot) remove the dependency from language itself, and therefore from the brain, which (in some format) encodes the linguistic stimulus. Language is massively redundant: knowing the current word tells you something about what came before, acoustically, syntactically, and semantically. As long as the embedding identifies the word, the regression model will re-learn this relationship. For instance, in the case of acoustics, even when using the corrected embedding, the regression will re-learn that certain words (e.g., "Holmes") tend to follow certain acoustic patterns (e.g., the acoustics of "Sherlock"). This shows that correcting the embeddings is insufficient: the dependencies exist in language itself, and the model will re-learn them from any signal that encodes that language.
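
      A deliberately minimal toy illustrates why a linear correction in feature space can leave the problem intact. This is an illustrative construction, not the manuscript's analysis. The quadratic relationship stands in for "the embedding identifies the word", and all parameters are assumptions. The current word's feature depends on the previous word only nonlinearly, through `u**2`. A hypothetical passive pre-onset signal (an "energy" of the previous word) shares exactly that component. After residualisation, the current feature is exactly uncorrelated with the previous word's features, yet it still predicts the pre-onset signal.

```python
import numpy as np


def ridge_heldout_r(X, y, alpha=1.0):
    """Fit ridge regression on the first half of rows; return Pearson r on the second half."""
    half = len(y) // 2
    beta = np.linalg.solve(X[:half].T @ X[:half] + alpha * np.eye(X.shape[1]),
                           X[:half].T @ y[:half])
    return np.corrcoef(X[half:] @ beta, y[half:])[0, 1]


rng = np.random.default_rng(0)
n = 20_000
u, v = rng.standard_normal(n), rng.standard_normal(n)
prev_feat = np.column_stack([u, v])                    # features of word t-1
# Current word's features depend on the previous word, but only nonlinearly (via u**2)
curr_feat = np.column_stack([u**2 - 1 + 0.5 * rng.standard_normal(n),
                             rng.standard_normal(n)])
# Passive pre-onset signal: responds to the previous word only (no prediction involved)
pre_onset = u**2 + 0.5 * rng.standard_normal(n)

# Residualisation: remove everything linearly explained by the previous word's features
X_prev = np.column_stack([np.ones(n), prev_feat])
coef, *_ = np.linalg.lstsq(X_prev, curr_feat, rcond=None)
resid_feat = curr_feat - X_prev @ coef

max_corr_with_prev = max(abs(np.corrcoef(resid_feat[:, i], prev_feat[:, j])[0, 1])
                         for i in range(2) for j in range(2))
r_original = ridge_heldout_r(curr_feat, pre_onset)
r_residual = ridge_heldout_r(resid_feat, pre_onset)
print(f"max |corr(residual, previous features)| = {max_corr_with_prev:.1e}")
print(f"pre-onset r: original = {r_original:.2f}, residualised = {r_residual:.2f}")
```

      The residualised feature passes the linear check against the previous word, but pre-onset predictivity in the passive signal is essentially unchanged. A correction applied to the X matrix cannot certify that the dependency is gone from the signal being predicted.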

      (iii) Why not regress out the acoustics?

      This is also why "regressing out acoustics" (as the reviewer suggests) would miss the point. We do not claim that acoustic features leak into the neural signal or that acoustics are a specific confound to be removed. Rather, we use acoustics as a "passive baseline": a system that encodes the stimulus but cannot predict. That the method yields "hallmarks of prediction" in this baseline demonstrates that these hallmarks are not valid evidence for prediction, regardless of what additional features one regresses out. This motivates our proposed criterion: future studies seeking evidence for neural pre-activation should not rest on finding pre-onset encoding per se, since passive systems show this too. Rather, they should demonstrate that the brain signal contains more information about the upcoming word than the passive stimulus baseline does.

      As these aspects are fundamental to the interpretation of our study, we have fundamentally reorganised and rewritten large parts of the paper. We hope it is much clearer now.

      (2) To better compare to Goldstein's work, the author may consider performing the same analyses using their publicly available dataset.

      This is a good suggestion. When we initially conducted this research, the Goldstein dataset was not yet publicly available. It now is, and we have applied our analyses to their stimulus material. The same problem emerges: the hallmarks of prediction appear in the acoustics of their podcast stimuli. Even after applying the control analyses, pre-onset predictivity is robust in their acoustics (indeed, in correlation terms, higher than reported for the neural data, so there is not more predictivity in the brain than in the stimulus material), confirming that the issue we identify applies to the original dataset. Results are shown in Figures S2B, S3B, S5C, and S6B.

      (3) It would also be interesting to show the predictability effect after word onset in the self-predictability analyses, for example in Figure 2C. The predictability effect is reflected not only in pre-onset responses but also in post-onset responses, i.e., larger responses for unpredicted words. Does the stimulus dependency mirror this effect?

      Our paper focuses specifically on temporal dependencies (the capacity of the current word to predict the preceding stimulus signal, e.g., previous acoustics or previous embeddings) and on how these mimic neural pre-activation. Post-onset analyses, by contrast, concern the mapping between the current word and its concurrent signal, which involves fundamentally different mechanisms (e.g., mapping fidelity, frequency effects, acoustic clarity, word length) and would require covariates for the word's post-onset attributes to be interpreted meaningfully. Post-onset, there can be differences between predictable and unpredictable words (e.g., unpredictable words are sometimes pronounced with more emphasis), which is why surprisal studies include a large range of covariates. However, this does not concern stimulus dependencies or pre-activation, so we consider it beyond the scope of our study.

      (4) The authors might consider reporting the encoding performance for the residual word embeddings, similar to Figure S6B in Goldstein's paper. This would allow us to determine whether pre-activation persists in the MEG responses and compare its pattern with the predictability of pre-onset acoustics.

      We do report this analysis, in the revised supplement it is shown in Figure S7. We placed it in the supplement precisely because residualized embeddings are not the "fix" they appear to be: as we show, they still yield strong pre-onset predictivity in the passive acoustic baseline (Figure 4, S6), undermining their use as a control.

      (5) The series of previous pre-activation analyses produced fruitful findings, e.g., the difference between brain regions (Fig. S4, (Goldstein et al., 2022)) and the difference between listeners and speakers (Figure 2, (Zada et al., 2024)). Can these observed differences be explained by stimulus dependencies?

      We appreciate this question. Our goal is to address the general logic of using pre-onset encoding as evidence for prediction, rather than to critique every finding in specific papers, especially as it pertains to a specific author. But briefly:

      Speaker vs. Listener differences (Zada et al., 2024): Zada et al. report distinct temporal profiles: speaker encoding peaks pre-onset (planning?), whereas listener encoding peaks post-onset but shows a pre-onset "ramp." Our critique applies to interpreting this ramp as "prediction." However, this interpretation is not central to their paper, which focuses on speaker-listener coupling via shared embedding spaces. We leave the implications (which are clear enough) to the reader.

      Regional differences (Goldstein et al., 2022): Encoding timecourses do vary across electrodes, as we also observe across MEG sources (and participants). But our point is logical: because pre-onset encoding does not necessarily reflect prediction, finding a channel with stronger pre-onset encoding does not mean that channel performs “more prediction”. For instance, one subject in the Armeni dataset showed higher pre-onset than post-onset encoding (and indeed activity) overall – but it would be implausible to conclude this subject "only predicts" and does not “process” or “listen”. More likely, this reflects differences in signal-to-noise, integration windows, or source contributions. The exact sources of these morphological differences are interesting but unclear, and speculating on them is beyond our scope.

      (6) I appreciate that the authors have shared their code; however, some parts appear to be missing. For example, the script encoding_analysis.py only includes package-loading code.

      Thank you for noticing this; we have updated our code repository.

      (7) What do the error bars in the figures represent - for example, in Figure 1C? How many samples were included in the significance tests? The difference between the two curves appears small, yet it is reported as significant. Additionally, Figure S1 shows large differences between subjects and between the two MEG datasets. Do the authors have any explanation for these differences?

      The shaded areas in our previous Figure 1C show 95% confidence intervals computed over the 100 MEG sources identified as part of the bilateral language system and the 10 cross-validation splits.

      We do not have an elaborate explanation for the differences in encoding performance across the three subjects in the few-subject dataset. Instead, we interpret these differences as a likely consequence of substantial inter-individual variability in evoked responses, even at the source level, arising from differences in cortical folding and the orientation of underlying current dipoles. We deem this a likely explanation since different electrodes in Goldstein’s ECoG data also showed very different encoding profiles.

      With respect to the multi-subject dataset, we suspect that the large differences most likely stem from two substantial differences. First, the acoustics were purposefully manipulated by the experimenters to reduce temporal dependence. This made it harder for listeners to concentrate on the stories and may therefore have led to lower-quality neural data. Furthermore, it reduced one form of stimulus dependency, namely the acoustic temporal dependencies, which could otherwise be exploited by the encoding model to reach higher encoding accuracies. Second, MEG has a notoriously poor signal-to-noise ratio, and the amount of data per participant (7,745 words, as opposed to 85,719 in the few-subject dataset) might not have been enough to produce reliably high encoding results.

      Finally, the current study is clear and convincing, and my suggestions are not intended to question its novelty or robustness. Rather, I believe the authors are in a strong position to address a critical question in language processing: whether pre-activation occurs. The authors have thoughtfully considered important confounds related to pre-onset responses. Adding some approaches to regressing out these confounds could be particularly helpful for determining whether a true pre-onset response remains.

      We thank the reviewer again for their constructive feedback, suggestions and questions. To clarify, however, our goal is *not* to definitively attest to whether pre-activation occurs. Our goal is simply to scrutinise a specific method to test for linguistic prediction. This method purports to be an improvement on conventional post-onset (e.g. surprisal-based) methods, as it can directly investigate effects occurring prior to word onset. We have demonstrated fundamental limitations in the underlying logic of this method. We propose passive control systems as baselines against which claims of prediction should be evaluated. Against this baseline, the current evidence does not show unequivocal support for prediction: pre-onset encoding in the brain does not exceed that in the passive control. However, we do not conclude from this that pre-activation does not exist — that would require a different study entirely. Our aim is more methodological: to establish what should count as evidence for prediction, not to settle whether prediction occurs.

      We would like to thank the reviewers and editors for their thoughtful feedback, which has been tremendously helpful in improving the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have carefully addressed all the concerns raised in the responses below and incorporated the suggested revisions into the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. The observed patterns in the inbreeding coefficient and heterozygosity can indeed arise from multiple factors, including past bottlenecks, selection, inbreeding, and selfing. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
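
      The speed of this process follows from the standard recursion for the inbreeding coefficient under partial selfing, F' = s(1 + F)/2. The recursion assumes an infinite population in which the non-selfed fraction outcrosses at random. Under complete selfing (s = 1), heterozygosity halves every generation, so five generations leave about 3% of the initial heterozygosity. For 0 < s < 1, F approaches the equilibrium s/(2 - s). The following short sketch is illustrative only and is not the SLiM model used in the manuscript.

```python
def inbreeding_trajectory(selfing_rate, generations, f0=0.0):
    """Iterate F_{t+1} = s * (1 + F_t) / 2 (infinite population, random outcrossing otherwise).

    Returns [F_0, F_1, ..., F_generations]. Starting from an outbred population (f0 = 0),
    heterozygosity relative to the start is 1 - F_t.
    """
    f = [f0]
    for _ in range(generations):
        f.append(selfing_rate * (1.0 + f[-1]) / 2.0)
    return f


def equilibrium_f(selfing_rate):
    """Fixed point of the recursion: F* = s / (2 - s)."""
    return selfing_rate / (2.0 - selfing_rate)


full = inbreeding_trajectory(1.0, 5)
print("complete selfing, H_t / H_0:", [1 - x for x in full])
print("s = 0.9: F after 20 generations =", inbreeding_trajectory(0.9, 20)[-1],
      "| equilibrium =", equilibrium_f(0.9))
```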

      To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India. These results are included in the revised manuscript.
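
      A comparison of this kind can be run with a two-sample permutation test on seed set per inflorescence. The sketch below is a generic illustration and not necessarily the test used in the revised manuscript. The function name and inputs are hypothetical.

```python
import numpy as np


def permutation_test_mean_diff(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means (e.g. seeds per inflorescence,
    bagged vs open-pollinated). Returns a p-value with the add-one correction."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)                    # shuffle treatment labels
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        count += int(diff >= observed)
    return (count + 1) / (n_perm + 1)                     # keeps p strictly above zero
```

      A permutation test makes no normality assumption, which suits count data such as seed set. A non-significant result on its own does not prove equivalence.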

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has also been subject to other evolutionary forces, such as a genetic bottleneck and expansion linked to its invasive nature, the mating system exerts a remarkably strong influence on the observed genetic patterns, resulting in a clear and consistent genetic structure across populations.

      Reviewer #1 (Recommendations for the authors):

      Lantana camara is a globally invasive plant as the authors mention in their manuscript, but this study only focuses on India. This should be reflected in the title.

      The reviewer has suggested that the title should reflect the study area. Since our sampling covers nearly all regions in India, we believe the patterns observed here are likely representative of those in other parts of the invaded range. For this reason, we would prefer to retain the current heading.

      It would be helpful if the pictures of the flowers in Figure 3 were larger to more clearly see the different colors.

      As per the reviewer's suggestion, we have increased the size of the images to improve clarity.

      Figure 4 could probably be moved to supplemental material, it does not add much to the results.

      We feel it is important to reiterate that the patterns we observe in Lantana are consistent with what one would expect in any predominantly self-fertilizing species. The figure provides additional evidence for this link, so we believe it is important to retain it in the main text.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using data on 19,008 genome-wide SNPs from 359 individuals in India. They found a clear population structure that did not show a geographical pattern. Instead, flower color was associated with population structure. Excess homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).
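      For reference, Wright's F_ST for one biallelic locus can be written as (H_T - H_S)/H_T. Here H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The sketch below assumes equal-sized subpopulations and applies no sample-size correction, unlike estimators such as Weir and Cockerham's. It is an illustration only, not the pipeline used in the study.

```python
from statistics import mean


def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wright_fst(subpop_freqs: list[float]) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    Assumes equal-sized subpopulations; no sample-size correction is applied.
    """
    h_s = mean(expected_het(p) for p in subpop_freqs)  # mean within-subpopulation heterozygosity
    h_t = expected_het(mean(subpop_freqs))             # heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # locus monomorphic overall; F_ST undefined, reported as 0 here by choice
    return (h_t - h_s) / h_t
```

      Two subpopulations fixed for different alleles give F_ST = 1, and identical allele frequencies give F_ST = 0. This matches the intuition that selfing-driven fixation within lines raises F_ST.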

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are well-established theoretical results and are not novel in themselves, although it would be interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) applied to the SNP data would be more informative. For example, with ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions contributed to the population structure of this species. In the current form of the manuscript, multiple introductions are implied but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location (similar to what we observed in Lantana camara) can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation. Therefore, we conducted these simulations ourselves for invasive plants to test whether the patterns we observed are consistent with expectations for a predominantly self-fertilising species.

      Additionally, as suggested by the reviewer, we have performed demographic history simulations using fastsimcoal2 to investigate the divergence among different flower colour morphs. The results have been incorporated into the revised manuscript.

      First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. This initial SNP screening could therefore be strongly biased, which may have led to spurious outputs in a series of downstream analyses.

      Applying a HWE filter is a common practice in genomic data analysis because it helps remove potential sequencing or genotyping artefacts, which can otherwise bias downstream analyses. However, we understand that HWE filtering can also remove biologically informative loci and potentially bias the analysis, especially when a stringent cutoff is used. A strict filter might retain only loci that perfectly fit Hardy–Weinberg expectations and exclude sites influenced by real evolutionary processes like selection and/or inbreeding.

      To balance this, we used a mild HWE filter, aiming to remove clear artefacts while retaining loci that may reflect genuine biological signals. Another reason for applying it is that many downstream tools, for example ADMIXTURE, assume that markers are neutral and do not deviate strongly from HWE (although this assumption may not always hold). The filter therefore keeps the data closer to the assumptions of these models.

      Second, in the genetic simulation, it is not clear how parameters such as the mutation rate, recombination rate, and growth rate were determined and whether they are appropriate.

      We have cited the references for these values in the manuscript. However, for Lantana, many such baseline data are not available, so we used general values reported for plants, which is an accepted approach when working with understudied species. Moreover, the aim of these simulations was to develop a general understanding of how mating systems influence genetic diversity in invasive plants, rather than to parameterize the simulations specifically for Lantana.

      While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilisation alone.

      Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      In genetic simulations, it is often best to begin with simpler scenarios involving fewer parameters, and we followed this approach. As the reviewer rightly pointed out, selfing can influence multiple factors such as mutation and recombination rates. However, to first understand the broad effects, we chose to work with simpler scenarios where both mutation and recombination rates were kept constant.
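      The reviewer's point can be made concrete with standard results for partial selfing at rate s. The equilibrium inbreeding coefficient is F = s/(2 - s), and in the coalescent with selfing (Nordborg & Donnelly 1997) the effective population size is scaled by 1/(1 + F) = 1 - s/2. This reduces the population-scaled mutation rate θ = 4Nμ by the same factor. The sketch below only evaluates these closed-form expressions under equilibrium and a constant selfing rate. It is not part of the SLiM setup, and the function names are illustrative.

```python
def equilibrium_f(s: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) under selfing rate s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate s must be in [0, 1]")
    return s / (2.0 - s)


def effective_theta(n: int, mu: float, s: float) -> float:
    """Population-scaled mutation rate 4 * Ne * mu with Ne = N / (1 + F).

    Assumes a diploid population at inbreeding equilibrium with a constant
    selfing rate s and per-generation mutation rate mu.
    """
    f = equilibrium_f(s)
    ne = n / (1.0 + f)  # effective size reduced by selfing
    return 4.0 * ne * mu
```

      At s = 0.9, F is about 0.82 and θ falls to 55% of its outcrossing value (1 - s/2 = 0.55). Holding the mutation rate constant across selfing rates, as in our simulations, therefore still lets this reduction emerge from the reduced effective size.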

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We thank the reviewer for this valuable suggestion. We have performed a MANOVA to test the association between flower colour and genetic structure. These results are incorporated in the revised manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively?

      We carefully considered this and defined our criteria based on flower colour. Specifically, we named morphs according to the colour of both young and old flowers. If both stages shared the same colour, we used that colour as the name. As shown in Figure 1b, it is possible to reliably distinguish between the different flower colour morphs. While one could also measure flower colour using a photometer, we believe both approaches yield similar results.

      I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The flower colour changes within an inflorescence, with young flowers shifting colour after pollination. However, this trend is consistent within a plant; for example, the yellow–pink morph always changes from yellow to pink. Based on this consistency, we incorporated a naming system that considers both the colour of younger and older flowers.

      Reviewer #2 (Recommendations for the authors):

      Figure 4: Figures a and b are not the "signatures of high inbreeding", because such patterns could also simply happen due to geographical isolation. The title of the figure could be changed. Figure 4c should be presented as a histogram.

      We have incorporated this suggestion into the manuscript and revised the figure title accordingly. However, we believe that presenting Figure 4c in its current form is more informative.

      L459 "in the introduced range, Lantana is self-compatible": is it self-incompatible in the native range? If it is known, it could be mentioned in the manuscript.

      A previous study from India demonstrated that self-fertilisation is possible in Lantana, providing an additional line of evidence for our findings. However, Lantana remains poorly studied in its native range, and to the best of our knowledge, only a single study has examined its pollination biology there, which we have cited in this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (cry) in promoting wakefulness under short photoperiods. This research is potentially important, as hypersomnolence is often seen in patients suffering from seasonal affective disorder (SAD) in winter. The mechanisms underlying these sleep effects are poorly understood.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of cry action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that cry acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this cry action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal-promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      Major concerns:

      (1) Figure 2 A-B: The authors show that knocking down cry expression in GABAergic neurons mimics the sleep increase seen in cryb mutants under short photoperiod. However, they do not provide any other sleep parameters such as sleep bout numbers, sleep bout duration, and more importantly waking activity measurements. This is an essential parameter that is needed to rule out paralysis and/or motor defects as the cause of increased "sleep". Any experiments looking at sleep need to include these parameters.

      Thank you for bringing up these points. We have now included these sleep parameters in Figure 2—figure supplement 3.

      (2) For all Figures displaying immunostaining and imaging data the resolution of the images is quite poor. This makes it difficult to assess whether the authors' conclusions are supported by the data or not.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue.

      (3) In Figure 4-S1A it appears that the syt-GFP signal driven by Gad1-GAL4 is colabeling the l-LNvs. This would imply that the l-LNvs are GABAergic. The authors suggest that this experiment suggests that l-LNvs receive input from GABAergic neurons. I am not sure the data presented support this.

      We agree that this piece of data alone is not sufficient to demonstrate that the l-LNvs receive GABAergic inputs rather than the l-LNvs are GABAergic. However, when nlsGFP signal is driven by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), we do not observe any prominent signal in the l-LNvs (Figure 5A and B; Figure 5-figure supplement 1A). We have also co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. This further supports the idea that the l-LNvs are not GABAergic, and that the syt-GFP signal likely arises from GABAergic neurons projecting to the l-LNvs.

      (4) In Figure 4-S1B. The GRASP experiment is not very convincing. The resolution of the image is quite poor. In addition, the authors used Pdf-LexA to express the post t-GRASP construct in l-LNvs, but Pdf-LexA also labels the s-LNvs, so it is possible that the GRASP signal the authors observe is coming from the s-LNvs and not the l-LNvs. The authors could use a l-LNvs specific tool to do this experiment and remove any doubts. Altogether this reviewer is not convinced that the data presented supports the conclusion "All in all, these results demonstrate that GABAergic neurons project to the l-LNvs and form synaptic connections." (Line 176). In addition, the authors could have downregulated the expression of Rdl specifically in l-LNvs to support their conclusions. The data they are providing supports a role for RDL but does not prove that RDL is involved in l-LNvs.

      Thank you for these wonderful suggestions. Again, we apologize for the poor resolution, and we hope that uploading the images separately resolves this issue. We agree that the GRASP signal could be coming from the s-LNvs and not the l-LNvs, but unfortunately we were not able to find a LexA line that is specifically expressed in the l-LNvs. We believe the trans-Tango data further support the idea that GABAergic neurons project to and form synaptic connections with the l-LNvs. Nonetheless, we have changed our conclusion to “All in all, these results strongly suggest that GABAergic neurons project to the l-LNvs and form synaptic connections” to be more rigorous. In addition, we have obtained R78G01GAL4, which is specifically expressed in the l-LNvs, and using this GAL4 to knock down Rdl rescues the long-sleep phenotype of cry mutants (Figure 4—figure supplement 1D).

      (5) In Figures 4 A and C: it appears that GABA is expressed in the l-LNvs. Is this correct? Can the authors clarify this? Maybe the authors could do an experiment where they co-label using Gad1-GAL4 and Pdf-LexA to clearly demonstrate that l-LNvs are not GABAergic. Also, the choice of colors could be better. It is very difficult to see what GABA is and what is PDF.

      Thank you for this wonderful suggestion. We have now co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. We suspect the GABA signal at the l-LNvs may arise from the GABAergic projections received by these cells. We have now changed the color of the GABA/PDF signals in these images and have reduced the intensity of the PDF signal. Hopefully, it would be easier to visualize in this revised version.

      (6) Figure 4G: Pdf-GAL4 expresses in both s-LNvs and l-LNvs. So, in this experiment, the authors are silencing both groups, not only the l-LNvs. Why not use a l-LNvs specific tool?

      Thank you for bringing up this important point. We previously used c929GAL4 to express Kir2.1, and this led to lethality. We have now used two newly obtained l-LNv-specific GAL4 drivers (R78G01GAL4 and R10H10GAL4) to express Kir2.1 but did not observe a significant effect on sleep. Please see Author response image 1 for the results.

      Author response image 1.

      Daily sleep duration of male flies expressing Kir2.1 in l-LNvs using R78G01GAL4 (A)(n = 40, 41, 30 flies) and R10H10GAL4 (B) (n = 40, 41, 32 flies) and controls, monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test was used to calculate the difference between experimental group and control group.

      (7) Figure 4H-I: The C929-GAL4 driver expresses in many peptidergic neurons. This makes the interpretation of these data difficult. The effects could be due to peptidergic cells being different than the l-LNvs. Why not use a more specific l-LNvs specific tool? I am also confused as to why some experiments used Pdf-GAL4 and some others used C929-GAL4 in a view to specifically manipulate l-LNvs? This is confusing since both drivers are not specific to the l-LNvs.

      Thank you for bringing up these important points. We have now used the l-LNv-specific R10H10GAL4, and the results are broadly comparable with those obtained with c929GAL4 (Figure 4I and K), i.e. activating the l-LNvs blocks the long-sleep phenotype of cry mutants. PdfGAL4 is used in Figure 4G because c929GAL4 leads to lethality, while the l-LNv-specific GAL4 lines do not alter sleep.

      (8) Figure 5-S1B: Why does the pdf-GAL80 construct not block the sleep increase seen when reducing expression of cry in Gad1-GAL4 neurons? This suggests that there are GABAergic neurons that are not PDF expressing involved in the cry-mediated effect on sleep under short photoperiods.

      Yes, this is indeed the conclusion we draw from this result, and we commented on this in the Discussion: “Moreover, inhibiting cry RNAi expression in PDF neurons does not eliminate the long-sleep phenotype of Gad1GAL4/UAScryRNAi flies. Therefore, we suspect that cry deficiency in other GABAergic neurons is also required for the long-sleep phenotype. Given that the s-LNvs are known to express CRY and appear to be GABAergic based on our findings here, we believe that CRY acts at least in part in the s-LNvs to promote wakefulness under short photoperiod.”

      In conclusion, it is not clear that the authors demonstrated that they are looking at a cry-mediated effect on GABA in s-LNvs resulting in a modulation of the activity of the l-LNvs. Better images and more-suited genetic experiments could be used to address this.

      Thank you very much for all the comments. They are indeed quite helpful for improving our manuscript. Hopefully, with images of higher quality and the additional experiments described above, we have now provided more evidence supporting our major conclusion.

      Reviewer #2 (Public Review):

      Summary:

      The sleep patterns of animals are adaptable, with shorter sleep durations in the winter and longer sleep durations in the summer. Chen and colleagues conducted a study using Drosophila (fruit flies) and discovered that a circadian photoreceptor called cryptochrome (cry) plays a role in reducing sleep duration during day/night cycles resembling winter conditions. They also found that cry functions in specific GABAergic circadian pacemaker cells, known as s-LNvs, to inhibit these neurons, thereby promoting wakefulness in winter. They also identified the l-LNvs, known as arousal-promoting cells, as the downstream neurons.

      Strengths:

      Detailed mapping of the neural circuits cry acts to mediate the shortened sleep in winter-like day/night cycles.

      Weaknesses:

      The supporting evidence for s-LNvs being GABAergic neurons is not particularly strong. Additionally, there is a lack of direct evidence regarding changes in neural activity for s-LNvs and l-LNvs under varying day/night cycles, as well as in cry mutant flies.

      Thank you very much for all the comments. We have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      We have now examined GCaMP signals in the l- and s-LNvs of WT and cry mutants under 4L20D and 12L12D. Please see Author response image 2 for the results. Both WT and cry mutants show photoperiod-dependent changes. Interestingly, cry mutants show a more prominent reduction of GCaMP signal in the l-LNvs than WT under 12L12D vs. 4L20D, but the sleep duration phenotype is observed only under 4L20D. Moreover, GCaMP signal is elevated in the s-LNvs of cry mutants relative to WT under 4L20D but decreased under 12L12D. These results indicate that distinct mechanisms regulate sleep under short vs. normal photoperiods (with CRY being dispensable under 12L12D), and that the role of CRY in modulating the activity of these neurons is also photoperiod-dependent. Further in-depth characterization is needed to delineate these complex issues.

      Author response image 2.

      Quantification of GCaMP6m signal intensity normalized to that of tdTomato under 12L12D and 4L20D (n = 25-45 cells). Student’s t-test: compared to WT, #P < 0.05, ##P < 0.01; 12L12D vs. 4L20D, *P < 0.05, ***P < 0.001.

      Reviewer #3 (Public Review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4h light: 20 h dark). They go on to show that cry is required in GABAergic neurons. Further, they suggest that the relevant subset of GABAergic neurons are the well-studied small ventral lateral neurons that they suggest inhibit the arousal-promoting large ventral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors' conclusion that the sLNvs are GABAergic is not well supported by the data. Better immunostaining experiments and perhaps more specific genetic driver lines would help with this point (details below).

      (1) The sLNvs are well known as a key component of the circadian network. The finding that they are GABAergic would, if true, be of great interest to the community. However, the data presented in support of this conclusion are not convincing. Many of the confocal images are of insufficient resolution to evaluate the paper's claims. The anti-GABA immunostaining in Figs 4 and 5 seems to have a high background, and the GRASP experiments in Fig 4 supplement 1 have low signal.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue. Unfortunately, the GABA immunostaining does not work very well in our hands and thus the background is high. We have now adjusted the images by changing the minimum lookup table (LUT) value in the green channel to 213, which removes all pixels below 213. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the contrast. Furthermore, we have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.
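      The difference between a display adjustment and a change to the data can be sketched as follows. The sketch assumes a 12-bit image held as a NumPy array and a simple linear lookup table whose lower bound is 213, the green-channel value quoted above. The upper bound, the helper name `display_lut`, and the synthetic image are illustrative assumptions, not the software actually used. The display array is derived for viewing only, and the raw array used for quantification is left unchanged.

```python
import numpy as np


def display_lut(raw: np.ndarray, vmin: int, vmax: int) -> np.ndarray:
    """Map raw intensities to 8-bit display values with a linear LUT.

    Pixels at or below vmin are shown as black and pixels at or above vmax
    as white. `raw` itself is not modified; a new array is returned.
    """
    scaled = (raw.astype(np.float64) - vmin) / (vmax - vmin)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


# Synthetic 12-bit image stored as uint16 (illustrative data only).
rng = np.random.default_rng(0)
raw = rng.integers(0, 4096, size=(64, 64), dtype=np.uint16)
shown = display_lut(raw, vmin=213, vmax=4095)
```

      Intensity measurements taken from `raw` are identical before and after the LUT change; only `shown`, the rendered figure, hides the low-intensity background.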

      Transcriptomic datasets are available for the components of the circadian network (e.g. PMID 33438579, and PMID 19966839). It would be of interest to determine if transcripts for GAD or other GABA synthesis/transport components were detected in sLNvs. Further, there are also more specific driver lines for GAD, and the lLNvs, sLNVs that could be used.

      Thank you for these wonderful suggestions. Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” PMID 33438579 does not report expression of these genes in either s-LNvs or l-LNvs, likely due to insufficient sequencing depth. Furthermore, we have now used two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4) to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and K).

      (2) The authors' model posits that in short photoperiods, cry functions to suppress GABA secretion from sLNvs thereby disinhibiting the lNVs. In Fig 4I they find that activating the lLNvs (and other peptidergic cells) by c929>NaChBac in a cryb background reduces sleep compared to activating lLNVs in a wild-type background. It's not clear how this follows from the model. A similar trend is observable in Fig 4H with TRP-mediated activation of lNVs, although it is not clear from the figure if the difference b/w cryb vs wild-type background is significant.

      Thank you for bringing up this important point. This does appear counterintuitive. We suspect that in cry mutants, there is more inhibition occurring at the l-LNvs, so the system may be particularly sensitive to their activation. Activating these neurons in the mutant background can therefore produce a more prominent wake-promoting effect than in WT.

      Recommendations for the authors:

      Our major concern centers around the claim that the sLNvs are GABAergic and secrete GABA onto the lLNVs. As it stands, this is not well supported by the data.

      The authors could substantiate these findings by using more specific driver lines for GAD/vGAT (MiMIC-based lines are available that should better recapitulate endogenous expression). Transcriptomic data for circadian neurons are available, and the FlyWire consortium also predicts neurotransmitter identities for specific neural circuits. These datasets could be mined for evidence to support the claim that sLNvs are GABAergic.

      Thank you for these wonderful suggestions. We have now used MiMIC-based lines for Gad1 (BS52090, Mi{MIC}Gad1MI09277) and VGAT (BS23022, Mi{ET1}VGATMB01219) to knock down cry, but unfortunately we did not observe changes in sleep. Please see Author response image 3 for the results.

      Author response image 3.

      Daily sleep duration of male flies with cry knocked down in GABAergic neurons by Gad1GAL4 (A) (n = 30, 38, 50, 18, 31 flies) or VGATGAL4 (B) (n = 28, 38, 50, 18, 30 flies) monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test: compared to UAS control, ###P < 0.001.

      Furthermore, we have now included another Gad1GAL4 line which is generated by knocking GAL4 transgene into the Gad1 locus. We are also able to observe increased sleep when using this GAL4 to knock down cry, and positive signals in the s-LNvs can be observed when using this GAL4 to drive nlsGFP (Figure 2B; Figure 5-figure supplement 1A).

      Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although in our study Gad1GAL4 expression is observed only in the s-LNvs and not the l-LNvs. We have commented on this in the 4th paragraph of the Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” FlyWire does not provide neurotransmitter predictions for the particular circuit we are interested in.

      Further, many of the immunostaining images have high background / low signal - so better confocal images would help, as would the use of more specific driver lines for the lNVs as it is sometimes hard to distinguish the lLNvs from sLNvs.

      We have now adjusted all images by changing the minimum lookup table (LUT) value in the green channel to 213 and that of the red channel to 279, which removes all pixels below 213 and 279, respectively. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the signal to noise ratio. We were not able to find a LexA line that is specifically expressed in the l-LNvs but we have found two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4). We used these lines to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and 4K).

      Additional specific comments are in the reviews above.

      Minor points:

      (1) Line 55: CRYPTOCHROME is misspelled.

      This has been fixed.

      (2) Line 140: The authors need to provide the appropriate references for the use of THIP and SKF-97541.

      This has been added.

      (3) Line 149: there are multiple GABA-A receptors in flies, the authors should acknowledge that. What about LccH3 or Grd?

      Thank you for bringing up this important point. Here we focused only on Rdl because it is the only GABA-A receptor known to be involved in sleep regulation. We have modified our description regarding this issue: “We tested for genetic interaction between cry and Resistant to dieldrin (Rdl), a gene that encodes GABA-A receptor in flies and has previously been shown to be involved in sleep regulation.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37°C to 14°C, where the model agrees well with experimental data. However, in the 37°C to 12°C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation of the model's generality. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore approximately 0.03.

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced, “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to convey that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells which complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this context may be ambiguous, and we will replace it with the more precise phrase “progression of division cycle” to accurately convey this finding.

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is primarily presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine time-lapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce LUNA autofocusing method and demonstrate its capability to resolve bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly interfere with or perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and intend to explore it in future studies.

    1. Author response:

      (1) Claim regarding NNDSVD initialization

      Reviewer #1:

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      We agree that our original phrasing was too broad. NNDSVD-family initialization is widely used in NMF implementations (e.g., scikit-learn) and is available within some pipeline components. We revised the abstract and main text to clarify our intended contribution: MPS seeds CNMF directly with NNDSVD-derived nonnegative factors as the primary initialization strategy, rather than relying on heuristic or greedy ROI-based seeding, integrated within a memory-efficient, end-to-end workflow for long-duration miniscope recordings.
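      As background for this point, the sketch below shows what NNDSVD seeding of a nonnegative factorization looks like in plain NumPy. It follows the standard NNDSVD recipe (the same scheme scikit-learn exposes as `init='nndsvd'`) and then runs a few Lee–Seung multiplicative updates starting from that seed. It is an illustration under simplifying assumptions, not MPS code: the helper names `nndsvd` and `refine` are hypothetical, and CNMF's spatial and temporal constraints, background model and noise model are deliberately omitted.

```python
import numpy as np


def nndsvd(A: np.ndarray, k: int, eps: float = 1e-12):
    """NNDSVD seed (W, H) for A ~ W @ H with A >= 0 (hypothetical helper)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    W = np.zeros((A.shape[0], k))
    H = np.zeros((k, A.shape[1]))
    # The leading singular pair of a nonnegative matrix can be taken
    # nonnegative, so abs() resolves the SVD sign ambiguity.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        # Split each singular vector into its positive and negative parts.
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        # Keep whichever nonnegative rank-one piece carries more weight.
        u, v, m = (xp, yp, mp) if mp >= mn else (xn, yn, mn)
        scale = np.sqrt(S[j] * m)
        W[:, j] = scale * u / (np.linalg.norm(u) + eps)
        H[j, :] = scale * v / (np.linalg.norm(v) + eps)
    return W, H


def refine(A, W, H, n_iter: int = 50, eps: float = 1e-12):
    """Plain Lee-Seung multiplicative updates (Frobenius loss); not CNMF."""
    W, H = W.copy(), H.copy()
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((40, 30))  # synthetic nonnegative data (pixels x frames)
    W0, H0 = nndsvd(A, k=5)
    W1, H1 = refine(A, W0, H0)
    print(np.linalg.norm(A - W0 @ H0), np.linalg.norm(A - W1 @ H1))
```

      Because the seed is derived from a deterministic SVD (and the positive/negative split is unaffected by the sign flip of a singular pair), the same input yields the same starting factors, which is one practical argument for SVD-based seeding over random initialization.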

      (2) Installation issue on macOS

      Reviewer #1:

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ...unexpected end of file."

      We thank the reviewer for identifying the broken installer link and the macOS installation error. We fixed the macOS installer link on the documentation website and updated installation instructions to explicitly address common macOS permission-related failures (including rsync "Permission denied" errors that arise when attempting to write into protected directories without appropriate privileges). We re-tested installation on clean macOS systems and confirmed successful installation under the revised instructions.

      (3) Validation, benchmarking, and cross-pipeline comparison

      Reviewer #2:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines... Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results... Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes and differing pipelines... runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster".

      We thank the reviewers for their careful reading and for raising the question of biological validity, which we agree is central to any calcium imaging analysis tool. We would like to clarify, however, that MPS does not introduce a novel source extraction algorithm, and therefore the question of biological validity is not one that MPS alone can answer, nor should it be expected to. MPS is built on CNMF, the same mathematical framework underlying CaImAn and Minian. The contribution of MPS lies in its initialization strategy and parallelization architecture, which allow this proven framework to operate in the multi-hour recording regime.

      To address the reviewers' request for a direct qualitative comparison, we will run MPS, CaImAn, Minian, and MIN1PIPE on a representative 10-minute real recording with clearly visible neurons. The figure will show the spatial components (ROI footprints) and representative temporal traces (ΔF/F) for all four pipelines on identical data. We anticipate that the spatial layouts and temporal dynamics will be highly concordant across pipelines, demonstrating that MPS produces biologically consistent output. We believe this side-by-side comparison will provide a clear demonstration that MPS output is comparable in quality to established tools on tractable recordings.

      Regarding runtime comparison across pipelines, we will provide a table showing approximate processing times at three recording durations (5, 20, and 180 minutes). On short recordings, all pipelines are expected to complete successfully, albeit at different speeds; on long-duration recordings, their behavior is expected to diverge. We acknowledge that any single runtime benchmark reflects specific hardware and dataset characteristics and may not generalize to all configurations. We will therefore present these data as illustrative rather than definitive and will direct readers to the MPS documentation for guidance on hardware-specific tuning.
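      A runtime and memory table of this kind could be produced with a small harness like the one below. It is a sketch under stated assumptions: `run` stands in for any pipeline entry point (MPS, CaImAn, Minian or MIN1PIPE would each need a thin, hypothetical wrapper that accepts a frame count), the 20 Hz frame rate is an illustrative assumption rather than the dataset's actual rate, and `tracemalloc` only sees memory allocated through Python's allocator, so buffers allocated directly by native libraries may be under-counted.

```python
import time
import tracemalloc
from typing import Callable, Dict, Iterable


def benchmark(run: Callable[[int], None],
              durations_min: Iterable[float],
              fps: float = 20.0) -> Dict[float, dict]:
    """Time `run` and record its peak traced memory for each recording length."""
    results = {}
    for minutes in durations_min:
        n_frames = int(minutes * 60 * fps)  # frames in a recording of this length
        tracemalloc.start()
        t0 = time.perf_counter()
        run(n_frames)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        results[minutes] = {"frames": n_frames,
                            "seconds": elapsed,
                            "peak_mb": peak / 1e6}
    return results


def dummy_pipeline(n_frames: int) -> None:
    """Hypothetical stand-in workload: holds one float per frame in memory."""
    frames = [0.0] * n_frames
    sum(frames)


if __name__ == "__main__":
    # Durations match the three recording lengths named in the response.
    for minutes, r in benchmark(dummy_pipeline, [5, 20, 180]).items():
        print(f"{minutes:>4} min  {r['frames']:>7} frames  "
              f"{r['seconds']:.3f} s  {r['peak_mb']:.1f} MB")
```

      Repeating each measurement a few times and reporting the median would reduce run-to-run noise; for whole-process peak memory on Unix, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` is an alternative to `tracemalloc`.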

      (4) Dataset description and scope of generalizability

      Reviewer #2:

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators...

      We agree that the dataset description should be centralized and unambiguous. We added a dedicated Methods subsection stating that all results are based on a single, controlled experimental dataset consisting of 28 long-duration miniscope sessions acquired under consistent conditions (same brain region, calcium indicator, optical configuration, and acquisition parameters). This section explicitly specifies the number of animals, brain region, frame rate, field of view, session duration, and total data volume. We also clarified that conclusions are intended to evaluate MPS performance in this controlled long-duration setting rather than to claim universal parameter generalizability across brain regions, indicators, or optical systems.

      (5) Parameter guidance and documentation

      Reviewer #2:

      ...users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization...

      We agree that users should not blindly trust default or suggested parameters. We substantially expanded and centralized documentation by adding a parameter-selection walkthrough that explains what each modifiable parameter does, how it affects intermediate and final outputs, and how diagnostic plots generated at each stage should be interpreted. Rather than prescribing dataset-specific parameter values, we explicitly framed parameter selection as an iterative, hypothesis-driven process informed by experimental factors such as calcium indicator kinetics, lens size and numerical aperture, field of view, recording duration, and expected neuronal density. We consolidated previously dispersed explanations from the GitHub repository into a single documentation site and expanded figure descriptions to guide interpretation by less experienced users. A representative sample dataset and accompanying analysis code were made publicly available at https://github.com/ariasarch/MPS_Sample_Code to support parameter exploration on tractable data.

      (6) Packaging and distribution

      Reviewer #1:

      ...current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

      Regarding distribution more broadly: while our one-click installers are intended to reduce setup burden for non-programmers, we recognize the value of conventional environment-based distribution for long-term sustainability. We are exploring the feasibility of adding a standard PyPI and/or Conda installation pathway alongside the standalone installers. To ensure reproducibility across environments, all package dependencies are now explicitly version-pinned at installation time, eliminating environment drift as a source of irreproducibility.

      We would note, however, that PyPI distribution alone does not fully resolve the reproducibility challenges inherent to scientific Python software. Even with version-pinned dependencies, downstream changes in the Python interpreter itself, compiled extension modules, and platform-specific build toolchains can silently alter numerical behavior in ways that are difficult to anticipate or control. Our standalone installers address this by shipping a complete, fixed execution environment, and we believe this remains a meaningful architectural advantage for ensuring long-term reproducibility, particularly for non-developer users who may not be in a position to diagnose subtle environment-related failures. We see PyPI/Conda support and standalone installers as complementary rather than equivalent approaches, and will pursue both where feasible.
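      As a complement to installation-time pinning, an installed environment can also be checked against its pins at runtime using only the Python standard library. The sketch below assumes a hypothetical `pins` mapping from distribution name to exact version; the versions shown are placeholders, not MPS's actual dependency list, and the helper name `check_pins` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple


def check_pins(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, pinned, installed) for every dependency that drifts from its pin.

    `installed` is None when the distribution is missing entirely.
    """
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append((name, pinned, None))
            continue
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    # Placeholder pins for illustration only.
    example_pins = {"numpy": "1.26.4", "scipy": "1.11.4"}
    for name, pinned, installed in check_pins(example_pins):
        print(f"{name}: pinned {pinned}, found {installed}")
```

      Such a check only detects package-version drift; as noted above, it cannot detect differences in the interpreter build or compiled toolchains, which is the gap a fixed, shipped environment covers.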

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal fragments model at capturing the biomass size distributions? Finally, why does the idealized erosion model fail to capture the size distribution at late stages in Supplemental Figure S9; would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.

      Our analysis in Figure 2 considers two size classes: small colonies (l < 5) and large colonies (l ≥ 5). The equal-fragment model predicts that the fracture of a large colony gives rise to two daughter fragments, each with half the biovolume. For an average colony of diameter l = 25, this corresponds to two daughter fragments with a diameter of l = 18, which is still in the large colony class. Sequential fragmentation events would be required to transfer biomass to the small size range (l < 5). However, the nearly exponential behavior of the fragmentation frequency function (Eq. 5) implies that subsequent fragmentation events are greatly slowed down. Therefore, the equal-fragment model predicts that the biomass transfer from large to small colonies during the first five hours of the experiment is negligible. This is in sharp contrast with the erosion model, which transfers biomass to the small size class at every fragmentation event. The difference between the two fragmentation models is quantified in Figure 2D, with a negligible change in biomass size distribution for the equal-fragment model (horizontal dash-dotted line) and a strong increase of small colonies for the erosion model (curved dashed line). Hence, it is clear from Figure 2D that the erosion model outperforms the equal-fragment model by capturing the observed shift from large to small colonies. We have now described this more clearly in lines 231-233.
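      The size argument can be checked with a few lines of arithmetic. The sketch assumes biovolume scales as l^d: d = 3 treats colonies as compact three-dimensional bodies, while d = 2 corresponds to a projected-area measure; which one matches the paper's definition of l is an assumption here, and the helper names are hypothetical. For l = 25, one equal split gives daughters of l ≈ 17.7 (d = 2) or l ≈ 19.8 (d = 3), both still in the large class, and 5 or 7 successive splits respectively are needed before a fragment falls below l = 5. The slowdown of later splits implied by Eq. 5 is not modeled.

```python
def daughter_size(l: float, d: float = 3.0) -> float:
    """Linear size of each daughter when a colony splits into two equal halves,
    assuming biovolume scales as l**d (d is an illustrative assumption)."""
    return l * 2.0 ** (-1.0 / d)


def halvings_to_small(l: float, threshold: float = 5.0, d: float = 3.0) -> int:
    """Number of successive equal splits before a fragment drops below `threshold`."""
    n = 0
    while l >= threshold:
        l = daughter_size(l, d)
        n += 1
    return n


if __name__ == "__main__":
    for d in (2.0, 3.0):
        # d, size after one split, splits needed to reach the small class
        print(d, round(daughter_size(25.0, d), 1), halvings_to_small(25.0, d=d))
```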

      Nevertheless, the performance of the idealized erosion model is limited at late stages (Fig. S9D). We agree with the reviewer that this limitation could potentially be overcome by introducing variance in cell adhesion among colonies (as we hypothesized in lines 140-142). However, this is not trivial, as it would require additional free parameters and reduce the simplicity of the model. Therefore, we chose to restrict our model to the common assumptions of idealized fragmentation models widely used in the literature (e.g. references 53-55).

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. This being said, the same method can be used to investigate systems where shear forces are biologically more relevant.

      In this work we aimed to investigate the effects of shear forces over a wide range of values, extending beyond the regime of natural lakes into the strong mixing created by technological applications such as the bubble plumes that are applied in several lakes to suppress cyanobacterial blooms. The adhesion force between cells, mediated for example by extracellular polysaccharides (EPS), plays an essential role by controlling the resistance to shear-driven erosion, which is quantified in our model by the fitting parameters Sᵢ and qᵢ.

      We agree with the reviewer that we have missed some literature on Microcystis colony formation via cell aggregation (i.e., cell adhesion), for which we apologize. In our new revision, we have now included several new references [30-34,36] and we now describe the findings of these earlier studies. Specifically, in the Introduction we now pay more attention to the role of cell adhesion by writing (lines 53-60):

      “In contrast, cell aggregation (sometimes also called cell adhesion) can promote a rapid increase in colony size beyond the limit set by division rates, and may explain sudden rises in colony size in late bloom periods [26, 30, 31]. Aggregation rates depend on the stickiness of the colonies, which in turn is controlled by the EPS composition, pH, and ionic composition of water [27–29]. In particular, divalent cations such as Ca²⁺ can bridge negatively charged functional groups in EPS and therefore increase stickiness [32–34]. It has been shown that high levels of Ca²⁺ enhance cell aggregation in Microcystis cultures [35]. Moreover, cell aggregation can provide a fast defense against grazing [36]. Fluid flow plays an important role in cell aggregation by regulating the collision frequency between cells or colonies [6]. In addition, fluid flow ….”

      Furthermore, in the Conclusions we added (lines 374-376):

      “A previous study on colony aggregation at high Ca²⁺ levels observed similar morphological differences in colony formation [35]. There, an initial fast cell aggregation produced a sparse colony structure, followed by a more compact structure of the colonies associated with cell division.”

      Finally, we would like to clarify a difference in terminology between the reviewer’s comment and our work. The term cell adhesion is commonly used in microbiology to refer to adhesion of cells with a solid substrate. In our work, the adhesion mediated by EPS occurs between free-floating cells and colonies. To avoid any confusion, we chose to refer to this process as cell aggregation, in line with other literature on suspended particles.

      Reviewer #2 (Recommendations for the authors):

      The authors have expanded on the image analysis process but now report substantially different correction factors (λ₂ = 2.79 compared to 73.13 in the previous submission; λ₃ = 0.52 compared to 13.71 in the previous submission). Could the authors comment on how the analysis changed? These correction factors for N<5 appear particularly relevant for the aggregation experiments presented in Figure 3. For measurements involving only small colonies, as in Figure 3, are these correction factors still valid? In addition, does the timing of image acquisition, i.e. when the colonies are imaged, influence the correction factors applied in this study?

      We rewrote the description of the calibration process in our earlier revision of the manuscript to improve clarity and remove unclear definitions. In the first version, the input variable Nₚ[i] of supplementary equation (S1) was defined as the number of features per frame. This variable depends on the frame dimensions (2048 × 2048 px for large colonies, l > 5, and 400 × 400 px for small colonies). We believe that a more suitable input is the concentration distribution, which is normalized by frame area and is therefore invariant to frame dimensions and less prone to misinterpretation. For this reason, we adjusted the definition of Nₚ[i] in the revised version of the manuscript so that it expresses the number of features per frame area (instead of per frame). This change required the calibration constants λ₂ and λ₃ to be updated in the manuscript by a factor of (400 px/2048 px)², which explains why both constants changed by a factor of approximately 0.038. This rescaling of the input variable Nₚ[i] and the calibration constants did not affect the final results of our calculations (Figures 2 and 3).
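      The rescaling can be checked directly from the numbers quoted in the comment and the response: multiplying the earlier constants by the area ratio (400/2048)² ≈ 0.038 reproduces the revised values to two decimals. The variable names below are illustrative.

```python
# Frame side lengths in px for small-colony and large-colony imaging.
SMALL_FRAME_PX = 400
LARGE_FRAME_PX = 2048

# Calibration constants from the first and revised submissions.
old = {"lambda_2": 73.13, "lambda_3": 13.71}
revised = {"lambda_2": 2.79, "lambda_3": 0.52}

# Expressing N_p[i] per frame area instead of per frame rescales the
# constants by the ratio of the two frame areas.
factor = (SMALL_FRAME_PX / LARGE_FRAME_PX) ** 2
rescaled = {name: round(value * factor, 2) for name, value in old.items()}

print(f"factor = {factor:.4f}")  # factor = 0.0381
print(rescaled)                  # {'lambda_2': 2.79, 'lambda_3': 0.52}
assert rescaled == revised
```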

      The authors use a moderate dissipation rate to stir the colonies, after which they allow them to sediment. How long were the particles allowed to sediment before measurements were taken? Intuitively, one might expect a greater number of colonies to be detected following sedimentation, yet the authors report only about one third of the colonies in the sedimented state. What accounts for this reduction? Furthermore, if higher shear rates are applied, do the results differ, for instance if particles are lifted further by the shear flow? Some more clarity would help other researchers to perform similar work.

      The sedimentation of particles following an initial stir was applied only to create a reference size distribution, displayed in supplementary Figures S8-C and D. As one would intuitively expect, a higher concentration of colonies was detected after sedimentation (Fig. S8-C and D) than during the shear flow (Fig. S8-A and B). During all other experiments in our work, the applied dissipation rate was sufficient to ensure a uniform distribution of colonies in suspension throughout the parameter range, as described in lines 461-473.

      In the caption of Figure S8, we reported the number of colonies counted in small sub-samples. These counts are small subsets of the total number of colonies contained in the entire volume of the cone-and-plate setup. A larger sub-sample volume was measured during the shear flow than for the sedimented sample, which led to a larger number of counted colonies in panels A and B (N = 10776, combined) than in panels C and D (N = 3066 and 1455, respectively).

      However, when normalized for the volume of the sub-samples, the calculated concentration of colonies is higher for panels C and D (as shown in the graphs). We understand that the earlier caption description of Figure S8 was misleading, for which we apologize. In the revised version, we have adjusted the caption to better describe the quantity:

      “Number of colonies counted during sampling …”
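      The difference between counted colonies and colony concentration can be made concrete with a minimal sketch. The function name and the sub-sample volumes below are hypothetical, invented only for illustration; they are not the volumes used in the experiments.

```python
def colony_concentration(n_colonies: int, subsample_volume_ul: float) -> float:
    """Return colonies per microlitre of the measured sub-sample."""
    if subsample_volume_ul <= 0:
        raise ValueError("sub-sample volume must be positive")
    return n_colonies / subsample_volume_ul


# Hypothetical volumes (NOT from the study): a larger volume sampled during
# shear flow, a smaller one after sedimentation.
sheared = colony_concentration(10776, subsample_volume_ul=50.0)
sedimented = colony_concentration(3066, subsample_volume_ul=10.0)

# More colonies were counted under shear, yet the concentration is lower.
print(sheared < sedimented)  # True
```

      Raw counts only become comparable across panels once each is divided by the volume actually imaged.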

      Line 797 contains an unfinished edit ("Figure ADD") that should be corrected.

      The unfinished edit has been corrected in the newly revised manuscript. Thanks!

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We appreciate the authors' efforts in addressing the concerns raised, particularly the inclusion of a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is given below, and we include a brief list of comments that we think the authors could address in the text: 

      (1) Justify metric selection - Could you please include in the text an explanation of why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify in the Abstract and main text that it applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted to clarify that consistent individuality can be measured more strongly in some behavioral traits than in others, both in the Abstract and throughout the main text.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts. 

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the correlation between the % of time walked at 23°C and at 32°C is r = 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable from the 23°C values. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
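      The conversion from r to the coefficient of determination used above is simple arithmetic: for a single predictor, R² = r². A minimal check of the two quoted values (the helper name is ours, for illustration only):

```python
def variance_explained(r: float) -> float:
    """Fraction of variance shared by two variables with Pearson correlation r."""
    return r ** 2


# % time walked, 23 °C vs 32 °C (Figure 4C, top left)
print(round(variance_explained(0.263), 3))       # 0.069
# Vector strength, Buridan vs Y-maze: fraction NOT explained
print(round(1 - variance_explained(-0.197), 3))  # 0.961
```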

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
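      One way to implement the suggested in-group ranking is sketched below, assuming each context yields one continuous value per fly with no ties; the helper names and the simulated data are hypothetical. Pearson's r is already unaffected by a constant shift or a positive rescaling of one context, so ranking mainly helps when a context changes values in a monotonic but nonlinear way; correlating within-context ranks is then equivalent to Spearman's rank correlation.

```python
import numpy as np


def within_context_ranks(values: np.ndarray) -> np.ndarray:
    """Rank flies within one context (0 = lowest); assumes no ties."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rank_correlation(context_a: np.ndarray, context_b: np.ndarray) -> float:
    """Pearson correlation of within-context ranks (Spearman's rho without ties)."""
    return float(np.corrcoef(within_context_ranks(context_a),
                             within_context_ranks(context_b))[0, 1])


# Toy data (simulated, not from the study): the same rank order of 50 flies,
# but context B shifts and nonlinearly stretches every value.
rng = np.random.default_rng(0)
context_a = rng.normal(size=50)
context_b = np.exp(2 * context_a) + 5.0

pearson = float(np.corrcoef(context_a, context_b)[0, 1])
print(round(rank_correlation(context_a, context_b), 6))  # 1.0
print(pearson < rank_correlation(context_a, context_b))  # True
```

      Because the rank order of flies is identical in both contexts, the rank correlation is 1 even though the raw Pearson correlation is well below 1.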

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open hardware and open software, a report would ideally demonstrate transfer through collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
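      As a minimal sketch of the ICC definition used here (between-fly variance divided by total variance), the function below estimates variance components for a balanced design with context as a fixed effect and fly as a random intercept. It uses the classical two-way ANOVA (method-of-moments) estimator rather than the REML fit performed by MATLAB's fitlme; for complete, balanced data the two usually agree, but this is a swapped-in technique, not the authors' code. The function name and the flies-by-contexts array layout are assumptions.

```python
import numpy as np


def icc_random_intercept(y: np.ndarray) -> float:
    """ICC = between-fly variance / (between-fly + residual variance).

    y has shape (n_flies, k_contexts): one value per fly per context,
    complete and balanced (no missing cells).
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    fly_means = y.mean(axis=1, keepdims=True)   # random intercept per fly
    ctx_means = y.mean(axis=0, keepdims=True)   # fixed effect of context
    resid = y - fly_means - ctx_means + grand   # within-fly residuals
    ms_fly = k * np.sum((fly_means - grand) ** 2) / (n - 1)
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    var_fly = max((ms_fly - ms_err) / k, 0.0)   # truncate negative estimates at 0
    return var_fly / (var_fly + ms_err)


# Simulated example (not study data): equal between- and within-fly variance,
# plus large context offsets that should not affect the ICC.
rng = np.random.default_rng(1)
n_flies, offsets = 2000, np.array([0.0, 5.0, -3.0, 10.0])
fly_effect = rng.normal(0.0, 1.0, size=(n_flies, 1))
y = fly_effect + offsets + rng.normal(0.0, 1.0, size=(n_flies, offsets.size))
print(round(icc_random_intercept(y), 2))  # close to 0.5
```

      Because the context means are removed before the residual variance is computed, shifting every fly's score in one context by a constant leaves the ICC unchanged; only the rank-preserving spread between flies counts as repeatability.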

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jibes with my own experience measuring individual behavior, that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Our goal was to propose a possible computational mechanism underlying information integration in the claustrum, not to claim structural or causal equivalence between the model and the biological circuit. We acknowledge that some expressions in the original manuscript may have been interpreted as exceeding this intention, and we will revise the text to explicitly soften such statements.

      It is well established that behavior-trained RNNs can admit multiple internal solutions capable of producing the same behavioral output, and we fully agree with this point. Among the many possible solutions, we focused on networks that exhibited dynamical properties consistent with independently obtained behavioral and physiological findings. Thus, in our view, biological plausibility in this study is not grounded in structural isomorphism, but rather in whether the core population-level dynamical properties observed in the model are reproducible in actual claustral population activity.

      We also agree with the reviewer that our original qualitative comparison of GPFA-based latent trajectories did not provide sufficient quantitative support. In the revised manuscript, we have therefore added an eigenvalue-based quantitative analysis of the dimensional structure of population trajectories. This analysis does not depend on the identity of the dimensionality-reduction method itself, but instead focuses on quantifying the geometric structure of population-state trajectories as they evolve over time. Applying the same metric to both the RNN and biological claustrum data revealed consistent condition-specific differences in population dynamics.

      This quantitative addition strengthens the previous qualitative trajectory comparison and clarifies that the model implements a specific computational dynamical regime that directionally corresponds to claustral population activity. While this does not imply uniqueness of the model, we believe it suggests that the proposed computational principle represents a biologically realizable candidate mechanism.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s concern. Expressions such as “closely mimicked,” “nearly identical,” and “recapitulate” will be replaced with more moderate language.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer noted, behavior-trained RNNs can yield multiple internal solutions that generate the same behavioral output, and we acknowledge this non-uniqueness. However, we do not interpret the relatively low success rate (5/100 networks) as evidence of fragility. Rather, we interpret it as suggesting that the emergence of this particular dynamical regime requires stringent structural constraints.

      The computational demands of the task—specifically, the integration of temporally separated signals—drive convergence toward networks capable of sustaining persistent activity through recurrent excitatory connectivity. Indeed, all networks exhibiting a claustrum-like cluster shared a strong recurrent excitatory structure within Cluster 1, a structural feature consistent with our slice electrophysiology findings.

      Our criterion for selecting RNNs was their ability to reproduce behavioral and physiological observations from the delayed escape experiment. Excluded RNNs may reflect alternative information-processing strategies characteristic of other brain regions or artificial logical solutions. Importantly, claustrum-like dynamics were not explicitly enforced during training; they emerged spontaneously under behavioral constraints, suggesting that this solution is not arbitrary.

      Furthermore, the computational principles derived from the RNN were quantitatively consistent with in vivo single-neuron activity. Using an eigenvalue-based metric (λ<sub>3</sub>/Σλ), both the RNN and biological claustrum data showed effects in the same direction. Leave-one-neuron-out analyses further demonstrated that this pattern was broadly distributed across neurons in the claustrum. These convergent results suggest that the identified network captures a computational regime that is consistent with claustral population dynamics, rather than representing an arbitrary solution unrelated to the biological observations.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      We agree that the original GPFA trajectory comparison in the biological claustrum data remained qualitative and did not sufficiently establish robustness or population-level structure. We have therefore added quantitative analyses in the revised manuscript.

      Before presenting these analyses, we clarify methodological limitations inherent in pseudopopulation and single-trial data. GPFA estimates latent trajectories based on covariance structure and temporal smoothness assumptions. In pseudopopulations, the true covariance of simultaneously recorded neurons cannot be fully reconstructed. Although our analysis is based on single trials rather than on trial-to-trial variability, we acknowledge that latent-space estimation depends on covariance structure.

      Therefore, the additional quantitative metric is not independent of the GPFA estimation stage; rather, it evaluates the geometric structure of single-trial latent trajectories estimated by GPFA.

      Specifically, for biological data, we reanalyzed GPFA-estimated latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). Across 20 time bins, a sliding window of 10 bins was applied. For each window, we computed the covariance matrix and extracted eigenvalues for PC1, PC2, and PC3. The third eigenvalue (λ<sub>3</sub>) was normalized by total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the extent to which trajectories deviate from a planar (two-dimensional) structure into a third dimension. An increase in λ<sub>3</sub>/Σλ indicates the formation of a higher-dimensional geometric structure.
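      The windowed λ<sub>3</sub>/Σλ metric described above can be sketched in a few lines, assuming the input is a 20 × 3 array of latent-trajectory coordinates on PC1 to PC3 and that the 10-bin window advances one bin at a time (the step size is not stated in the text, so it is an assumption, as is the function name).

```python
import numpy as np


def lambda3_ratio(trajectory: np.ndarray, window: int = 10) -> np.ndarray:
    """λ3 / (λ1 + λ2 + λ3) for each sliding window of a 3-D trajectory.

    trajectory: shape (n_time_bins, 3), coordinates on PC1-PC3.
    Returns one ratio per window position (step = 1 bin, assumed).
    """
    trajectory = np.asarray(trajectory, dtype=float)
    n_bins, n_dims = trajectory.shape
    if n_dims != 3:
        raise ValueError("expected coordinates on exactly three PCs")
    ratios = []
    for start in range(n_bins - window + 1):
        segment = trajectory[start:start + window]
        cov = np.cov(segment, rowvar=False)             # 3 x 3 covariance over time
        eig = np.sort(np.linalg.eigvalsh(cov))[::-1]    # λ1 >= λ2 >= λ3
        ratios.append(eig[2] / eig.sum())
    return np.array(ratios)


# A planar (2-D) trajectory gives ratios near 0; any out-of-plane spread raises them.
t = np.linspace(0.0, 2.0 * np.pi, 20)
planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
print(lambda3_ratio(planar).max() < 1e-12)  # True
```

      Because λ<sub>3</sub> is the smallest of the three eigenvalues, the ratio is bounded between 0 (a flat, planar trajectory) and 1/3 (equal spread in all three dimensions), which makes values comparable across windows and conditions.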

      For RNN data, since all unit activities were simultaneously observed and sufficient trials were available, we directly applied PCA to population activity without GPFA. Mean trajectories across trials were computed, and the same λ<sub>3</sub>/Σλ metric was applied. Although the initial dimensionality-reduction steps differ, the final metric definition and computation are identical. Thus, the comparison focuses on geometric dimensional structure rather than the dimensionality-reduction method itself.

      Importantly, within the biological dataset, GPFA estimation, preprocessing, pseudopopulation construction, subsampling strategy, temporal alignment, and smoothing were applied identically across the CS and Neutral conditions. Under this common analysis framework, λ<sub>3</sub>/Σλ values were consistently higher in the CS condition than in the Neutral condition.

      For the RNN data, an identical analysis pipeline was applied across the CS+Open and Open-only conditions. In this case as well, λ<sub>3</sub>/Σλ values were significantly higher in the CS+Open condition than in the Open-only condition.

      If structural bias arose from covariance estimation or dimensionality reduction, it would be expected to affect conditions similarly within each dataset. The observation that λ<sub>3</sub>/Σλ increases selectively in the CS condition in biological data and in the CS+Open condition in the RNN therefore supports the interpretation that the effect reflects a condition-specific dynamical difference rather than an artifact of dimensionality reduction.

      To further examine whether the effect was driven by a small subset of neurons, we performed leave-one-neuron-out analyses in the biological dataset. In the CS group, most neurons contributed relatively evenly to the metric, whereas such distributed contribution was not observed in the Neutral group. This suggests that the three-dimensional structure reflects an organized population-level phenomenon rather than covariance dominated by a small number of outlier neurons.
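      A leave-one-neuron-out check of this kind might be organised as below. To keep the sketch self-contained, plain PCA stands in for GPFA (a deliberate simplification, not the authors' pipeline); the (time bins × neurons) input layout, averaging the window ratios, the one-bin window step, and defining a neuron's contribution as the drop in the metric when it is removed are all assumptions.

```python
import numpy as np


def top3_pc_trajectory(activity: np.ndarray) -> np.ndarray:
    """Project (n_time_bins, n_neurons) activity onto its first three PCs."""
    centred = activity - activity.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:3].T                      # (n_time_bins, 3)


def mean_lambda3_ratio(trajectory: np.ndarray, window: int = 10) -> float:
    """Average of λ3/Σλ over sliding windows (step = 1 bin, assumed)."""
    ratios = []
    for start in range(trajectory.shape[0] - window + 1):
        cov = np.cov(trajectory[start:start + window], rowvar=False)
        eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
        ratios.append(eig[2] / eig.sum())
    return float(np.mean(ratios))


def leave_one_neuron_out(activity: np.ndarray, window: int = 10):
    """Return the full-population metric and each neuron's contribution.

    Contribution of neuron i = metric(all neurons) - metric(without neuron i).
    """
    full = mean_lambda3_ratio(top3_pc_trajectory(activity), window)
    contributions = np.empty(activity.shape[1])
    for i in range(activity.shape[1]):
        reduced = np.delete(activity, i, axis=1)
        contributions[i] = full - mean_lambda3_ratio(top3_pc_trajectory(reduced), window)
    return full, contributions


# Simulated pseudo-population (not study data): 20 time bins x 30 neurons.
rng = np.random.default_rng(0)
activity = rng.normal(size=(20, 30))
full, contributions = leave_one_neuron_out(activity)
print(contributions.shape)  # (30,)
```

      Roughly equal contributions across neurons would indicate a distributed population effect, whereas a few large values would point to covariance dominated by outlier neurons.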

      These results indicate that the consistent elevation of λ<sub>3</sub>/Σλ in the CS condition (biological data) and in the CS+Open condition (RNN) reflects a genuine dynamical feature rather than an artifact arising from pseudopopulation construction or dimensionality reduction.

      Taken together, the three-dimensional geometric structure observed in GPFA-based latent trajectories is unlikely to reflect random noise. The replication of the same quantitative metric in the RNN, using an independent dimensionality-reduction procedure, strengthens the correspondence between the two systems. We appreciate the reviewer’s suggestion for quantitative reinforcement, which has substantially strengthened the manuscript.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer and will clearly indicate that references to broader theoretical interpretations are speculative. We will substantially reduce their strength and emphasis.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      We agree with the reviewer’s concern. We will describe the delayed escape task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” and remove inference-related terminology throughout the manuscript.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s constructive and well-balanced comments. We regret that some of our wording and the scope of our introduction and discussion may not have appropriately reflected the contributions of prior studies. We will revise the manuscript accordingly to ensure that previous literature is more accurately and fairly acknowledged. In addition, we will reorganize the figures to more clearly present the hypotheses being tested and will provide additional details regarding both the modeling framework and the experimental procedures.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We will clarify more explicitly which data and methods originate from Han et al. (2024). In the original manuscript, Figure 1 panels A, D, E, F, and L (left) were indicated in the legend as originating from Han et al. (2024). We will further clarify this distinction in the main text. Additionally, we will briefly describe the behavioral experiments and in vivo electrophysiology performed in Han et al. in the Methods section, with appropriate citation.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      As requested, we will provide additional details regarding model training procedures, weight matrices and their evolution during training, equations (2) and (3), the origin of constants used in the equations, and detailed methods for ChrimsonR injection (anesthesia, stereotaxic coordinates, injection parameters, and clarification of “sparse expression”).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We will reorganize the figures to emphasize core results and clarify that the primary goal is to test and validate the computational model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We will cite Orman (2015) as suggested and note that persistent activity has been observed in slices cut at specific angles, consistent with our findings.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of the claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We will remove wording implying “limited” prior work and appropriately acknowledge contributions from the Mathur and Citri groups.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      Across all whole-cell recordings, optogenetic responses were observed in 38 out of 43 patched cells (~88%), suggesting that a high proportion of claustral neurons receive intra-claustral excitatory input. However, precise connectivity frequency and strength cannot be determined from the current dataset.

      As the reviewer noted, our RNN is specialized for the delayed escape task, and we do not claim direct generalization to other proposed claustral functions such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism observed in this specific task.

      While our model is specific to the delayed escape task, the computational principle identified here—nonlinear trajectory-based temporal integration supported by recurrent excitatory connectivity—may represent a more general mechanism for integrating temporally separated signals. However, testing such generality lies beyond the scope of the present study and will be framed as a future direction in the revised Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.
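      For readers weighing this suggestion, ridge regression adds an L2 penalty α‖β‖² to the least-squares objective, which stabilizes coefficients when predictors are correlated. The closed-form sketch below centers predictors and response so that the intercept is not penalized. The function name, the α values, and the NumPy implementation are illustrative assumptions; the revised manuscript itself retained ordinary linear regression.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: beta = (Xc^T Xc + alpha I)^-1 Xc^T yc.

    Predictors and response are centered so the intercept is left unpenalized.
    Returns (intercept, coefficients); alpha = 0 reduces to ordinary least squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)    # nearly collinear with x1
X_demo = np.column_stack([x1, x2])
y_demo = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=200)
print(ridge_fit(X_demo, y_demo, alpha=0.0)[1])   # OLS: coefficients can split unstably
print(ridge_fit(X_demo, y_demo, alpha=10.0)[1])  # ridge: shrunk, more stable coefficients
```

      As α grows, the coefficient vector shrinks toward zero; the trade-off is some bias in exchange for lower variance under collinearity.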

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.

      We thank Reviewer 1 for their kind and constructive comments. While we have thoroughly addressed all specific recommendations below, in brief, we have added new analysis of the variance inflation factor in Supplementary Tables 2 and 3 to reassure readers that the chosen parameter sets exhibit low levels of collinearity, and provided more explanation for why the relative positional parameters were chosen to avoid this issue. We have added explanatory figures for all positional and orientational parameters to improve understanding of the technical details, and improved clarity of existing figures as detailed below. We welcome the suggestion to add QT interval to the manuscript – whilst this was only available in the UK Biobank for a single lead, we have included an analysis of both QT and QTc intervals in this lead to Page 10, and added some discussion of this to the second full paragraph of Page 14.
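      The response does not state which heart-rate correction was used for QTc. As an illustration only, two widely used corrections are Bazett (QTc = QT/√RR) and Fridericia (QTc = QT/∛RR), with RR in seconds. The sketch below implements both; either may differ from the formula applied in the revised manuscript.

```python
import math


def rr_interval_s(heart_rate_bpm: float) -> float:
    """RR interval in seconds from heart rate in beats per minute."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / heart_rate_bpm


def qtc_bazett(qt_ms: float, rr_s: float) -> float:
    """Bazett correction: QT / sqrt(RR), RR in seconds."""
    return qt_ms / math.sqrt(rr_s)


def qtc_fridericia(qt_ms: float, rr_s: float) -> float:
    """Fridericia correction: QT / cube root of RR, RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)


rr = rr_interval_s(75)                    # 0.8 s at 75 beats per minute
print(round(qtc_bazett(400, rr), 1))      # 447.2
print(round(qtc_fridericia(400, rr), 1))  # 430.9
```

      At 60 beats per minute (RR = 1 s), both corrections leave QT unchanged. They diverge as heart rate moves away from 60, which is one reason the choice of formula matters when comparing groups.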

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: “Collinearity and Regression Analysis: It would be valuable to assess the collinearity among the regressed parameters (e.g., cardiac size, torso volume, heart center positions [x, y, z], and cardiac orientation angles) and evaluate whether alternative regression methods (e.g., ridge regression) might improve robustness. Additionally, cardiac digital twinning with electrophysiological models could help isolate the exact contribution of electrophysiology while enabling sensitivity analysis. Nonlinear regression or machine learning approaches might also enhance the predictive power of the analysis.”

      We thank the reviewer for drawing attention to the important issue of collinearity in the parameter sets used in the regression analysis. To address this, we have added Supplementary Tables 2 and 3, which detail the variance inflation factors for each of the parameter sets used. Collinearity was also considered in the selection of anatomical parameters, e.g. by using relative positions rather than absolute distances between landmarks, which would be more collinear. As all variance inflation factors are below 3.4, we believe that the effect of collinearity is limited; therefore, to avoid the subjectivity of parameter selection in more complex methods and to preserve interpretability, we have retained our linear regression analysis. In addition, we have added an explanation to the second full paragraph on Page 6 of how we calculated the relative, rather than absolute, position of the cardiac centre, partly to avoid the problem of collinearity when using multiple absolute distances. We concur that modelling and simulation techniques are well suited to explore the electrophysiological component further; as this is out of the scope of this work, we have addressed the role of these methods in future work in the final paragraph of Page 16.
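      The variance inflation factor is VIF<sub>j</sub> = 1/(1 − R<sub>j</sub><sup>2</sup>), where R<sub>j</sub><sup>2</sup> is obtained by regressing predictor j on all other predictors. Values near 1 indicate little collinearity. Below is a minimal NumPy sketch, assuming an (n subjects × p parameters) design matrix; the function name is hypothetical and this is not the authors' analysis code.

```python
import numpy as np


def variance_inflation_factors(X) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2) for each column j of an (n_samples, n_params) matrix.

    R_j^2 comes from an ordinary least-squares fit (with intercept) of column j
    on all remaining columns. Constant columns are not supported.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)
X_indep = rng.normal(size=(500, 3))                    # three independent parameters
print(np.round(variance_inflation_factors(X_indep), 2))
X_collinear = np.column_stack([X_indep, X_indep[:, 0] + 0.05 * rng.normal(size=500)])
print(np.round(variance_inflation_factors(X_collinear), 1))  # columns 0 and 3 inflate
```

      Independent predictors give VIFs close to 1, whereas a near-duplicate column inflates the VIFs of both columns involved.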

      Comment 2: “Figure Clarity (Bar Plots): The superimposed bar plots in Figures 2-4 are difficult to interpret; separating the bars for each coefficient would improve readability.”

      We accept that the stacked bar plots could be improved in their clarity. Whilst plotting each anatomical parameter separately multiplies the number of plots by a factor of nine, and makes comparison between parameters more difficult, we have added clear horizontal grid lines in order to make values easier to read and interpret.

      Comment 3: “Feature Extraction Visualization: A schematic figure illustrating the steps for measuring heart positional parameters (e.g., with example annotations) would help readers better understand the feature extraction methodology.”

      We agree with the reviewer that the calculation of positional and orientational parameters is crucial to illustrate clearly. We have included additional Supplementary Figures 2 and 3 to better convey these parameters.

      Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This missed diagnosis stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

      We thank Reviewer 2 for their considered and detailed feedback. We greatly appreciate the invitation to elaborate on the electrophysiological factors: we have added discussion of this matter to the second and third full paragraphs on Page 14 (continuing onto Page 15) and to the first full paragraph on Page 15, and highlighted the role of modelling and simulation in future work in the third full paragraph of Page 16. We agree that registration errors are one reason for the remaining reconstruction errors, and we feel that a strength of our study is that the large number of subjects helped reduce the effect of this noise; we have updated the second full paragraph of Page 16 to reflect this. We are wary of moving too many supplemental figures and tables describing demographic trends to the main manuscript for fear of diluting the specific answers to our research questions. We have, however, actioned the suggestions as detailed below to reformat the paper, including redressing the balance of supplemental versus main methodological sections, and thank the reviewer for their guidance in increasing our clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Please detail what "chosen to be representative of the underlying dataset" means in terms of a validation dataset.

      We thank the reviewer for addressing the lack of clarity in this matter. We have added a reference in the third full paragraph on Page 6 to Supplementary Appendix 1.1, where we have included full details of the selection criteria.

      (2) “Current guidelines ... further research [16]." The paragraph should begin with a broader statement that is relevant to the fact that the entire body of work focuses on ECG-based diagnosis differences in women, rather than LVEF through echocardiography.

      We have revised the introduction to Paragraph 3 on Page 3 to clarify our motivation for focusing on the ECG in order to shape proposals for novel ECG-based risk stratification tools.

      (3) The last paragraph of the introduction should more clearly state what was performed and how you aim to prove your hypothesis. There is no mention of the data, the regression model, or other key aspects important to the reader.

      We have added methodological details to Paragraph 5 on Page 3 in order to clarify our approach in testing our hypothesis.

      (4) An overview paragraph should be included in the Methods at the beginning.

      We thank the reviewer for this valuable suggestion – we have added an overview paragraph to the start of the methodology section on Page 5.

      (5) The computational pipeline portion of the methods should be written in full paragraphs instead of almost a bulleted list. In general, more details from the supplement should be provided in the methods.

      We thank the reviewer for raising important points concerning the balance of methodological description in the main manuscript and the supplementary materials. We have added detailed description of the reconstruction pipeline to Pages 5 and 6. We feel that the ordered format of the methods section adds to the reproducibility and transparency of our methodology.

      (6) The torso reconstruction method was already validated in Smith et al. [29]. What value does your additional validation bring to this methodology? Furthermore, how does the construction of the ventricular-torso reconstructions using the cardiac axes (not just the torso contours) influence ECG metrics?

      We apologise that this was not clear. We have clarified in Paragraph 4 on Page 5 that while Smith et al. 2022 provided a detailed validation of the contour extraction networks, it did not validate the torso reconstruction pipeline, as it only presents the reconstruction of two cases as a proof of concept. We have also expanded the second full paragraph on Page 6 to explain that the sparse (but not dense) cardiac anatomies were constructed in order to calculate the cardiac size, which we found was a key factor moderating many ECG biomarkers. We also specified that the cardiac position and orientation were necessary in order to relate these to the torso axes and positions of the ECG electrodes.

      (7) Include the details of the regression analysis in the main body of the methods for the readers. This is crucial to the claims and outcomes of the paper. Only a sentence is included in the results and one in the figure: "Each factor's contribution is calculated from the product of the regression coefficients and anatomical sex differences (Supplementary Appendix 1.5)." What specific contributions can I expect to see in the results figures? The results are filled with methodological aspects that should be in the methods.

      We thank the reviewer again for this important comment regarding the balance of the main text methodology and supplementary methodology sections. We have added detail to the statistical analysis section of the main text on Pages 7 and 8 in order for the reader to understand the following results section without consulting the supplemental methods. We have also removed these details from the results section.

      (8) What is "the remaining estimated effect of electrophysiology". Did you do simulations on the electrophysiology, or how is this computed from the clinical data of patients? More explanation is needed, as without this, the paper is just focusing on anatomy.

      We have clarified this important point by moving the explanation of the methodology underpinning our estimation of the electrophysiological contributions using the clinical ECGs from the supplementary methods to the main manuscript (second full paragraph on Page 7, continuing to Page 8). We have also specified the role of simulation studies in future work in the final paragraph on Page 16.
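      The decomposition quoted in comment (7) can be illustrated with a short sketch. Each anatomical factor's contribution to a sex difference in an ECG biomarker is taken as its regression coefficient multiplied by the female-minus-male difference in that factor. Treating the unexplained remainder of the observed ECG difference as the estimated electrophysiological effect is an assumption made here for illustration. The factor names and numbers below are hypothetical.

```python
import numpy as np


def decompose_sex_difference(betas, female_means, male_means, observed_diff):
    """Split a female-minus-male ECG biomarker difference into parts.

    contribution_k = beta_k * (female_mean_k - male_mean_k) for each anatomical factor k.
    The remainder (observed_diff - sum of contributions) is assumed here to be the
    estimated electrophysiological effect.
    """
    betas = np.asarray(betas, dtype=float)
    delta = np.asarray(female_means, dtype=float) - np.asarray(male_means, dtype=float)
    contributions = betas * delta
    remainder = float(observed_diff - contributions.sum())
    return contributions, remainder


# Hypothetical example: two anatomical factors and a QRS duration difference in ms.
contribs, ep = decompose_sex_difference(
    betas=[2.0, -1.0],          # ms per unit of each hypothetical factor
    female_means=[10.0, 5.0],
    male_means=[12.0, 4.0],
    observed_diff=-8.0,
)
print(contribs, ep)  # [-4. -1.] -3.0
```

      Here anatomy accounts for −5 ms of the −8 ms observed difference, leaving −3 ms attributed to electrophysiology under this assumption.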

      (9) Include an overview paragraph of the methods to create more structure.

      We thank the reviewer again for the further attention to this issue – as previously, we have added an overview paragraph to the methodology section on Page 5.

      (10) Only 19.8% of the patients were female, which is probably due to females having a more severe presentation of the disease. How does this impact, bias, or skew your results?

      This comment raises a very interesting point. The origin of this imbalance is multifactorial: women likely do have lower rates of MI events due to the cardioprotective role of estrogen and different health-promoting behaviours, and our sex imbalance reflects wider trends in MI diagnosis. However, as mentioned in Paragraph 2 on Page 3, there are more missed MI diagnoses in women, and we agree that this may lead to a more severe presentation of female MI pathophysiology. We have expanded the first full paragraph on Page 16 to specify the ECG and demographic impacts that this has on our results, and to note that a strength of this work is that it may contribute to future adjustment of the diagnostic criteria, so that future investigations do not have this bias and clinical outcomes are improved.

      (11) A lot of extra information is provided in Tables 1 and 2. Include additional information in the supplements that is not directly relevant to your findings.

      We agree that Table 2 is supplementary, rather than critical information, and have moved it accordingly to the Supplementary Materials on Page 38. We do believe that Table 1 is central for understanding the extracted dataset.

      (12) Combine paragraphs 3 and 4 into a single paragraph. "Current guidelines..." and "T wave amplitude...". They are part of a single coherent concept.

      We have removed the paragraph break on Page 3 Paragraph 3.

      (13) Check all acronyms throughout the paper. The abbreviation for sudden cardiac death (SCD) is only used once in the same paragraph. Remove the acronym and type it out. T-wave amplitude (TWA) is introduced twice in a Figure caption and not introduced until the methods.

      Many thanks for this suggestion – we have reviewed all acronyms in the manuscript.

      (14) "Figure 1B showcases the capability of the computational pipeline to extract torso contours and reconstruct them into 3D meshes". Isn't this Figure 1A?

      We apologise that this was unclear, and have updated the sentence in the first full paragraph of Page 8 to clarify the purpose of Figure 1B.

      (15) No need to state: "Female y-axis limits have been adjusted by the difference in healthy QRS duration between sexes for ease of comparison" in the Figure 2 caption.

      We have removed this statement on all relevant captions.

      (16) The paragraph "For lead V6, 15.9% of healthy subjects..." can be combined with the previous section.

      We have removed this paragraph break on Page 9 to improve readability.

      (17) The only demographics I could find were age and BMI. State which demographics you used explicitly. This is especially true when the discussion makes claims like "Our findings suggest that corrected QRS duration taking into consideration demographics...". How did you take them into account?

      We accept that our previous description of the demographic adjustment to QRS duration in the discussion did not adequately reflect the comprehensiveness of our approach, and have adjusted the second paragraph on Page 14 to rectify this.

      (18) The results section is also almost a bulleted list that should be written and reformatted into paragraphs.

      The ordered style of our results section was designed to compare how our obtained data answers our research question differently for ECG intervals, amplitudes, and axis angles. Whilst we have adjusted paragraph breaks and moved methodological details to more appropriate sections, we have retained this stylistic choice.

      (19) The following sentence should be in the introduction: "Alterations to the polarity and amplitude of the T wave are used in the diagnosis of acute MI [42] and TWA affects proposed risk stratification tools, particularly markers of repolarization abnormalities [9, 43]."

      We thank the reviewer for this suggestion. We have included the discussion of how TWA is separately used in proposed risk stratification and current diagnostic tools in Paragraph 3 of Page 3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) - forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.
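      The task structure summarized above can be written as a small data structure that makes explicit which positions share an odor and which share only an outcome. The encoding below is illustrative: it uses the two sequences listed in the Summary as (odor, rewarded) pairs and is not code from the study.

```python
# Each sequence is a list of (odor, rewarded) pairs in task order (positions P1-P4).
SEQUENCES = {
    1: [(5, True), (0, False), (1, False), (2, True)],
    2: [(3, True), (0, False), (1, False), (4, True)],
}


def compare_positions(seq_a, seq_b):
    """For each position, report whether odor and outcome match across the two sequences."""
    rows = []
    for pos, ((odor_a, rew_a), (odor_b, rew_b)) in enumerate(zip(seq_a, seq_b), start=1):
        rows.append({
            "position": f"P{pos}",
            "same_odor": odor_a == odor_b,    # identical sensory cue
            "same_outcome": rew_a == rew_b,   # same reward contingency
        })
    return rows


for row in compare_positions(SEQUENCES[1], SEQUENCES[2]):
    print(row)
```

      Every position has the same outcome in both sequences, but only P2 and P3 share odors. P1 and P4 are therefore the positions that are functionally equivalent but sensorily distinct, for example P4 (odor 2 vs odor 4).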

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) - indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.
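      To make "weakly decodable" concrete, the sketch below estimates sequence identity from single-trial ensemble activity with a leave-one-out nearest-centroid decoder. Accuracy near 0.5 (chance for two sequences) indicates compressed representations, whereas accuracy near 1 indicates strong differentiation. The nearest-centroid classifier is a stand-in chosen for brevity and is not necessarily the decoder used in the study; the (trials × units) input layout is assumed.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X, y) -> float:
    """Leave-one-out accuracy of a nearest-centroid decoder.

    X: (n_trials, n_units) ensemble firing rates; y: (n_trials,) sequence labels.
    Each class needs at least two trials so a centroid exists after leaving one out.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if min(np.sum(y == c) for c in classes) < 2:
        raise ValueError("each class needs at least two trials")
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i                 # hold out trial i
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / len(y)


# Example: two well-separated "sequences" in a 2-unit ensemble.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_demo = np.array([1, 1, 1, 2, 2, 2])
print(loo_nearest_centroid_accuracy(X_demo, y_demo))  # 1.0
```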

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we can outline a learning-stage framework consistent with both prior work and the comparative analyses included here and include it in the discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thank you for this suggestion. We have removed this supplemental figure.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thank you for this suggestion. We have added the corresponding % correct comparison to Figure 1, as suggested.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate (one factor can be multiplied by a constant and another divided by the same constant without changing the reconstructed component), the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief explanation of this point to the corresponding section of the Results.
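      As a minimal sketch of the scale indeterminacy described above (not the analysis code used in this study), the Python snippet below builds a single rank-1 component from hypothetical neuron, temporal, and trial factors and shows that multiplying the temporal factor by 10 while dividing the trial factor by 10 leaves the reconstructed component unchanged. The array sizes and random factor values are illustrative assumptions; what survives the rescaling is the relative pattern within a factor, such as the relative expression across positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: neurons x time bins x trials (positions)
n_neurons, n_time, n_trials = 5, 8, 4

# One rank-1 TCA component is the outer product of three factor vectors
neuron_f = rng.random(n_neurons)
time_f = rng.random(n_time)
trial_f = rng.random(n_trials)


def component(a, b, c):
    """Rebuild the neurons x time x trials tensor for a single component."""
    return np.einsum("i,j,k->ijk", a, b, c)


original = component(neuron_f, time_f, trial_f)

# Scale the temporal factor up and the trial factor down by the same constant:
# the reconstructed tensor, and therefore the fit, is unchanged.
rescaled = component(neuron_f, time_f * 10.0, trial_f / 10.0)
print(np.allclose(original, rescaled))  # True

# The relative pattern across positions within the trial factor is preserved
print(np.allclose(trial_f / trial_f.max(), (trial_f / 10.0) / (trial_f / 10.0).max()))  # True
```

      This is why trial-factor magnitudes are compared across positions within a component rather than across components.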

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.
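      The volume comparison in the first point can be checked with a few lines of arithmetic. This sketch takes the per-session figures quoted above at face value. The second calculation additionally assumes, as a labelled simplification, that only about half of the task trials (the rewarded positions P1 and P4) actually deliver sucrose. Because the total number of recording sessions is given only as "several dozen", the code computes how many task sessions it takes to exceed the self-administration maximum rather than a grand total.

```python
import math

# Figures quoted in the response (volumes in microlitres)
SA_BOLUS_UL, SA_MAX_PRESSES, SA_SESSIONS = 50, 60, 14
TASK_BOLUS_UL, TASK_TRIALS = 100, 160

sa_per_session_max = SA_BOLUS_UL * SA_MAX_PRESSES  # upper bound per self-administration session
sa_total_max = sa_per_session_max * SA_SESSIONS    # upper bound over all 14 sessions
task_per_session = TASK_BOLUS_UL * TASK_TRIALS     # as stated: 100 ul x 160 trials

# Assumption (not stated in the response): only ~half of trials are rewarded positions
task_per_session_half = task_per_session // 2


def sessions_to_exceed(total_ul, per_session_ul):
    """Smallest number of task sessions whose sucrose strictly exceeds total_ul."""
    return math.floor(total_ul / per_session_ul) + 1


print(sa_per_session_max, sa_total_max, task_per_session)       # 3000 42000 16000
print(sessions_to_exceed(sa_total_max, task_per_session))       # 3
print(sessions_to_exceed(sa_total_max, task_per_session_half))  # 6
```

      Even under the halved assumption, a handful of recording sessions delivers more sucrose than the entire self-administration phase could have.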

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors, which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single-unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing was greater in cocaine-treated animals than in controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have expanded the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0–100%; can the y-axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3; is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences and will state the adjusted range in the figure caption. At P3, cocaine-experienced rats were indeed more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats and the small number of subjects, and to note that alternative explanations, such as differences in motivational state or attention, could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state of the art analyses. Two points that came up were the very small n (6 total and 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement: how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered.

      I believe that I have fully addressed the points in the earlier review. The reviewer had doubted that my results were correct, attributing them to “a poor setup of the model” on my part. The reviewer stated that if I were correct about the factor of >10^43 change in cmax, this would “naturally break down all the estimates and conclusions made in Siljestam and Rueffler” (S&R).

      It appears that the reviewer is now convinced that my results represent a faithful analysis of the models on which S&R based their claims. The reviewer now contends that these results, including the factor of >10^43, present no difficulties for the claims of S&R after all. In fact, this enormous factor of >10^43 is now claimed to support the conclusions of S&R by invalidating my conclusions. I respond to these new and very different arguments in what follows.

      As I stated in the first round of review, the issue is not the enormity of this factor per se, but the fact that the compensatory adjustment of cmax conceals the true effects of changes in other parameters. These effects are large; small changes to the parameter values mostly eliminate the diversity that the model is claimed to explain.

      The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response.

      The hidden sensitivity of the results of S&R to parameter values is sufficient to invalidate them as a proof of principle. The manuscript goes further and explains how the problem "is not specific to the details of the models of Siljestam and Rueffler, but is inherent in the phenomenon invoked to allow high diversity" because "any change that affects condition by as much as the difference between MHC heterozygotes and homozygotes will eliminate high equilibrium diversity". This general principle addresses all of the reviewer's points.

      In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor cmax was introduced to buffer such an excess. There is no reason to fix cmax once for an arbitrary number of pathogens, because varying cmax basically reflects the observation that a well-adapted individual must have a reasonable survival probability.

      This is not a legitimate reason for making compensatory, diversity-promoting adjustments to cmax when evaluating sensitivity to other parameters. If the number of pathogens or their virulence changes, cmax obviously does not automatically change along with it. If the population or species consequently goes extinct, then it goes extinct. If it persists, it does so with the same value of cmax.

      The possibility of extinction arguably puts a minimum value on cmax, but it does not restrict it to a range of values that conveniently leads to high MHC diversity. In the examples that I analyzed, slightly decreasing the number of pathogens or their virulence, which increases survivability, eliminates diversity. This phenomenon obviously cannot be dismissed on the grounds that survivability would be too low for the species to exist.

      S&R in effect assume that the condition of the most fit homozygote remains fixed, regardless of the number of pathogens, their virulence, and myriad other differences between species. It is this assumption that is without justification.

      At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one

      I am not sure what is meant by “the numerical simulation may break down”. Numerical error is not a tenable explanation of the lack of diversity observed in that simulation. The outcome is exactly what is expected from purely theoretical considerations: conditions of all genotypes fall on the steep part of the curve, making the mechanism proposed by S&R largely inoperative, so a pair of alleles forming a fit heterozygote comes to predominate. The numerical simulation is actually superfluous.

      Low survival rates are completely irrelevant to the effect of decreasing the number of pathogens or their virulence, which does not lower survival rates, but does eliminate diversity.

      so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence ν = 20.

      Whether or not it is surprising, the lack of diversity is a problem for the claims of S&R, as there is no reason to expect the number of pathogens to have just the right value to produce high diversity. Furthermore, for many combinations of values of the other parameters (e.g., my v = 19.5 and 20.5 examples), no number of pathogens leads to high diversity.

      Again, the general principle mentioned above makes the details that the reviewer refers to irrelevant. Nonetheless, some additional remarks are in order:

      (1) This comment ignores the fact that removal of a pathogen, or a slight decrease in “virulence”, eliminates diversity without lowering survival rates.

      (2) Small increases or decreases in v (virulence) eliminate diversity without having such large effects on condition.

      (3) In the example emphasized by the reviewer, mean survival rates are nowhere near as low as 10^(-43). Only homozygotes have such low fitness.

      (4) The adaptive dynamics predict the low diversity seen in the simulations, contrary to what the reviewer seems to suggest. Elimination of diversity is not an artifact of the simulation.

      (5) v = 20 was chosen because it is most favorable to the model of S&R in that it yields the highest diversity. Indeed, S&R only observed realistically high diversity with the narrow Gaussians that the reviewer objects to. With lower values of v, diversity is much lower, but even this meager diversity is eliminated by small changes in parameter values (see below). If narrow Gaussians and large effects of pathogens somehow invalidate results, then they invalidate the high-diversity results of S&R.

      I have doubts that the reported breakdown of the [SR] model with fixed cmax remains observable with less extreme values of m and ν (say, for ν = 7 and m = 3 plus or minus 1 used in Fig. 3 in the manuscript).

      These doubts are unwarranted. With the suggested parameter values, for example, increasing or decreasing m by 1 reduces the effective number of alleles to around 1 or 2. This can easily be checked using the simulation code of S&R, as detailed in my initial response and now in a Supplementary Text. Even without this result, the general principle mentioned above tells us that considering other regions of parameter space cannot rescue the conclusions of S&R.
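      For readers less familiar with the measure, the sketch below computes the effective number of alleles using the conventional population-genetics definition, n_e = 1 / Σ p_i², where p_i are allele frequencies. This is an assumption about how the quantity is computed; the estimator in the simulation code of S&R may differ in detail. The examples show why a value "around 1 or 2" signals that a single allele, or a single pair of alleles, predominates.

```python
def effective_number_of_alleles(freqs):
    """Conventional effective number of alleles: 1 / sum of squared frequencies.

    Frequencies are normalised first, so raw allele counts also work.
    """
    total = sum(freqs)
    return 1.0 / sum((f / total) ** 2 for f in freqs)


print(effective_number_of_alleles([1] * 100))   # 100 equally common alleles: approximately 100
print(effective_number_of_alleles([50, 50]))    # one pair of equally common alleles: 2.0
print(effective_number_of_alleles([90, 5, 5]))  # one dominant allele: about 1.23
```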

      So I still find the claim that " the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

      What is unsubstantiated is the claim of S&R that “For a large part of the parameter space, more than 100 and up to over 200 alleles can emerge and coexist”. As my manuscript illustrates, this is an illusion created by the adjustment of one parameter to compensate for changes in others.

      The reviewer even acknowledges that “the choice of constants and functions...works in a limited range of parameter values”. Furthermore, the manuscript explains why this problem is inherent to the general phenomenon, not specific to the details of the model or parameter values.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It appears obvious that with little or no fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth. Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number. It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler. I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of cmax conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct. The factor of 2.7∙10^43 was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values. A simple way to determine this number is to have the simulation code print the value to which cmax is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values. I have in this way confirmed this factor using the simulation code written and used by Siljestam and Rueffler. A procedure for doing so is described in the new Supplementary Text S1. In addition, I now give a theoretical derivation of this factor in Supplementary Text S2.

      This begs the conclusion that the branching remains robust to changes in cmax that span 4 decades as well.

      That shows at most that the results are not extremely sensitive to cmax or K. They are, nonetheless, exquisitely sensitive to m and v. This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in cmax. It is evident from Fig. 4 of Siljestam and Rueffler that the level of diversity is not robust to these very large changes in cmax, which include, as noted above, a change of over 43 orders of magnitude.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10^43) pertains to the Gaussian model with v = 20. As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v. This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions. Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.
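      The geometry described above can be checked directly. The following is an illustrative construction, not the simulation code of S&R: it places m pathogens at the vertices of a regular simplex with unit edge length, centred at the origin (scaled standard basis vectors shifted by their centroid, embedded in m dimensions rather than m − 1, which does not affect pairwise distances), and compares the common inter-pathogen distance with the Gaussian width scale 1/v. The choice m = 8 and v = 20 mirrors the example discussed in this exchange.

```python
import itertools

import numpy as np


def simplex_vertices(m):
    """Return an (m, m) array whose rows are the vertices of a regular simplex
    with unit edge length, centred at the origin.

    Scaled basis vectors e_i / sqrt(2) are pairwise exactly 1 apart; subtracting
    their centroid moves the simplex to the origin without changing distances.
    """
    verts = np.eye(m) / np.sqrt(2.0)
    return verts - verts.mean(axis=0)


m, v = 8, 20.0  # eight pathogens and v = 20, as in the example discussed above
P = simplex_vertices(m)
dists = [np.linalg.norm(P[i] - P[j]) for i, j in itertools.combinations(range(m), 2)]

print(len(dists), min(dists), max(dists))  # 28 pairwise distances, all approximately 1
print(1.0 / v)                             # Gaussian width scale: 0.05
print(min(dists) * v)                      # each distance spans roughly 20 widths
```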

      ...the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SR].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable. I have addressed the reasons for this suggestion above. Furthermore, using the code that Siljestam and Rueffler used for their simulations, I have confirmed the main conclusion (the extreme sensitivity of their results to parameter values), indicating that my conclusions are not consequences of my having done a “poor setup of the model”. I now describe, in Supplementary Text S1, how anybody can verify my conclusions in this way.

      Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem. However, as I understand it, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>. Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”. Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>). In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I have expanded the end of the Discussion in the hope of clarifying the point expressed by the title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest to the author that they provide essential details about their simulations that would justify their claims, and to communicate with Mattias Siljestam and Claus Rueffler whether claims of the lack of robustness could be confirmed.

      The models simulated were modified versions of those of Siljestam and Rueffler. Thus, only the modifications were described in my manuscript. I have added a more detailed description of how c<sub>max</sub> was set in the simulations concerned with sensitivity to parameter values. In addition, the new Supplementary Text S1, which describes confirmation of the lack of robustness using the code of Siljestam and Rueffler, should remove any doubt about this conclusion.

      Reviewer #2 (Recommendations for the authors):

      I have no further recommendations. The manuscript is well written and clear.

      Thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Since this is a full report and not just a letter to the editor, it would benefit from a bit more introduction of what the MHC actually is and what the current understanding of its evolution is. Currently, it assumes a lot of knowledge about these genes that might not be available to every reader of eLife.

      I have added some more information to the opening paragraph. I would also note that this report was submitted as a “Research Advance”, which may only need “minimal introductory material”.

      (2) Some more recent literature on MHC evolution should be added, e.g., the review by Radwan et al. 2020 TiG, a concrete case of MHC heterozygote advantage by Arora et al. 2020 MolBiolEvol, and a simulation of MHC CNV evolution by Bentkowski et al. 2019 PLOSCompBiol.

      I have cited some additional literature.

      (3) Since much of the criticism hinges on the cmax parameter, its biological meaning or role (or the lack thereof) could be discussed more.

      I am not sure what I can add to what is in the first paragraph of the Discussion.

      (4) I find it difficult to grasp how the v parameter, which is intended to define pathogen virulence, if I understand it correctly, can be used to amend the breadth of peptide presentation. Maybe this could be illustrated better.

      I have attempted to make this clearer. The parameter v actually controls the breadth of peptide detection conferred by an allele, which, if not identical to the breadth of presentation, is certainly affected by it. The basis of the “virulence” interpretation seems to be that narrower detection breadth can, according to the model, only decrease peptide detection probability, which increases the damage done by pathogens.

      (5) Please check sentences in lines 279ff on peptide detection and cost of . There seem to be words missing.

      There was an extraneous word, which I have removed. Thank you for pointing this out.

    1. Author response:

      Response to reviewer 1:

      We thank reviewer 1 for their thoughtful, detailed, and constructive evaluation of our manuscript. We appreciate their recognition of the strengths of the study, particularly the integration of noradrenergic recordings, optogenetic manipulation, and cross-species analyses. We are especially grateful for the reviewer’s careful attention to clarity, experimental interpretation, and control comparisons. The comments have helped us sharpen the framing of our hypotheses, clarify causal claims, improve statistical reporting, and better explain our closed-loop approach and heart rate analyses. We have addressed each point in detail below and believe that the revisions substantially strengthen the manuscript.

      Response reviewer 2:

      We thank reviewer 2 for their thoughtful comment regarding citation, positioning relative to prior work, and caution in mechanistic interpretation. We have made efforts to cite relevant foundational and related work throughout the manuscript, but we will of course further clarify the relationship between our findings and prior studies in the revision.

      Although prior work has demonstrated infraslow coupling between sigma activity and heart rate and established a role for the locus coeruleus (LC) in coordinating these oscillations, cardiac measures have typically been presented as secondary observations rather than as primary experimental targets. While we recognize the value of this prior work, a central goal of the present study was to perform a targeted and highly systematic characterization of norepinephrine-mediated heart-rate dynamics during sleep, integrating infraslow relationships, sleep-wake transitions, and a range of physiological manipulations of LC activity. A major priority of ours was to link infraslow heart-rate fluctuations to the well-known very-low-frequency (VLF) component of heart rate variability (HRV). Within the clinical HRV field, VLF has remained comparatively under-characterized and mechanistically unresolved. Our findings provide a biologically grounded explanation for this component, which we believe may be informative for the broader HRV community.

      Second, a core aim of this work is to provide a translational tool: to determine whether cardiac dynamics alone can reflect the infraslow, memory-consolidating potential of sleep and thus serve as a non-invasive biomarker. Because direct LC recordings are not feasible in humans, HRV, including its VLF component, may offer a clinically accessible proxy of sleep’s memory-restorative capacity. By directly manipulating LC activity and demonstrating corresponding changes in heart-rate dynamics, we strengthen the mechanistic and translational rationale of this biomarker approach. Our findings suggest that heart-rate measures alone may provide an estimate of the infraslow memory-consolidating potential of sleep.

      In revision, we will ensure that the foundational findings underlying this manuscript are highlighted, while communicating our new findings more clearly.

    1. Author response:

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We will carefully consider each comment from these two reviewers and will revise the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies for BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that the specific situation of epigenetic dysregulation in LUAD needs to be explored. We believe that future investigations utilizing clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional details regarding these methods are necessary for clarity and reproducibility. We will expand these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we will carefully check the grammar and make corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewer's comment. We will add clearer labels to the figures in the revised version.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we will supplement the additional immunological experiments or validation using in vivo models. In addition, we will elaborate on the limitations of our study in the Discussion section and provide reasonable explanations.

    1. Author response:

      eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as snake venom metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterisation. Thus, the strength of the evidence can be enhanced by the use of a mouse model. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the Neotropics.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address. Our work in this manuscript included the standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript, we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion due to its short half-life), in the manuscript we have already stated (lines 434 to 448) that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides to the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. In relation to the comment on this manuscript being a ‘definitive demonstration of a broadly effective, deployable intervention’, we agree with their opinion and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions will help us improve the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our part. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revision.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA<sub>2</sub>s.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency on different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can raise safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastases. This in vitro assay combined with the coagulation assays are complementary in covering the main targets of SVMPs (ECM and clotting cascade), prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

    1. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). This new data (Fig. S7) precisely pinpoints the onset of the Leydig cell differentiation defect in dhh<sup>-/-</sup> mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh<sup>-/-</sup> by ptch2<sup>-/-</sup>) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-mo old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature as additional experiments are required to satisfy their claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp1c1 cells are missing and the addition of ptch2-/- rescues the phenotype indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: Most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the discussion section, ptch1 mutant animals fail to survive, limiting the ability to validate the roles of both ptch1 and ptch2. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: A couple of key references are missing; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS.PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun.PMID: 32497091

      - Park SY, Tong M, Jameson JL.Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10.PMID: 17495005

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other, unpredicted Gli-binding motifs within the 5-kb sf1 promoter region, and given practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data, which include the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation, we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1<sup>-/-</sup>, TSL-gli2<sup>-/-</sup>, and TSL-gli3<sup>-/-</sup> cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. The results show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's considerations. We have now performed the additional key experiments, which have been incorporated into the revised manuscript. We believe these new data have fully addressed the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section in lines 369-373: “For each biological replicate (n = 5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. The numbers of Vasa-, Sycp3- and Cyp11c1-positive cells were quantified with Image J Pro 1.51 software using default parameters.”

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section in lines 398-399: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please clarify the results presented in Fig 5E: how was this plot generated? It is interpreted that all three cell lines were combined and compared to the WT line, but it is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section in lines 445-447: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      We also added relevant information in the "Results" section in lines 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: the methods indicate that Student's t-test was used, but ANOVAs are also used, which is appropriate. Some data should be reevaluated with an ANOVA: Figure 4D and 4N-R. For Figure 5G, no statistical test is indicated in the figure legend.

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed all data using ANOVA-based methods as suggested. We confirm that these changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      In lines 719-721 and 745-747, we added: “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      Reviewer #1 (Significance):

      The data presented in this manuscript provide important context regarding the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al. investigated the role of the Dhh signaling pathway in the proliferation and differentiation of Leydig lineage cells in the testes of Nile tilapia, an economically important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previously established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players of Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in Leydig cell lineage development.

      How Dhh secreted from Sertoli cells affects Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell-secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      Experimental design should be optimized, and more convincing data should be provided to reach solid conclusion.

      (1) The SLCs (stem Leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is an SLC line only because it expresses a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. It is suggested to perform more experiments, including flow cytometry or single-cell RNA sequencing analysis, to further characterize this line and demonstrate that it consists of bona fide SLCs equivalent to the SLCs in 3-month-old tilapia testes. In the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-tfii), which are usually considered as SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production”, ensuring that readers can fully appreciate its well-founded identity as an SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development is not clearly investigated. In the current manuscript, the characterization of the dhh mutant is insufficient and lacks in-depth investigation. The authors primarily looked at testes at 90 dph, when the Leydig cell lineage is well developed. In my opinion, this time point is too late. To investigate the earlier events affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we have now identified 30 dah as the critical window for Leydig cell differentiation onset, a timing also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90-dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data that generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. Our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph is too late. To investigate the role of dhh in the Leydig cell lineage, the authors should focus on earlier developmental stages when sex differentiation and Leydig cell maturation occur. This is essentially a developmental biology study investigating how loss of dhh in Sertoli cells affects the development of Leydig cells, so careful characterization of the earliest testicular phenotypes of the dhh mutant is very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Provided the critical role of 11-KT in the testis development and spermatogenesis, it was not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSLs engrafted equivalently but failed to express Cyp11c1. This result is strange and raises questions about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support this statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1-positive after transplantation; can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine Leydig lineage development in the dhh mutant, and then investigate how loss of dhh disrupts the crosstalk between Sertoli cells and Leydig cells. Why perform TSL transplantation at all? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as a SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLCs differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggest that loss of dhh does not affect the specification of SLCs but affects their differentiation into mature LCs. Cyp11c1-positive cells should therefore appear later than 20 dah. So when is the earliest time point at which Cyp11c1-positive cells form, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines and performed luciferase assays, and based on the results, concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was based only on luciferase assays using TSLs. Whether this is also the case in testes at the animal level is not clear. More genetic experiments using ptch mutants should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for the TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, or serum 11-KT levels? This apparent contradiction needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provides compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.

      Figure 5. The authors generated gli1/2/3 mutant TSL lines, performed a luciferase assay, and based on the results concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. This conclusion is based only on luciferase assays in TSL cells; whether it also holds in testes at the animal level is not clear. More genetic experiments using gli mutant fish should be performed to substantiate it.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSLWT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+ SAG) TSL cells. While these experiments can be used to identify Dhh target genes, it would be better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutants to identify or confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that our identification of Gli1 as the key transcriptional effector is based primarily on in vitro evidence from the TSL cell line. We have revised the manuscript accordingly to state this precisely and avoid overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are positively upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by compensation among Gli family members, which could mask the specific transcriptional signature of pathway activation we sought to capture.

      Note that we generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNA-seq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identifying these genes? Some of the genes were down-regulated or unchanged in the OnGli condition, and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section (lines 452-454): “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(foldchange)| ≥ 1.5).” We also added relevant information to the Results section (lines 198-202): “We performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”

      We have also updated Fig. S8B to display the values clearly and to better visualize the FPKM levels of these 33 genes across the conditions.
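      The reviewer's question turns on what "consistently upregulated across all three conditions" means operationally. One reading of the stated edgeR thresholds is a per-condition significance filter followed by a set intersection. The Python sketch below illustrates only that filtering step, assuming each condition's edgeR table has been exported as gene → (log2 fold change, FDR) against the TSL-WT control; the function names and toy genes are hypothetical, and this is not the authors' pipeline.

```python
# Hypothetical sketch of a "core up-regulated set": filter each condition's
# DEG table by the stated thresholds, then intersect across conditions.
# Input format (an assumption): {gene: (log2_fold_change, fdr)} per condition,
# each computed against the TSL-WT control.

FDR_MAX = 0.05      # FDR < 0.05
LOG2FC_MIN = 1.5    # log2 fold change >= 1.5 (up-regulated side only)


def upregulated(results, fdr_max=FDR_MAX, log2fc_min=LOG2FC_MIN):
    """Genes significantly up-regulated in one condition versus control."""
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if fdr < fdr_max and log2fc >= log2fc_min
    }


def core_upregulated(conditions):
    """Genes that pass the up-regulation filter in every condition."""
    per_condition = [upregulated(r) for r in conditions.values()]
    if not per_condition:
        return set()
    return set.intersection(*per_condition)


# Toy input, not real data.
toy = {
    "TSL-OnDhh": {"gene_a": (2.1, 0.001), "gene_b": (1.6, 0.20)},
    "TSL-OnGli1": {"gene_a": (1.8, 0.010), "gene_b": (2.0, 0.01)},
    "TSL+SAG": {"gene_a": (1.7, 0.030), "gene_b": (1.9, 0.04)},
}
print(sorted(core_upregulated(toy)))  # ['gene_a']
```

      Here gene_b is dropped because it misses the FDR cut-off in one condition, even though its fold change there is large; a gene enters the core set only if it passes both thresholds in every condition.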

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see new Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity

      We have added this information to the Introduction (lines 91-92): “early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females)”.

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section (lines 151-152): “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted TSL cells labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify this (lines 472-476): “A two-tailed independent Student’s t-test was used to determine the differences between two groups. One-way ANOVA, followed by Tukey's multiple comparison test, was used to determine the significance of differences among more than two groups. P < 0.05 was used as the threshold for statistical significance.”

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%)", "Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%)", or "Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful to move the WT images from Fig. S7 to Fig. 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase activity with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. In the absence of ligand, Ptch2 tonically inhibits pathway activity; genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please state in the figure legend exactly what each construct name stands for.

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please state what the values in the table are (e.g., are these significance values?).

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to an intractable likelihood; these are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point, we have either removed or replaced all references to "above" or "below" with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so have worked to address both.

      First, the comment could have been that the distribution of loss functions on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since the former have different values for each GC, we can only plot many inferred curves together on the latter. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
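      Combining the 119 per-GC inferred curves into one central curve can be done with a medoid: the member curve whose summed distance to all the others is smallest. A minimal Python sketch, assuming every curve is sampled on a shared affinity grid and using the area between curves as the distance; both the distance choice and the function names are illustrative assumptions, not necessarily what the authors implemented.

```python
# Hypothetical sketch: pick the medoid of a set of sampled response curves.
# Assumption: all curves are evaluated on the same affinity grid `x`, and the
# distance between two curves is the (trapezoidal) area between them.


def area_between(x, f, g):
    """Trapezoidal estimate of the integral of |f - g| over the grid x."""
    total = 0.0
    for i in range(len(x) - 1):
        d0 = abs(f[i] - g[i])
        d1 = abs(f[i + 1] - g[i + 1])
        total += 0.5 * (d0 + d1) * (x[i + 1] - x[i])
    return total


def medoid_index(x, curves):
    """Index of the curve with the smallest summed distance to all curves."""
    costs = [sum(area_between(x, c, other) for other in curves) for c in curves]
    return min(range(len(curves)), key=costs.__getitem__)


# Toy example: three flat curves at heights 0, 1 and 5 on [0, 1].
grid = [0.0, 0.5, 1.0]
curves = [[0.0] * 3, [1.0] * 3, [5.0] * 3]
print(medoid_index(grid, curves))  # 1
```

      Unlike a pointwise mean, the medoid is always one of the inferred curves, so it remains a valid member of the response-function family and is less sensitive to a few outlying GCs.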

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
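      This loss can be computed numerically by integrating both areas on a grid. A minimal Python sketch, assuming both curves are sampled on a common affinity grid and the true curve is non-negative (as a birth rate is). The logistic used for the toy curves is a generic illustrative sigmoid, not necessarily the paper's parameterization, and all names are hypothetical.

```python
import math


def logistic(x, lower, height, midpoint, steepness):
    """Illustrative sigmoid: lower asymptote `lower`, upper asymptote `lower + height`."""
    return lower + height / (1.0 + math.exp(-steepness * (x - midpoint)))


def trapezoid(x, y):
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def curve_loss(x, true_y, inferred_y):
    """Area between the curves as a fraction of the area under the (non-negative) true curve."""
    between = trapezoid(x, [abs(t - p) for t, p in zip(true_y, inferred_y)])
    return between / trapezoid(x, true_y)


# Toy curves: the "inferred" curve is the true curve shifted right by 0.5.
grid = [i / 10 for i in range(-30, 31)]  # affinities -3.0 ... 3.0
true_curve = [logistic(a, 0.1, 2.0, 0.0, 2.0) for a in grid]
shifted = [logistic(a, 0.1, 2.0, 0.5, 2.0) for a in grid]
print(round(curve_loss(grid, true_curve, shifted), 3))
```

      Because the loss is normalized by the area under the true curve, it is dimensionless, so a value of 1.0 means the misfit area equals the entire area under the true response function over the grid.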

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig. S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70; Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
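      The GC-specific time-point selection described here amounts to a nearest-value search on each GC's mean-affinity trajectory, followed by averaging the chosen times over GCs. A minimal Python sketch, assuming each simulated GC is available as parallel lists of sampled times and mean affinities over living cells; the names and toy numbers are hypothetical, not simulation output.

```python
# Hypothetical sketch: for each simulated GC, find the sampled time whose
# mean affinity (over living cells) is closest to each target affinity,
# then average those GC-specific times over GCs.
# Input format (an assumption): each GC is (times, mean_affinities) as
# parallel lists.

TARGETS = (0.1, 1.0, 2.0)


def closest_times(times, mean_affinity, targets=TARGETS):
    """Map each target affinity to the time with the nearest mean affinity."""
    pairs = list(zip(times, mean_affinity))
    return {
        target: min(pairs, key=lambda tm: abs(tm[1] - target))[0]
        for target in targets
    }


def mean_chosen_times(gcs, targets=TARGETS):
    """Average the GC-specific chosen times over all GCs."""
    chosen = [closest_times(t, a, targets) for t, a in gcs]
    return {tg: sum(c[tg] for c in chosen) / len(chosen) for tg in targets}


# Toy trajectories, not simulation output.
gc1 = ([0, 10, 20, 30], [0.0, 0.12, 0.9, 1.5])
gc2 = ([0, 10, 20, 30], [0.0, 0.3, 1.1, 2.2])
print(closest_times(*gc1))  # {0.1: 10, 1.0: 20, 2.0: 30}
print(mean_chosen_times([gc1, gc2]))
```

      Matching on affinity rather than on wall-clock time is what lets GCs that evolve at different speeds be compared at the same stage of maturation.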

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question. (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model. (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different to those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes good point, we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because the computation time is determined primarily by the total population size: at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
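      The scaling argument follows from how a first-reaction (Gillespie-style) update works: every cell contributes one candidate waiting time per event type at each step, and the shortest one fires. A minimal Python sketch under that reading; the authors' simulation software may organize this differently, and the function name, event types, and rates here are hypothetical.

```python
import random

# Hypothetical sketch of a first-reaction (Gillespie-style) update step.
# Assumption: between updates every (cell, event type) pair has a fixed
# positive rate. Each step draws one exponential waiting time per pair,
# so its cost grows linearly with population size x number of event types.


def next_event(rates, rng):
    """Return ((cell, event_type), waiting_time) with the shortest wait."""
    best_key, best_wait = None, float("inf")
    for key, rate in rates.items():
        wait = rng.expovariate(rate)  # exponential waiting time, mean 1/rate
        if wait < best_wait:
            best_key, best_wait = key, wait
    return best_key, best_wait


rng = random.Random(0)
rates = {
    ("cell_1", "birth"): 2.0,
    ("cell_1", "death"): 0.5,
    ("cell_2", "birth"): 1.0,
    ("cell_2", "death"): 0.5,
}
event, dt = next_event(rates, rng)
print(event, round(dt, 3))
```

      Gillespie's direct method, which draws a single waiting time from the total rate and then picks an event in proportion to its rate, is statistically equivalent; in its simplest form it still sums rates over the whole population, so the linear scaling with population size remains.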

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters could be plotted rather than loosely described (e.g., y_h+y_c, the upper asymptotes, could be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text in the caption linking to the figures that are, in the figure, intended to be in schematic form only.

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point, we've moved the affinity definition to the previous section, and removed the duplicate range information.

      (6) Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rates).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of events selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, the mention of “variety of response functions” seems out of place if only one choice is used throughout the paper. eq. (2): that's m^{-1} from eq. (1). Having the two equations use the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      Independence from the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement: it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities from those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However, if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in the set of inferred non-sigmoid parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear; thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind the different ways we evaluate them. Briefly, we can't display true vs. inferred response functions on the training samples since the curves for each GC are different: the plot would be entirely filled in with very different response function shapes. This is why we do the actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary statistics (like those in Fig. 5) for the training sample are in Fig. 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should it be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly not updated it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds where we have many points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica et al. paper to understand the tree encoding. It would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
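      For readers who want something concrete, the traversal described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' encoder or the Voznica et al. (2022) implementation: the `Node` and `encode_cblv` names are hypothetical, the tree is assumed strictly bifurcating, "longer branches toward the left" is approximated by placing the child with the deeper subtree first, and the first leaf (visited before any internal node) is measured from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; `length` is the branch length to the parent."""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def max_depth(node: Node) -> float:
    """Longest path (sum of branch lengths) from this node down to any leaf."""
    if not node.children:
        return 0.0
    return max(c.length + max_depth(c) for c in node.children)


def ladderize(node: Node) -> None:
    """Rotate children in place so the deeper subtree comes first (assumed rule).

    Rotation only reorders children; the tree itself is unchanged.
    """
    for c in node.children:
        ladderize(c)
    node.children.sort(key=lambda c: c.length + max_depth(c), reverse=True)


def encode_cblv(root: Node, width: int) -> List[List[float]]:
    """Two-row CBLV-style matrix (leaf row, internal row), zero-padded to `width`."""
    ladderize(root)
    leaf_row: List[float] = []
    internal_row: List[float] = []
    last_internal_depth: Optional[float] = None

    def inorder(node: Node, depth: float) -> None:
        nonlocal last_internal_depth
        if not node.children:
            # Leaf: distance to the most recently visited internal node,
            # or to the root if none has been visited yet (assumption).
            ref = last_internal_depth if last_internal_depth is not None else 0.0
            leaf_row.append(depth - ref)
            return
        left, right = node.children  # strictly bifurcating tree assumed
        inorder(left, depth + left.length)
        internal_row.append(depth)  # internal node: distance to root
        last_internal_depth = depth
        inorder(right, depth + right.length)

    inorder(root, 0.0)
    if len(leaf_row) > width:
        raise ValueError("width is smaller than the number of leaves")

    def pad(row: List[float]) -> List[float]:
        return row + [0.0] * (width - len(row))

    return [pad(leaf_row), pad(internal_row)]
```

      For the tree ((A:1,B:2):1,C:1), `encode_cblv(root, 4)` gives the leaf row [3.0, 1.0, 1.0, 0.0] and the internal row [1.0, 0.0, 0.0, 0.0]; zero-padding to a fixed width is what lets trees of different sizes share one network input shape.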

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel".

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our run time is determined primarily by the total population size: at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
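      The per-step cost mentioned above can be made concrete with a small sketch. Assuming, for illustration only, that each living cell has a constant rate for each event type and that waiting times are exponential (the actual simulator's rate handling may differ), finding the next event means drawing one waiting time per (cell, event type) pair and keeping the minimum. Each step therefore costs on the order of cells × event types. `next_event` and the dictionary layout are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def next_event(rates: List[Dict[str, float]],
               rng: random.Random) -> Tuple[float, int, str]:
    """Return (waiting_time, cell_index, event_type) for the earliest event.

    `rates[i]` maps event type -> rate for living cell i (hypothetical layout).
    Every (cell, event) pair is visited once, so the cost per step grows
    linearly with population size. Returns (inf, -1, "") if all rates are zero.
    """
    best: Tuple[float, int, str] = (float("inf"), -1, "")
    for i, cell_rates in enumerate(rates):
        for event, rate in cell_rates.items():
            if rate <= 0.0:
                continue  # an event with zero rate can never fire
            t = rng.expovariate(rate)  # exponential waiting time (assumption)
            if t < best[0]:
                best = (t, i, event)
    return best
```

      Taking the minimum of independent exponential draws is equivalent to the standard Gillespie "first reaction" method; it is simple, but offers no shortcut around visiting every cell, which is the scaling the passage describes.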

    1. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: In the EEG experiment, we see significant distractor cost for auditory distractors in the accuracy (which can be seen in SUPPL Fig. 1A). We also see a faster reaction time with auditory distractors, which may speak to intersensory facilitation. As we used the same distractors for both experiments, it can be assumed that they were distracting in both experiments.

      In our follow-up MEG experiment, as the reviewer stated, performance in Block 2 was higher than in Block 1, even though distractors were present. In this experiment, distractor costs and learning effects are difficult to disentangle. It is possible that participants improved over time on the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). In Block 2, performance is more or less stable over time, varying by less than 10%. In Block 1, an improvement over time can be seen for both visual and auditory trials. This is especially strong for visual trials, where accuracy spans a range of more than 20%. Note that the mean performance in the 80-90 trial bin was higher than any mean performance observed in Block 2.
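      The binning behind Author response image 1 amounts to simple arithmetic per participant and condition, followed by a mean ± SEM across participants. The sketch below is a hypothetical illustration (the function names and the choice to drop a trailing partial bin are assumptions), with each trial coded as 1 for correct and 0 for incorrect.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def binned_accuracy(correct: List[int], bin_size: int = 10) -> List[float]:
    """Mean accuracy (%) in consecutive bins of `bin_size` trials.

    A trailing partial bin is dropped (assumption).
    """
    n_bins = len(correct) // bin_size
    return [100.0 * mean(correct[b * bin_size:(b + 1) * bin_size])
            for b in range(n_bins)]


def group_mean_sem(per_subject: List[List[float]]) -> List[Tuple[float, float]]:
    """(mean, SEM) across participants for each bin.

    Only bins present for every participant are used; at least two
    participants are needed for the sample standard deviation.
    """
    n_bins = min(len(s) for s in per_subject)
    out = []
    for b in range(n_bins):
        vals = [s[b] for s in per_subject]
        out.append((mean(vals), stdev(vals) / sqrt(len(vals))))
    return out
```

      Truncating to the bins shared by all participants mirrors the exclusion rule in the figure caption: a participant with fewer than 90 trials in a condition cannot contribute all nine bins.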

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the auditory stimuli used here, in both blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).


      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disproves the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serves to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions and we improved our manuscript accordingly.

      (1) The authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating an auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, the authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we address separately:

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion, to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, they can facilitate signal transmission. This mechanism allows the transmission of relevant information to be increased and the transmission of distractors to be blocked.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations; that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute a strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation appeared on both 1) anticipating-auditory and anticipating-visual trials, 2) the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data is more parsimonious with a correlation than a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we address below:

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then this relationship can be expected to hold across conditions, with higher alpha activity accompanied by higher frequency-tagged activity, both across trials and across conditions. However, when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio may be reduced, which may make a correlation harder to detect with non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears for both auditory and visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g., in the EEG analysis, mostly 36 Hz activity correlated with alpha activity, and only in one condition did 40 Hz activity show a correlation as well), it is true that in our MEG analysis we found correlations of alpha activity with both 36 Hz and 40 Hz activity.

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where non-time-locked activity cannot be removed through averaging (as we did when testing for condition differences in Figs. 4 and 9), there is uncertainty in the data. Baseline correction can alleviate this issue, but it cannot rule out non-specific effects. We therefore repeated the analysis using FFT-based power instead of Hilbert power, favouring a higher and stricter frequency resolution; since we averaged over a time period, the time domain was not relevant for this analysis. In this more conservative analysis, only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.
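      The trade-off between Hilbert and FFT power can be illustrated with a single-bin DFT. Over a window of duration T, FFT bins are spaced 1/T Hz apart, so a long enough window places 36 Hz and 40 Hz (and the 44 Hz control) in separate bins, at the cost of any time resolution within the window. The sketch below is a hypothetical illustration in plain Python; the 1 s window and 1000 Hz sampling rate in the test are assumptions, not the parameters used in the study, and no taper is applied.

```python
import cmath
import math
from typing import Sequence


def dft_power(x: Sequence[float], fs: float, freq: float) -> float:
    """Power of `x` at `freq` Hz from a single DFT bin (untapered; assumption).

    Bin spacing is fs / len(x), i.e. 1 / window duration, so a 1 s window
    resolves 36, 40 and 44 Hz as separate bins.
    """
    n = len(x)
    k = freq * n / fs  # bin index of the requested frequency
    if abs(k - round(k)) > 1e-9:
        raise ValueError("freq must fall on a DFT bin: use a window of whole periods")
    k = round(k)
    coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return abs(coeff) ** 2 / n ** 2
```

      A pure 36 Hz sinusoid of unit amplitude yields a power of 0.25 in the 36 Hz bin and essentially zero in the 40 Hz bin. With a shorter window (e.g. 0.1 s, 10 Hz spacing) the two tagging frequencies could not be separated at all, which is why averaging over a longer period favours frequency specificity.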

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster which showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity could be found (apart from a small correlation with 40 Hz which could not be confirmed by a median split; see SUPPL Fig. 14 C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020), and a Bayes factor below 1 also indicated that the alternative hypothesis is unlikely.

      Nonetheless, a correlation with auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked together (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript expands beyond the scope of this study.

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, but that 36 Hz activity in other areas does, provides strong evidence that alpha activity affects downstream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, while adding further analyses to strengthen our empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate MOI for individual infections. They then use these values, together with methods from queueing theory that rely on various assumptions, to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, the entomological inoculation rate (EIR).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection. This is currently complicated by several factors, including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling, and they make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, it remains difficult to rigorously evaluate whether the authors' proposed method of estimating FOI is sound. Additionally, several limitations call into question the stated generalizability of this method. These should at minimum be further discussed by the authors, and in some cases they require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation, and the accompanying appendix, lack several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method. Examples include A and S, the main random variables for inter-arrival times and service times, and aR and bR, the known time-average quantities. These also rely on the squared coefficient of variation of the random variable, which is never introduced in the paper. Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how it works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables, in our revised manuscript. The reviewer is correct in pointing out that Choi et al. contains more sections and equations. These include the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, the first version of our manuscript included material only on this section and not the other. We agree with the reviewer that readers would benefit from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. We did describe the assumptions of the method we applied in the Materials and Methods of our manuscript, especially those for going from the exact expression to the two-moment approximation. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts. The descriptive summary was placed in the Materials and Methods, and the equations or mathematical formulae were placed in the Appendix. This can make it difficult for readers to connect the two parts. It can also make it hard to remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods and to provide more of the technical details in one place, which will be easier to understand and follow.
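
      The reviewer's point about undefined quantities can be made concrete. The squared coefficient of variation of a random variable X is c_X^2 = Var(X)/E[X]^2. As the method's name suggests, a two-moment approximation characterizes the inter-arrival time A and the service time S only through their means and squared coefficients of variation. The Python sketch below computes this quantity from given moments or from a sample. The helper names are hypothetical, not taken from the manuscript or from Choi et al. The sketch is not an implementation of the Choi et al. approximation itself.

```python
import statistics

def squared_cv(mean, variance):
    """Squared coefficient of variation: c^2 = Var(X) / E[X]^2."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return variance / mean ** 2

def sample_squared_cv(samples):
    """Plug-in estimate of c^2 from observed values (e.g. infection durations)."""
    m = statistics.fmean(samples)
    return squared_cv(m, statistics.pvariance(samples, mu=m))

# Hypothetical inter-arrival moments (years): an exponential with mean 0.5 has variance 0.25.
print(squared_cv(mean=0.5, variance=0.25))
```

      For exponential inter-arrival times (Poisson arrivals), c^2 = 1. Values below 1 indicate more regular arrivals, and values above 1 indicate burstier ones.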

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram. As currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Observing only the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or just the distribution of the mean FOI taken over multiple simulations. It also seems that they are estimating mean FOI per host on an annual basis, and showing a distribution of those per-host estimates would be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations is important, for example the proportion of simulations where the truth is captured in the 95% CI. In many cases, the estimator always seems to underestimate the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, and Figure 1 under the mid-IRS panel). However, it is not possible to conclude one way or the other from this visualization. This is a major issue, since it calls into question whether the data in fact support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
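
      The pooling step described above can be sketched as follows. The sketch assumes integer MOI estimates no larger than the capacity c, and the function names and example numbers are hypothetical. The pooled distribution plays the role of a single host's steady-state queue-length distribution. The simulation benchmark is the total number of new infections per year divided by the number of hosts.

```python
from collections import Counter

def pooled_moi_distribution(moi_estimates, capacity):
    """Pool per-host MOI estimates into one population-level distribution over
    queue lengths 0..capacity, used as a stand-in for the steady-state
    queue-length distribution of a single representative host."""
    if any(m < 0 or m > capacity for m in moi_estimates):
        raise ValueError("MOI estimates must lie in 0..capacity")
    counts = Counter(moi_estimates)
    n = len(moi_estimates)
    return [counts.get(k, 0) / n for k in range(capacity + 1)]

def true_foi_per_host_per_year(total_new_infections_per_year, n_hosts):
    """Simulation benchmark: total FOI across all hosts divided by host count."""
    return total_new_infections_per_year / n_hosts

dist = pooled_moi_distribution([1, 2, 2, 4], capacity=5)  # hypothetical estimates
print(dist)
print(true_foi_per_host_per_year(12000, 2000))             # hypothetical totals
```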

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. We introduced a considerable amount of transmission heterogeneity across individuals: 2/3 of the population received more than 90% of all bites, while the remaining 1/3 received the rest. Under this heterogeneity, our two methods performed similarly to how they performed in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
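
      The two summary measures proposed here can be sketched as follows. The sketch assumes that each simulation yields one true FOI and a list of bootstrap FOI estimates; this data layout and the names are illustrative assumptions. Coverage is the share of simulations whose true FOI falls within the range of the bootstrap distribution. The relative difference compares the bootstrap median with the truth.

```python
import statistics

def covered_by_bootstrap(true_foi, bootstrap):
    """True if the true FOI lies within the range of the bootstrap distribution."""
    return min(bootstrap) <= true_foi <= max(bootstrap)

def relative_difference(true_foi, bootstrap):
    """(median of bootstrap - truth) / truth; negative values mean underestimation."""
    return (statistics.median(bootstrap) - true_foi) / true_foi

def summarize_runs(runs):
    """runs: list of (true_foi, bootstrap_estimates) pairs, one per simulation."""
    coverage = sum(covered_by_bootstrap(t, b) for t, b in runs) / len(runs)
    rel_diffs = [relative_difference(t, b) for t, b in runs]
    return coverage, rel_diffs

# Hypothetical bootstrap output for two simulated scenarios.
runs = [(5.0, [4.0, 4.5, 5.5]), (5.0, [3.0, 3.5, 4.0])]
print(summarize_runs(runs))
```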

      d. Furthermore, the authors state in the methods that the mean and variance (and thus second moment) parameters for inter-arrival times are varied widely. However, it is not clear what those ranges are. There needs to be a clear table or figure caption showing which combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear, since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is it the steady-state queue-length distribution? Elsewhere the authors refer to these quantities as maximum likelihood estimators, but how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure in fact maximizes this likelihood. This explanation should preferably include evidence of the shape of the likelihood. The authors should also state how fine the grid of tested mean and variance values is, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times, as well as the grid of values tested, in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue-length distributions. These distributions were based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times. In our revised manuscript, we will add a section to either the Materials and Methods or the Appendix that includes an explicit formulation of the likelihood.
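
      The grid-search maximum likelihood step can be illustrated under a deliberately simplified assumption that is not the authors' method. Suppose new infections arrived as a Poisson process with rate equal to the FOI. The host would then be an M/G/c/c loss system whose steady-state queue-length distribution is a Poisson distribution with parameter FOI × mean infection duration, truncated at c, whatever the shape of the duration distribution. The mean/variance grid of the two-moment approximation then collapses to a one-dimensional grid over FOI. The sketch also assumes that only infected hosts (MOI ≥ 1) contribute estimates, so the likelihood is conditioned on MOI ≥ 1. The function names, mean duration, grid, and data are all hypothetical.

```python
import math

def loss_queue_pmf(load, capacity):
    """Steady-state pmf of an M/G/c/c loss system: Poisson(load) truncated to 0..capacity.
    load = FOI * mean infection duration (Poisson-arrival simplification)."""
    log_w = [n * math.log(load) - math.lgamma(n + 1) for n in range(capacity + 1)]
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def log_likelihood(moi_estimates, foi, mean_duration, capacity):
    """Log-likelihood of the MOI sample, conditioned on MOI >= 1 (infected hosts only)."""
    pmf = loss_queue_pmf(foi * mean_duration, capacity)
    p_infected = 1.0 - pmf[0]
    total = 0.0
    for m in moi_estimates:
        if not 1 <= m <= capacity:
            raise ValueError("each MOI must lie in 1..capacity")
        total += math.log(pmf[m] / p_infected)
    return total

def grid_mle(moi_estimates, foi_grid, mean_duration, capacity):
    """FOI value on the grid with the highest log-likelihood."""
    return max(foi_grid, key=lambda f: log_likelihood(moi_estimates, f, mean_duration, capacity))

foi_grid = [0.5 * k for k in range(2, 81)]  # 1.0 to 40.0 infections per host per year
moi = [3, 4, 4, 5, 6, 2, 7, 4]              # hypothetical MOI estimates
print(grid_mle(moi, foi_grid, mean_duration=0.5, capacity=30))
```

      Refining `foi_grid` and checking that the maximizer stays stable is one simple way to probe grid sensitivity in this simplified setting.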

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how the choice of grid values influences the overall quality of the estimation procedure. Specifically, we will further refine the grid to include more points and examine whether the results of FOI inference are consistent and robust across grid resolutions.

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. I agree that estimating this empirically is not possible. However, there are other options besides assuming that all 1-5-year-olds have the same duration-of-infection distribution as naïve adults co-infected with syphilis. For example, it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to keep the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, the estimated mean FOI in Ghana is about 5 per host per year in the pre-IRS season (Figure 3). At that rate, it seems unlikely that 4-year-olds would be immune-naïve. The approach would also not necessarily generalize well to a school-aged child population or an adult population.

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds. It is equally difficult to further test whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the malaria modeling community uses the infection duration from the historical clinical data we have relied on as one of the credible sources for this parameter. The parameter describes untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to MOI estimates for naïve or close-to-naïve patients. They are not suitable for FOI inference in school-aged children and adults in high-transmission endemic regions. Individuals in these age classes have been infected many times, and their duration of infection is significantly shortened by their immunity. Our aim is to reduce the degree of misspecification in infection duration and to take full advantage of our proposed methods. To that end, we will emphasize in the revision that future data collection and sampling efforts should prioritize a particular subpopulation. This subpopulation has received either no infections or only a minimal number of infections in the past, and its immune profile is close to that of naïve adults; infants are one example. This emphasis is aligned with the top short-term priority of all intervention efforts, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.
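
      A back-of-envelope check, separate from the queueing methods of the manuscript, shows why misspecifying the duration of infection matters. In steady state, Little's law gives E[MOI] = FOI × E[duration] when infections are rarely turned away at capacity. An FOI estimate therefore scales inversely with the assumed mean duration: halving the assumed duration doubles the inferred FOI. The sketch assumes a representative sample of hosts in which uninfected hosts contribute MOI = 0. The function name and values are hypothetical.

```python
import statistics

def foi_littles_law(moi_estimates, mean_duration_years):
    """Rough FOI (infections per host per year) from Little's law: E[MOI] = FOI * E[duration].
    Assumes steady state, negligible loss at capacity, and MOI = 0 for uninfected hosts."""
    return statistics.fmean(moi_estimates) / mean_duration_years

moi = [0, 3, 4, 4, 5, 6, 0, 2]  # hypothetical sample, including two uninfected hosts
for assumed_duration in (0.25, 0.5, 1.0):
    print(assumed_duration, foi_littles_law(moi, assumed_duration))
```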

      b. The capacity parameter c seems to be quite important and is set at 30. However, the authors only describe trying values of 25 and 30 and claim that this choice does not impact FOI inference, and it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why increasing the carrying capacity will not influence FOI inference. Absent that, this should be mentioned and discussed as a limitation.

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c has therefore a small impact for the part of the grid resulting in low flows of customers/infection, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15-20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
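
      The overflow argument can be quantified under the simplifying assumption of Poisson arrivals. This assumption is for illustration only; the manuscript's two-moment approximation allows general inter-arrival times. In an M/G/c/c loss system, the Erlang B formula gives the probability that an arriving infection finds the host full. The sketch below computes it with its standard recursion. The offered-load values (FOI × mean duration) are hypothetical. For loads roughly in the range implied by typical MOI values of 4 or 5, blocking is negligible whether c is 25, 30, or 60. The choice of c therefore matters mainly in the high-flow part of the grid.

```python
def erlang_b(load, capacity):
    """Blocking probability of an M/G/c/c loss system via the standard recursion
    B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)), where a is the offered load."""
    b = 1.0
    for k in range(1, capacity + 1):
        b = load * b / (k + load * b)
    return b

def mean_queue_length(load, capacity):
    """Mean number of concurrent infections (carried load): a * (1 - B)."""
    return load * (1.0 - erlang_b(load, capacity))

for load in (2.5, 5.0, 40.0):  # hypothetical FOI * mean duration values
    print(load, {c: erlang_b(load, c) for c in (25, 30, 60)})
```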

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made clearer. While the historical clinical data are a unique resource, it would be useful to see how misspecification of the duration-of-infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI and thus depends on the information contained in MOI. FOI reflects risk of infection and is associated with risk of clinical episodes. It can also relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. MOI may be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden on MOI. However, MOI is by definition a count and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because the FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios that bound reality, which the reviewer summarized well. First, if drug treatment had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, suppose drug treatment had a significant impact on measurement, i.e., it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds. In that case, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. We assumed that, had those 1-5-year-olds not received drug treatment, their MOI values would have been similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add information. Compared with simply excluding those drug-treated 1-5-year-olds from the analysis, it can potentially deflate the variability of MOI distributions. Thus, in our revision we can include FOI estimates with the drug-treated 1-5-year-olds excluded from the estimation.
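
      The three ways of handling drug-treated hosts discussed here can be sketched as follows. The function name, mode labels, and data are hypothetical. Imputation keeps the sample size of the as-measured scenario but only recycles untreated values, which is why it can understate variability. Exclusion uses fewer hosts, so downstream bootstrap intervals widen accordingly.

```python
import random

def prepare_moi_sample(untreated, treated, mode, rng):
    """Assemble the MOI sample used for FOI inference under one treatment-handling scenario."""
    if mode == "as_measured":  # treatment assumed not to affect the measured MOI
        return list(untreated) + list(treated)
    if mode == "impute":       # treated hosts' MOI replaced by draws from untreated hosts
        return list(untreated) + [rng.choice(untreated) for _ in treated]
    if mode == "exclude":      # treated hosts dropped entirely
        return list(untreated)
    raise ValueError(f"unknown mode: {mode}")

rng = random.Random(1)
untreated = [2, 3, 4, 4, 5, 7]  # hypothetical MOI estimates of untreated 1-5-year-olds
treated = [1, 2, 3]             # hypothetical MOI estimates of drug-treated 1-5-year-olds
for mode in ("as_measured", "impute", "exclude"):
    print(mode, prepare_moi_sample(untreated, treated, mode, rng))
```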

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could easily be over-interpreted, as no true validation was done: no field estimates of FOI (which I think would be considered true validation) were measured. The authors have developed a method of estimating FOI from MOI that makes a number of biological and structural assumptions. They recreate results from a model that makes its own (probably similar) set of biological and structural assumptions. I would not call this a validation of what is going on in the field. The authors claim this at times (for example, Line 153), and I feel it would be appropriate to make this distinction in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated. For a complex disease such as malaria, it is not clear what a “true” measure of FOI, free from assumptions, would be. We would not want the results to be over-interpreted, and we will extend the discussion of what we have done to test the methods. We note that using simulation output is quite common in the performance evaluation of statistical methods and is often a necessary and important step. In some cases, the simulation output is generated by dynamical models, and in others by purely descriptive ones. All these models make their own assumptions, which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions. These include aspects of strain diversity that are not represented in simpler models.

      It is not clear to us in what sense this ABM makes biological and structural assumptions that are “probably similar” to those of the queuing methods we present. We agree that relying on models whose structural assumptions differ from those of the method or model to be tested is the best approach. Our proposed queuing-theory methods for FOI inference rely on the duration-of-infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. However, these methods are agnostic about the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what the “true” FOI value against which to validate our methods would be. Empirical MOI-FOI pairs, with FOI measured directly in cohort studies, are still lacking. Both MOI and FOI are subject to potential measurement errors, because the polymorphic markers typically used in different cohort studies cannot fully differentiate hyper-diverse antigenic strains (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of an estimate that is free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled or repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly against which the fitted FOI values could be benchmarked. These models are typically evaluated on how well they capture other epidemiological quantities that are more easily sampled or measured, such as prevalence or incidence. This is similar to what is done in this work: we selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity, the EIR (entomological inoculation rate), which is typically used in the field as a measure of transmission intensity. We checked whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs from previous studies and with the relationship between these two quantities reported there. We acknowledge that, as with model-fitting approaches to FOI inference, our validation for the field data is indirect.

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios that bound reality, instead of simply assuming that drug treatment does not affect infection status, MOI, or the duration of infection. See our response to reviewer 2, point (3). The resulting FOI estimates differ, but not substantially, across the two extreme scenarios, partially because the MOI distribution of drug-treated individuals is similar to that of non-treated individuals (the apparent lack of impact of drug treatment on MOI pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and its significance. Given the result of this test, we will discuss which of the two extreme scenarios is closer to reality. We will also discuss in our revision possible reasons and hypotheses underlying the impact of drug treatment on MOI, from the perspective of the nature, efficiency, and duration of the drugs administered.
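      One possible form of the formal test mentioned above is a two-sample permutation test on the Kolmogorov-Smirnov statistic, which compares whole MOI distributions rather than only their means and makes no parametric assumption about their shape. The sketch below is an illustration under that assumption, not a test the authors have committed to; the function names and the MOI values for treated and untreated individuals are hypothetical.

```python
import random
from typing import Sequence


def ks_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Maximum absolute difference between the empirical CDFs of a and b."""
    values = sorted(set(a) | set(b))
    return max(abs(sum(x <= v for x in a) / len(a) - sum(x <= v for x in b) / len(b))
               for v in values)


def permutation_pvalue(a: Sequence[int], b: Sequence[int],
                       n_perm: int = 2000, seed: int = 1) -> float:
    """Permutation p-value for H0: both samples come from the same MOI distribution."""
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    rng = random.Random(seed)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign treatment labels
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)


# Hypothetical MOI values, for illustration only.
treated = [1, 1, 2, 3, 1, 2, 2, 1]
untreated = [1, 2, 2, 3, 1, 4, 2, 2, 1]
print(permutation_pvalue(treated, untreated))
```

      Permuting group labels handles the many ties in integer-valued MOI data without any correction, at the cost of computation time.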

      Regarding the reviewer’s last point, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also unsure about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as suggesting a better understanding of the relationship between MOI and FOI, either between their distributions or between the moments of their distributions, perhaps by fitting models such as simple linear regressions. This approach is in principle possible, but it is not the focus of this work. It would be equally difficult to evaluate the performance of this alternative approach, given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to narrower or more concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.
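      The qualitative relationship described above can be made concrete under a simple assumption that is not the model used in the manuscript: if, for a given host, MOI is Poisson with mean FOI × mean infection duration (the steady state of an M/G/∞ queue, counting uninfected hosts as MOI = 0), and FOI varies among hosts, the law of total variance gives the moments below. The function name and numerical values are hypothetical.

```python
from typing import Tuple


def moi_moments(foi_mean: float, foi_var: float, mean_duration: float) -> Tuple[float, float]:
    """Mean and variance of MOI when MOI | FOI ~ Poisson(FOI * mean_duration)
    and FOI varies among hosts with the given mean and variance (assumed model)."""
    mean_moi = foi_mean * mean_duration
    # Law of total variance: E[Var(MOI | FOI)] + Var(E[MOI | FOI])
    var_moi = foi_mean * mean_duration + foi_var * mean_duration ** 2
    return mean_moi, var_moi


print(moi_moments(2.0, 0.0, 0.5))  # (1.0, 1.0): constant FOI, Poisson MOI
print(moi_moments(2.0, 4.0, 0.5))  # (1.0, 2.0): same mean, more variable FOI, wider MOI
print(moi_moments(4.0, 0.0, 0.5))  # (2.0, 2.0): higher FOI, higher MOI
```

      With constant FOI the variance equals the mean; any variation in FOI adds the term Var(FOI) × E[D]², widening the MOI distribution without changing its mean.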

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment, as it is fundamental that there be no confusion about the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such, by focusing on the infectious mosquito vectors that arrive to bite a given host. Not all of these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, defining FOI as the rate at which a host acquires infection. This definition comes from more general models of the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.
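      The distinction between EIR and FOI can be illustrated in its simplest form by assuming that infectious bites arrive as a Poisson process at rate EIR and that each bite independently establishes a blood-stage infection with a fixed probability; by Poisson thinning, FOI is then the product of the two. This constant-probability case is only a baseline: the nonlinear EIR-FOI relationships reported in previous studies arise because the per-bite probability depends on transmission intensity and host immunity. The function name and the probability of 0.1 are hypothetical.

```python
def foi_from_eir(eir_per_year: float, prob_infection_per_bite: float) -> float:
    """FOI (new infections per person-year) under Poisson thinning: each
    infectious bite independently establishes an infection with a fixed,
    assumed probability."""
    if not 0.0 <= prob_infection_per_bite <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return eir_per_year * prob_infection_per_bite


# Hypothetical values: 100 infectious bites per person-year, 10% of bites infect.
print(foi_from_eir(100.0, 0.1))
```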

      We agree, however, with the referee that our definition could cause some confusion, given the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the absent-to-very-low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The varcoding method and its Bayesian formulation rely on this assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined above and in both the abstract and introduction, consistent with the epidemiological literature. We will clarify this point in the introduction and discussion of the revision.
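      The role of the negligible-overlap assumption can be seen in a simplified Bayesian sketch, which is not the published varcoding implementation: if strains share no var types, the number of distinct types observed in an isolate is the sum of the types contributed by each strain, so the likelihood of observing n types given MOI = m is the m-fold convolution of the single-strain repertoire-size distribution. The uniform prior, the repertoire-size distribution, and the function names below are hypothetical.

```python
from typing import Dict, List, Optional


def convolve(p: List[float], q: List[float]) -> List[float]:
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def moi_posterior(n_types: int, repertoire_pmf: List[float], max_moi: int = 20,
                  prior: Optional[List[float]] = None) -> Dict[int, float]:
    """Posterior P(MOI = m | n_types) for m = 1..max_moi, assuming strains share no var types.

    repertoire_pmf[k] is P(one strain contributes k distinct var types).
    """
    if prior is None:
        prior = [1.0 / max_moi] * max_moi  # uniform prior over 1..max_moi
    dist = [1.0]  # type-count distribution for zero strains
    unnormalized: Dict[int, float] = {}
    for m in range(1, max_moi + 1):
        dist = convolve(dist, repertoire_pmf)  # m-fold convolution
        likelihood = dist[n_types] if n_types < len(dist) else 0.0
        unnormalized[m] = likelihood * prior[m - 1]
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed type count is impossible under this model")
    return {m: w / total for m, w in unnormalized.items()}


# Hypothetical single-strain repertoire sizes: uniform on 40..50 distinct types.
pmf = [0.0] * 40 + [1.0 / 11] * 11
post = moi_posterior(85, pmf, max_moi=5)
print(max(post, key=post.get))  # 2: only two non-overlapping strains can yield 85 types
```

      If repertoires did overlap, the observed count would fall below this sum and MOI would be underestimated, which is why the assumption matters.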

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of unseen alleles by the host across all var genes expressed during that infection (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
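      The within-host rules described in the two paragraphs above can be sketched as follows, as a simplified illustration rather than the ABM code: each var gene is expressed for an exponentially distributed time whose mean is the per-gene expression duration for a naïve host, exactly halved (the text says approximately half) when the host is immune to one allele, and zero (immediate switch) when the host is immune to both. The expected duration is then the mean expression duration multiplied by the sum of (1 − immunity level) over genes, that is, proportional to the number of unseen alleles. Function names and the 60-gene naïve repertoire are hypothetical.

```python
import random
from typing import Optional, Sequence


def expected_duration(immunity: Sequence[float], mean_expression: float = 7.0) -> float:
    """Expected infection duration (days): each gene contributes mean_expression * (1 - level)."""
    return mean_expression * sum(1.0 - level for level in immunity)


def simulate_duration(immunity: Sequence[float], mean_expression: float = 7.0,
                      rng: Optional[random.Random] = None) -> float:
    """One draw of total infection duration (days) with sequential var gene expression.

    immunity[i] is the host's immunity to gene i: 0 (naive), 0.5 (one allele), 1 (both).
    """
    if rng is None:
        rng = random.Random()
    total = 0.0
    for level in immunity:
        if level >= 1.0:
            continue  # immune to both alleles: immediate switch to the next gene
        # Naive: mean_expression; one allele seen: half of it (simplifying assumption).
        total += rng.expovariate(1.0 / (mean_expression * (1.0 - level)))
    return total


naive = [0.0] * 60  # hypothetical repertoire of 60 var genes, host naive to all
for mean_expr in (7.0, 7.5, 8.0):
    print(mean_expr, expected_duration(naive, mean_expr))
```

      In this toy setting, the expected duration for a fully naïve host with 60 genes is 420, 450, and 480 days for mean expression durations of 7, 7.5, and 8 days, showing how a half-day difference per gene accumulates across a repertoire.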

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although the differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection durations can differ by several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that extends both above and below, and thus encompasses, scenarios comparable to the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We provide example supplementary figures illustrating that the simulated distributions of infection duration overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of termination or clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene, whether non-final or final, is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
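      The early-clearance rule can be added to a sequential-expression sketch as a Bernoulli draw whose success probability is the immunity level times 0.025, as stated above. This is an illustration, not the ABM code; in particular, it assumes for simplicity that the expression period of the gene during which clearance occurs is counted in full, and the function name and the partially immune host are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def simulate_with_early_clearance(immunity: Sequence[float], mean_expression: float = 7.0,
                                  clearance_factor: float = 0.025,
                                  rng: Optional[random.Random] = None) -> Tuple[float, int]:
    """Return (duration in days, number of var genes expressed).

    While gene i is expressed, the infection clears with probability
    immunity[i] * clearance_factor (a Bernoulli draw).
    """
    if rng is None:
        rng = random.Random()
    total = 0.0
    expressed = 0
    for level in immunity:
        expressed += 1
        if level < 1.0:
            # Expression time; assumed to be counted in full even if clearance follows.
            total += rng.expovariate(1.0 / (mean_expression * (1.0 - level)))
        if rng.random() < level * clearance_factor:
            break  # stochastic clearance before the repertoire is exhausted
    return total, expressed


rng = random.Random(7)
partly_immune = [0.5] * 60  # hypothetical host immune to one allele of every gene
print(simulate_with_early_clearance(partly_immune, rng=rng))
```

      Because the clearance probability is zero for genes to which the host is naïve, fully naïve hosts still express their entire repertoire, and the added variance comes from hosts with partial immunity.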

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the historical clinical data. Because the queuing-theory methods rely on only the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.

      Finally, we emphasize that the application of the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remains one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generate distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queuing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American journal of epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American journal of tropical medicine and hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface 12, 20150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. Author response:

      We thank the three reviewers for their critical and in-depth assessment of our study. Below you find our comments to the public reviews and our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation. Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

      We thank reviewer#1 for this positive assessment of our work. The deviation of the AF prediction from the experimental structure is, in our view, not puzzling. AF does not take the physical properties of proteins into account, but predicts structures based on strong sequence alignments. It therefore has no “knowledge” of the general flexibility of domains such as the CCD, which is also observed in the corresponding MmpL5 structures, nor of preferred conformational states. Rather than “failing” to confirm the AF predictions, our cryo-EM structure revealed an unexpected tilted conformation of the CCD. As we outline in our comments below, the physiological relevance of the tilted CCD is unclear. Its flexibility might be required for interaction with (still elusive) outer membrane protein components to form the fully assembled efflux machinery.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an Alphafold2 model of the hetero-hexamer. Although Alphafold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.
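      The AlphaFold-guided choice of disulfide positions described in this summary can be sketched as a scan over inter-chain residue pairs in a predicted model, keeping pairs whose Cβ atoms lie at a distance compatible with a disulfide bond. The Cβ-Cβ window of roughly 3.4 to 4.3 Å used here is an assumed, approximate criterion; actual designs may also weigh side-chain geometry and local flexibility. The coordinates are invented toy values, and only the residue numbers echo the D39C/S434C pair named in the reviews.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def candidate_disulfides(chain_a: Dict[int, Coord], chain_b: Dict[int, Coord],
                         d_min: float = 3.4, d_max: float = 4.3) -> List[Tuple[int, int, float]]:
    """Inter-chain residue pairs whose C-beta atoms lie within [d_min, d_max] angstrom.

    chain_a / chain_b map residue numbers to C-beta coordinates (e.g. from a
    predicted model); the distance window is an assumed approximate criterion.
    """
    hits = []
    for (res_a, cb_a), (res_b, cb_b) in product(chain_a.items(), chain_b.items()):
        d = math.dist(cb_a, cb_b)
        if d_min <= d <= d_max:
            hits.append((res_a, res_b, d))
    return sorted(hits, key=lambda hit: hit[2])  # closest pairs first


# Toy C-beta coordinates (hypothetical), not taken from the actual model.
mmps4 = {39: (0.0, 0.0, 0.0), 40: (3.8, 5.0, 0.0)}
mmpl4 = {434: (3.8, 0.0, 0.0), 500: (12.0, 0.0, 0.0)}
print(candidate_disulfides(mmps4, mmpl4))
```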

      Strengths:

      The strategy of using Alphafold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      We thank reviewer#2 for this positive assessment of our work and agree that it is interesting that the experimental structures do not fully agree with the AF predictions (see also comment to reviewer#1).

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      We thank the reviewer for this valuable comment. We will add a new figure with a model of the MmpL4 transport cycle based on our new data and discuss the proposed molecular transport mechanism in more detail in the main text.

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      In the revised version, we will include additional data to assess the functional consequences of cross-linking.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      We state clearly in the discussion that the channel through the CCD seems too narrow to let large molecules such as mycobactin and bedaquiline pass:

      Line 318ff: “The channel radius of the MmpL4 CCD is very narrow, with a minimum of 1.1 Å according to the AlphaFold3 prediction (Fig. 5). This is much smaller than the smallest dimension of mycobactin, ?? nm, as determined from a model of iron-free mycobactin. In addition, the cryo-EM structure of MSMEG_1382 revealed a constriction in the CCD channel [21]. Even though the methionine side chains lining the channel wall are considered to be flexible (Aledo, 2019), large conformational changes of the α-helical hairpins relative to each other would be required to allow passage of molecules as large as mycobactin and bedaquiline. The AcrAB-TolC efflux machinery provides an example of such large conformational changes enabling the transport of large molecules, through iris-like opening and closing movements of the outer membrane channel-tunnel TolC [33]. Similar helical twisting may widen the channel of the CCD. Alternatively, it is conceivable that the substrates of MmpL4/MmpL5 are transported along the CCD surface, potentially requiring further protein partners. It is interesting to note that siderophore secretion and drug efflux by MmpL4/MmpL5 systems involve at least two additional proteins, namely the periplasmic protein Rv0455, which was shown to be essential for mycobactin efflux [34], and an outer membrane channel, whose identity remains elusive. A complete molecular understanding of the transport mechanism of the MmpL4/MmpL5 systems hence requires the identification of the missing components and structural information about their interactions.”

      The channel radius of the MmpL4 CCD is very narrow (minimum of 1.1 Å) according to the AlphaFold3 prediction (Fig. 5), and the cryo-EM structure of MSMEG_1382 revealed a further constriction in the CCD channel [21]. We therefore consider direct substrate transport through the CCD central channel to be physically implausible for molecules of the size of mycobactin and bedaquiline. Even accounting for the flexibility of the methionine side chains lining the channel wall, large conformational changes of the α-helical hairpins relative to each other would be required to accommodate such large substrates. While iris-like opening movements have been described for TolC in the AcrAB-TolC system [33], those movements widen an already substantially larger channel, and even such dramatic conformational changes would be insufficient to open a channel as narrow as that of the MmpL4 CCD to a diameter permissive for substrate passage. We instead favor a model in which substrates are transported along the outer surface of the CCD, potentially with the assistance of additional protein partners. This is consistent with the observation that MmpL4/MmpL5-mediated siderophore secretion and drug efflux involve at least two further proteins: the periplasmic protein Rv0455, shown to be essential for mycobactin efflux [34], and an as-yet-unidentified outer membrane channel. In this context, the overall flexibility of the CCD, illustrated here by the tilted, incompletely formed conformation, may reflect the conformational dynamics required for interaction with these partner proteins, rather than a direct role in forming a transport conduit. A complete mechanistic understanding will require identification of the missing components and structural characterization of the fully assembled efflux machinery.

      We do not think that the incompletely formed CCD represents a conformation that is relevant for transport. It does, however, demonstrate the overall flexibility of the CCD, which may be required to further open the channel if the substrates are transported within the CCD tube. Further in-depth experiments will be needed to clarify this interesting question, which is beyond the scope of this paper.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

      This is a great suggestion. We will include a 3D variability analysis in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)3-(MmpL4)3 complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      We thank reviewer#3 for these useful comments, which we will address during the revision of the manuscript. In particular, we plan to include a scheme of an updated transport model.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

      We believe that there was a misunderstanding about our interpretation of the tilted CCD. As a matter of fact, it must be a stable intermediate, otherwise no density would have been observed for it in the cryo-EM maps. Despite being a stable intermediate, it is indeed unlikely that it represents a conformational state that is relevant/required for transport. First, only the upright, complete CCD can bridge the periplasm. Second, the structure was determined in detergent and lacks additional binding partners, which might stabilize the upright conformation of the CCD. It is also conceivable, as the reviewer pointed out, that disulfide cross-linking may have caused the tilt. However, as we wrote in the manuscript, we do not think that cross-linking caused this striking asymmetry of the CCD, because the three MmpL4 and MmpS4 chains are basically symmetrical in the C1-processed data (see also Figure 2E):

      Line 182 ff: “To assess whether there are asymmetries in other parts of the structure, we superimposed the individual protomers of the (MmpS4)3-(MmpL4)3 complex analyzed using C1 symmetry (Fig. 2E). Apart from the two resolved α-helical hairpins, the MmpL4 core domains and the resolved parts of MmpS4 differ by a RMSD of less than 0.6 Å and are therefore structurally identical considering the map resolution of around 3 Å. The fact that the core domains of MmpS4 and MmpL4 do not deviate between the protomers argues against the possibility that the cross-links established between them cause the (asymmetric) tilt of the CCD.”

      Regarding the DDM binding site, we will indeed include an updated transport model. That said, we wish to be cautious, because we lack experimental proof that MmpL4 can in fact transport DDM.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem (which states that the autocorrelation function of a wide-sense stationary process has a spectral decomposition given by its power spectrum).

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. Accounting for batch effects (Fig 1G, S1G) we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provides more equivocal evidence for the effects of trh<sup>n</sup> on drift.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplemental clarifying all of the parameters of modeling for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers discussed as particularly important the relative contribution of genotype and variable environment. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors do present two behavior paradigms that can serve the purpose of comparing the relative extent of drift between genotypes across the two paradigms that they already have. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free walking analysis to the Ymaze and then the model1 to the model2 was not straightforward.

      We have added text to make more explicit the theoretical connection between the free-walking analysis, the Y-maze analysis, and the model. We have added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can be easily confused with genetic drift and cause confusion when reading.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods if the 2-hour assays were always done at the same time of the fly's subjective day and when (e.g. how many hours after lights on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias-which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L223: One probably cannot see a circadian peak at 24 h if the data were filtered at 24 h. Did they look with another low-pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243 the spread in standard deviation is said to be consistent with drifting bias, however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, a drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, data points from day to day are correlated as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
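      The AR(1) argument above can be illustrated numerically. The sketch below uses a generic, textbook AR(1) process, x_t = ϕ·x_(t−1) + ε_t with ε_t ~ N(0, σ²), not the authors' fitted model; the function names and the parameter values (ϕ = 0.8, σ = 1) are hypothetical and chosen only for illustration. For |ϕ| < 1 the lag-k autocorrelation is ϕ^k, and by the Wiener-Khinchin theorem the power spectral density is the Fourier transform of the autocovariance, which for AR(1) has the closed form σ² / (1 − 2ϕ·cos(2πf) + ϕ²). The code checks both statements: it estimates lag correlations from a long simulated series and compares the closed-form spectrum with a truncated sum over autocovariances.

```python
import numpy as np


def simulate_ar1(phi, sigma, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2).

    Assumes |phi| < 1 so that a stationary distribution exists.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    # Draw the first value from the stationary distribution (variance sigma^2 / (1 - phi^2)).
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x


def lag_autocorr(x, k):
    """Sample autocorrelation of series x at lag k (k >= 1)."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)


def ar1_psd_closed_form(phi, sigma, f):
    """Closed-form spectral density of a stationary AR(1); f in cycles per time step."""
    return sigma**2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * f) + phi**2)


def ar1_psd_from_autocov(phi, sigma, f, max_lag=500):
    """Wiener-Khinchin: S(f) = sum over all k of gamma(k) * exp(-2*pi*i*f*k).

    gamma(k) = gamma(0) * phi^|k| is symmetric, so the sum reduces to
    gamma(0) + 2 * sum_{k>=1} gamma(k) * cos(2*pi*f*k), truncated at max_lag.
    """
    f = np.atleast_1d(f)
    gamma0 = sigma**2 / (1.0 - phi**2)
    lags = np.arange(1, max_lag + 1)
    gamma = gamma0 * phi**lags
    return gamma0 + 2.0 * np.sum(
        gamma[:, None] * np.cos(2.0 * np.pi * np.outer(lags, f)), axis=0
    )


if __name__ == "__main__":
    phi, sigma = 0.8, 1.0
    x = simulate_ar1(phi, sigma, n=200_000, seed=1)
    print("lag-1 autocorrelation:", lag_autocorr(x, 1), "(theory:", phi, ")")
    freqs = np.linspace(0.0, 0.5, 6)
    print(np.allclose(ar1_psd_closed_form(phi, sigma, freqs),
                      ar1_psd_from_autocov(phi, sigma, freqs)))
```

With ϕ close to 1 the spectrum is concentrated at low frequencies, so the series shows slow fluctuations around a fixed mean; this is the qualitative link between a day-to-day AR(1) description and a power spectrum of the kind the reviewer refers to in Figure 1C. The ϕ = 1 case mentioned above is non-stationary and is deliberately excluded from this sketch.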

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well. The authors should discuss what is expected or not, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trh<sup>n</sup> experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer, but all data in Figure 1 were from the continuous 24 h recordings.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(phi-et: et, σ) or in general N(x: m, s): does this mean that the variable x has normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s)

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(et-phi: et, σ)?

      e<sub>t</sub> is the environment at time t, not the long-term mean of the environment (which is 50). We have added more detail in supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, none of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the author make an effort to relate these models/quantifications one to another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added further explanation of the connection between the power spectrum and the two methods (ϕ and the bounding envelope) of establishing stationarity.

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and the analysis of the DGRP lines also assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still meaningful, or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D) but at day 100 there is not something like a steady state, how is 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a Figure as 2D).

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time, we have standardized terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging to drift that we see at lower-frequency environmental fluctuations.

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. Author response:

      Thank you for the eLife assessment and the constructive reviews. We appreciate the reviewers’ valuable insights and the time they dedicated to providing such thoughtful feedback on our manuscript. The reviewers highlighted the technical rigor of our study, specifically the tracking of individual neurons across both anesthetized and awake states using two-photon imaging. They also emphasized the importance of our cell-type-specific analysis (excitatory, PV, and SOM neurons) and noted that the study provides solid evidence for isoflurane-induced shifts in preferred spatial frequency (SF).

      Based on our team's evaluation of the reviewers' comments, we would like to outline our planned revisions.

      (1) Expanded Population and Single-Neuron Analysis

      We will re-analyze our dataset to include all neurons that were responsive under anesthesia, in the awake state, or both. This will ensure our findings accurately represent the entire population of visually responsive neurons. We will also provide examples of individual tuning curves to clarify the relationship between tuning shape and SF shifts in individual neurons.

      (2) Addressing Methodological Scope and Behavioral Metrics

      Receptive Field Size and Dynamics: While we did not utilize a stimulus set specifically designed to map receptive field (RF) sizes, we intend to examine how other functional parameters co-varied with the shift in preferred SF within each cell type. Furthermore, although characterizing the precise temporal dynamics during anesthesia onset presents technical challenges, we will attempt to analyze the time-dependence of the observed changes to provide deeper insight into the transition between states.

      Behavioral Metrics: While pupil size is a well-established proxy for brain state, we will explore the inclusion of other available behavioral parameters.

      (3) Cell-type Specificity (SOM, PV, and VIP)

      SOM vs. PV Comparison: We will perform a detailed comparison of preferred SFs between SOM and PV interneurons, including those responsive only under anesthesia or only in the awake state.

      VIP Neurons: While VIP neurons are known to play critical roles in cortical circuits, such as disinhibition, we have decided not to conduct new recordings for VIP interneurons in the present study. Based on existing literature, the proportion of visually responsive VIP cells is too low to yield statistically reliable conclusions for this specific study (de Vries et al., Nature Neuroscience 23, 138-151, 2020). Additionally, we intend to focus our analysis on inhibitory interneuron subtypes that provide direct input to pyramidal cells.

      Histology: We will provide additional histological validation.

      (4) Refined Framing

      As suggested, we will focus the manuscript strictly on isoflurane anesthesia. This includes updating the title and abstract to reflect this specificity and discussing how our results compare with other anesthetics like urethane. Furthermore, we will substantially deepen our discussion on the potential mechanisms by which anesthesia induces a downward shift in preferred spatial frequency.

      We believe these additions will significantly strengthen the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcriptional factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs, and co-association modules. Since P. aeruginosa is a well known pathogen, the authors thus identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored the TF conservation and functional evolution through pan-genome and phylogenetic analyses. For the easy searching by other researchers, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepen our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.

      Thank you for your positive feedback!
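      The summary above mentions a "hierarchy index (h)" used to place TFs into top, middle, and bottom levels, but the response does not define h. The sketch below assumes the definition commonly used in earlier hierarchical TF-network studies, h = (O − I)/(O + I), where O is the number of other TFs a given TF binds (out-degree) and I is the number of TFs that bind it (in-degree). The cut-offs `top_cut` and `bottom_cut`, the function names, and the toy TF names are hypothetical illustrations, not the authors' values.

```python
from collections import defaultdict


def hierarchy_index(edges):
    """Return h = (O - I) / (O + I) for every TF in a TF->TF edge list.

    O (out-degree): distinct other TFs whose promoters this TF binds.
    I (in-degree):  distinct other TFs that bind this TF's promoter.
    Self-loops (auto-regulation) are excluded because they would add
    equally to O and I without reflecting position in the hierarchy.
    """
    out_nb = defaultdict(set)
    in_nb = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src == dst:
            continue
        out_nb[src].add(dst)
        in_nb[dst].add(src)
    h = {}
    for tf in sorted(nodes):
        o, i = len(out_nb[tf]), len(in_nb[tf])
        # A TF with only a self-loop gets h = 0 (hypothetical convention).
        h[tf] = (o - i) / (o + i) if o + i else 0.0
    return h


def assign_level(h_value, top_cut=1 / 3, bottom_cut=-1 / 3):
    """Map h in [-1, 1] to a level using hypothetical cut-offs."""
    if h_value > top_cut:
        return "top"
    if h_value < bottom_cut:
        return "bottom"
    return "middle"


# Toy network with hypothetical TF names: A binds B and C, B binds C,
# and C auto-regulates.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "C")]
h = hierarchy_index(edges)
levels = {tf: assign_level(v) for tf, v in h.items()}
print(h)       # {'A': 1.0, 'B': 0.0, 'C': -1.0}
print(levels)  # {'A': 'top', 'B': 'middle', 'C': 'bottom'}
```

      With this definition, a TF that only regulates other TFs scores 1, one that is only regulated scores −1, and intermediate TFs fall in between, which is what makes a three-level split possible.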

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach to validation would be to examine mutations in these TFs in clinical strains of P. aeruginosa, as chronically adapted isolates often acquire mutations in virulence regulators.

      Thank you for this valuable suggestion. We have performed EMSA experiments to validate the binding results and constructed mutants for further functional validation. Details are provided in Figure S5.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      Thank you for this insightful comment regarding ChIP-seq data quality and non-promoter binding events. While we acknowledge that completely eliminating all non-specific binding signals is technically challenging in bacterial ChIP-seq experiments, we implemented stringent quality control measures including replicates, negative controls, and FDR cutoffs to minimize false positives.

      Although binding peaks within coding regions represent a smaller fraction of total binding events, they are functionally significant rather than mere technical artifacts. Our previous work systematically demonstrated that bacterial TFs can bind to coding sequences and regulate gene expression through multiple mechanisms, including modulating cryptic promoter activity and antisense RNA transcription, hindering transcriptional elongation, and influencing translational efficiency[1]. We have now expanded the Discussion section to address these regulatory mechanisms.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain underevaluated. But this could be improved in the future.

      Thank you for this constructive feedback on PATF_Net. We acknowledge that more advanced features would further enhance the platform’s utility. To this end, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Thank you for your positive feedback! We have added experimental validation in the Results section.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Thank you for your positive feedback!

      Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Thank you for your positive feedback!

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to unknown activity state of the TFs on the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

      We thank the reviewer for summarizing these key concerns. We acknowledge the limitations raised regarding TF overexpression, TRN topological analysis interpretation, and experimental validation. We provide detailed point-by-point responses to each of these concerns in our replies to the specific comments below, where we explain our rationale, the measures taken to address these limitations, and our plans for improvement.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Future Directions for the authors to consider for next steps:

      (1) Key TFs (e.g., PA1380, PA5428) should be validated via gene knockout experiments, fluorescent reporter assays, or animal models to confirm their roles in virulence pathways.

      Thank you for this important suggestion. We agree that experimental validation is essential to confirm their regulatory roles and biological functions.

      Firstly, we selected a subset of key TFs, including PA0167, PA1380, PA0815, and PA3094, and performed electrophoretic mobility shift assays (EMSAs) to validate their direct binding to target promoters. These results confirmed the ChIP-seq-identified interactions and are now included as Figure S5A-D.

      We also constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) and their complemented strains (ΔPA1380/p and ΔPA3094/p). We then performed RT-qPCR analysis to validate their regulatory effects on key target genes. We found that PA1380 positively regulates the expression of cupB1 and cupB3 (Figure S5E). The CupB cluster is known to be less important than the CupA cluster in biofilm formation, and consistent with this, we did not observe a significant difference in biofilm formation between WT and ΔPA1380. Additionally, we found that PA3094 positively regulates lecA expression (Figure S5F).

      We agree that comprehensive functional validation, including animal model studies, would further strengthen the biological significance of these findings. Such experiments are currently underway in our laboratory and will be the subject of follow-up studies.

      We have revised the Results section and Method section to include these validation experiments and their implications. Please see Figure S5 and Lines 283-300.

      “To experimentally validate the regulatory interactions identified by ChIP-seq, we performed biochemical and genetic analyses on selected TFs. First, we conducted Electrophoretic Mobility Shift Assays (EMSA) for four TFs, including PA0167, PA0815, PA1380, and PA3094, using DNA fragments containing their predicted binding sites from target gene promoters. These TFs showed specific binding to their cognate DNA sequences (Figure S5A-D), confirming that the ChIP-seq-identified interactions reflect direct binding.

      To further validate the functional regulatory roles of these TFs, we constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) along with their complemented strains (ΔPA1380/p and ΔPA3094/p). RT-qPCR analysis revealed that PA1380 positively regulates the expression of cupB1 and cupB3 (Figure S5E), two genes within the CupB fimbrial cluster identified as ChIP-seq targets. Similarly, PA3094 was confirmed to positively regulate lecA expression (Figure S5F), which encodes a lectin involved in biofilm formation and host interactions[2]. Expression of these target genes was restored to wild-type (WT) levels in the complemented strains, validating the regulatory relationships predicted by ChIP-seq. These combined biochemical and genetic validations demonstrate the accuracy and biological relevance of our TF binding data.”

      (2) Non-promoter binding events (e.g., coding regions) may regulate RNA stability, warranting integration with translatomics or epigenomics data.

      Thank you for this suggestion. We have now expanded the Discussion section to address this comment. Please see Lines 478-482.

      “Our analysis revealed that TF binding events occur within coding regions, which is consistent with our previous study demonstrating that bacterial TFs can bind coding regions and regulate transcription through multiple mechanisms [1]. Such binding may also influence RNA stability, which warrants integration with translatomics or epigenomics data.”

      (3) Incorporate strain-specific TF data (e.g., clinical isolates) and dynamic visualization tools to broaden PATF_Net's applicability.

      Thank you for this constructive suggestion. To enhance the utility of PATF_Net, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query. These features are now live on the database and described in the revised manuscript.

      Regarding strain-specific TF data, we agree this would be valuable for understanding regulatory diversity in clinical isolates. However, such an expansion would require ChIP-seq profiling across multiple strains. The current dataset is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa research and allows direct comparison with existing genomic and functional studies. We have added a statement in the revised manuscript acknowledging this limitation and highlighting strain-specific TF analysis as an important future direction for the field. Please see Lines 372-390.

      “The database offers multiple search modalities to facilitate data exploration: users can perform TF-centric searches to query binding sites, target genes, and regulatory networks for individual TFs, or utilize the target gene search function to identify all TFs that regulate any gene of interest by entering its locus tag. To connect regulatory data with biological function, we have implemented a virulence pathway browser that allows users to explore TF binding patterns across curated gene sets for major P. aeruginosa virulence pathways. Interactive visualization tools, including network graphs and binding profile plots, facilitate intuitive exploration of regulatory relationships. The primary purpose of PATF_Net is to store, search, and mine valuable information on P. aeruginosa TFs for researchers investigating P. aeruginosa infection. The current resource is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa molecular studies and allows direct integration with existing genomic annotations and functional data. However, P. aeruginosa exhibits substantial genomic diversity across clinical isolates, and strain-specific differences in TF binding patterns may contribute to phenotypic variation in virulence, antibiotic resistance, and host adaptation. Extension of this resource to include strain-specific regulatory maps from diverse clinical isolates would provide valuable insights into the regulatory basis of this variation and represents an important direction for future investigation.”

      (4) Phylogenetic analysis highlights TF conservation in bacteria; future work could explore functional homology in other Gram-negative pathogens (e.g., E. coli).

      Thank you for this insightful suggestion. Our phylogenetic analysis revealed that P. aeruginosa TFs exhibit varying degrees of conservation across bacterial species, with some showing broad distribution across Gram-negative pathogens while others are lineage-specific.

      We agree that exploring functional homology of orthologous TFs across species would be highly valuable. Such comparative studies could address whether conserved TFs regulate similar target genes and biological processes across species, or whether regulatory networks have been rewired during evolution. For example, comparative ChIP-seq analysis of P. aeruginosa TFs and their orthologs in Klebsiella pneumoniae, or even in Gram-positive pathogens such as Bacillus cereus, could reveal conserved regulatory modules governing universal virulence or metabolic strategies versus species-specific adaptations. This represents an important direction for future investigation and would be facilitated by the comprehensive TF binding dataset we provide here. We have expanded the Discussion section to highlight this future direction. Please see Lines 539-550.

      “While our phylogenetic analysis reveals varying degrees of TF conservation across bacterial species, the functional implications of this conservation remain to be fully explored. Many P. aeruginosa TFs have clear orthologs in both Gram-negative (e.g., Klebsiella pneumoniae) and Gram-positive pathogens (e.g., Bacillus cereus), yet whether these orthologs regulate similar target genes and biological processes is largely unknown. Future comparative ChIP-seq profiling of orthologous TFs could reveal the extent to which regulatory network architecture is conserved versus rewired during bacterial evolution, potentially identifying core regulatory modules governing universal bacterial strategies versus species-specific innovations. Such cross-species comparisons would enhance our understanding of regulatory network evolution and enable functional prediction in less well-characterized pathogens based on homology to experimentally validated P. aeruginosa regulators.”

      Reviewer #3 (Recommendations for the authors):

      Major comments

      - Limitations of the ChIP-seq approach: With overexpression plasmids as an approach to TRN elucidation, there are always a set of concerns. First, TF expression is not enough to ensure regulatory activity - metabolite effects must be such that the TF is active which requires growing the cells in activating conditions. Second, the presence of a binding event does not mean that the binding has a regulatory effect - the authors are clearly aware of this as they specify binding sites in promoter regions, which should be helpful, but they also mention the possibility of regulatory binding events in coding regions. These issues should be listed as weaknesses of the approach in the Discussion.

      Thank you for these important suggestions. We agree that these limitations should be explicitly discussed. We have now added a dedicated paragraph in the Discussion section addressing these concerns. Please see Lines 492-501.

      “However, several limitations of the ChIP-seq approach should be acknowledged. Firstly, TF overexpression ensures sufficient protein levels for ChIP-seq signal detection but does not guarantee that all TFs are in their active conformational states, as many bacterial TFs require allosteric activation by metabolites, cofactors, or post-translational modifications. Standard laboratory growth conditions may not activate all TFs to their maximal regulatory states, potentially leading to underestimation of condition-specific binding peaks. Secondly, while we observed TF binding at thousands of genomic sites, binding per se does not equate to functional regulation, as chromatin context, cofactor availability, and competitive binding all influence regulatory outcomes.”

      - Lack of independent validation: The study seems to lack substantial independent validation of either the functional nature of the binding sites as well as the proposed physiological regulatory role of the TFs. For example, for the 103 identified TF motifs, do any of these agree with existing motifs in motif databases that may be homologous to P. aeruginosa TFs? The authors claim to have discovered master regulators of virulence and associated core regulatory clusters - but there does not seem to be any independent validation of the proposed associations. The authors selected the TF targets to cover TFs that had not yet been characterized; however, it would have been nice to have some overlap with previous studies so that consistency and data quality could be assessed.

      Thank you for raising these critical points about validation.

      As for motif validation, we compared our motifs with existing motifs in the RegPrecise database[3] and found that the PA3587 motif shows significant similarity to those of homologous TFs in Pseudomonadaceae. We have added the related description in the Results section. Please see Figure S3B and Lines 228-231.

      As for the validation of master regulators, we have performed EMSA experiments to validate the binding events and constructed mutants for functional validation. We have added the related content to the Results section. Please see Figure S5 and Lines 283-300.

      We have discussed the overlap between our results and previous studies in the Discussion section. Please see Lines 530-538.

      “PA0797 is known to regulate the pqs system and pyocyanin production[4]. In the present study, it was also found to bind to the pqsH promoter region and its motif was visualised. PA5428 was found to bind to the promoter regions of aceA and glcB genes[5], which was also demonstrated in our ChIP-seq results. PA4381 (CloR) was found to be associated with polymyxin resistance in a previous study[6] and to be possibly related to ROS resistance in the present study. Furthermore, PA5032 plays a putative role in biofilm regulation and also forms an operon with PA5033, a hypothetical protein (HP) associated with biofilm formation[7].”

      - Uncertain value of TRN topology analysis: The relationship between ternary motifs and pathogenicity of P. aeruginosa, and why the authors argue these results motivated TF-targeting drugs (the topic of the last paragraph of the Discussion), are unclear to me. The authors allude to possible connections between pathogenicity, growth, and drug resistance, but I don't see concrete examples here of related TF interactions that clearly represent these relationships. The sections "Hierarchical networks of TFs based on pairwise interactions" and "Ternary regulatory motifs show flexible relationships among TFs in P. aeruginosa" seem to not say much in terms of results that are actionable or possible to validate. A topological graph is constructed based on observed TF-TF connections in measured binding sites - however, it's unclear if any of these connections are physiologically meaningful. Line 178 - Why would there be any connection between the structural family of TF and its location in the proposed TRN hierarchy?

      Thank you for this valuable comment on TRN topology analysis. It is difficult to quantify precisely how much this resource will accelerate P. aeruginosa research or drug development, but we believe that this foundational network architecture has inherent value for the community because it enables hypothesis generation even before comprehensive functional validation. We would like to clarify our perspective on these findings and have added a discussion to the revised manuscript to better describe their nature and value. Please see Lines 517-528.

      “Additionally, although the TRN analysis revealed organizational patterns in the P. aeruginosa regulatory network, the functional significance of these topological features, including their specific contributions to pathogenicity, metabolic adaptation, and antibiotic resistance, remains to be experimentally determined in future work. The hierarchical structure and regulatory motifs we identified represent objective network properties derived from our binding data, but translating these structural observations into mechanistic understanding will require condition-specific functional studies, genetic validation, and phenotypic characterization. Our analysis provides a systematic framework for generating testable hypotheses rather than definitive functional conclusions. Nevertheless, these network-level organizational principles provide value to the community as a foundational reference, similar to other regulatory network maps[8] that were useful even before comprehensive validation.”

      - Identification of "master" regulators: Line 527 on virulence regulators: "We first generated gene lists associated with nine pathways" - is this not somewhat circular, i.e. using gene lists generated from (I assume) co-regulated gene sets to identify regulators of those gene lists? I can't tell from the cited reference (80), which is their own prior review article, what the original source of these gene lists was. Somewhat related to this point - Line 32: 24 "master regulators" - if there are that many, is it still considered a master regulator? Line 270: This term "master regulator" would seem to require some quantitative justification. Identifying 24 (a large number of) "master" regulators of virulence would seem to dilute the implied power of the term.

      We apologize for the lack of clarity regarding the virulence pathway gene lists. We have provided the complete gene lists for virulence-related pathways, which were compiled from functional annotations, in our online PATF_Net database.

      Additionally, we appreciate your concern about the use of “master” regulator. Our usage follows previous studies[9,10]. In the development of multicellular organisms, master regulators are commonly understood as a subset of TFs that control the expression of multiple downstream genes and govern lineage commitment or key biological processes. We employed the term "master regulator" in an analogous manner to specify a class of functionally crucial TFs that participate in a pathway or biological event by regulating multiple downstream genes statistically enriched in that pathway. In line with this definition, we identified TFs whose targets were significantly enriched in genes associated with specific virulence pathways (hypergeometric test, P < 0.05).

      We understand the concern that identifying 24 master regulators might seem to dilute the term. However, we would like to clarify that each of these 24 TFs is a "master regulator" with respect to specific virulence pathways based on statistical criteria, not necessarily a global master regulator of multiple pathways of P. aeruginosa. We have revised the Method section. Please see Lines 604-612.
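      The enrichment criterion described above (hypergeometric test, P < 0.05) can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes the background is the set of genes that could have been called as targets, computes the exact upper-tail P value with `math.comb` (equivalent to `scipy.stats.hypergeom.sf(k - 1, M, n, N)`), and applies no multiple-testing correction because none is stated. The function names and the gene and TF identifiers in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws), exactly."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, x) * comb(total - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(total, draws)


def pathway_regulators(tf_targets, pathway_genes, background, alpha=0.05):
    """Return {TF: P} for TFs whose targets are enriched in one pathway.

    tf_targets:    mapping TF -> iterable of target gene IDs (e.g. from ChIP-seq)
    pathway_genes: curated gene list for a single virulence pathway
    background:    all genes that could have been called as targets
    No multiple-testing correction is applied in this sketch.
    """
    background = set(background)
    pathway = set(pathway_genes) & background
    hits = {}
    for tf, targets in tf_targets.items():
        targets = set(targets) & background
        overlap = len(targets & pathway)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, len(background), len(pathway), len(targets))
        if p < alpha:
            hits[tf] = p
    return hits


# Hypothetical data: 1,000 background genes, a 20-gene pathway, three TFs.
genes = [f"g{i}" for i in range(1000)]
pathway = genes[:20]
tf_targets = {
    "TF1": genes[:10] + genes[100:110],  # 10 of 20 targets in the pathway
    "TF2": genes[500:520],               # no pathway targets
    "TF3": genes[:1] + genes[600:619],   # 1 of 20, close to chance (~0.4)
}
hits = pathway_regulators(tf_targets, pathway, genes)
print(sorted(hits))  # ['TF1']
```

      Because the test is run separately for each pathway, a TF passing it is a regulator "with respect to" that pathway only, which matches the pathway-specific sense of "master regulator" used in the response.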

      - Line 234: "Genome-wide synergistic co-association of TFs in P. aeruginosa." This section was an interesting analysis. As I mention above, the weakness of an overexpression approach is not knowing whether the TF is active on the examined conditions. By looking at shared binding peaks across overexpression of different TFs, it should indeed be possible to glean some regulatory connections across TFs. Furthermore, the authors discuss specific examples that appear physiologically reasonable, which is appreciated.

      We thank the reviewer for this positive assessment of our co-association analysis. We agree with this limitation of the overexpression approach, which we now address in the Discussion section. We are pleased that the reviewer found the approach and specific examples valuable.
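      The reviewer's point that shared binding peaks across different TFs can reveal regulatory connections can be made concrete with a simple overlap count. The response does not state the authors' co-association statistic, so the sketch below is only an illustration under stated assumptions: peaks are half-open [start, end) intervals on a single chromosome, any overlap of at least 1 bp counts as shared, and the peak coordinates and function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_overlapping(peaks_a, peaks_b):
    """Count peaks in A that overlap at least one peak in B (one chromosome)."""
    b = merge_intervals(peaks_b)
    j = 0
    hits = 0
    for start, end in sorted(peaks_a):
        # Disjoint, sorted B intervals ending at or before this start can
        # never overlap this or any later (larger-start) A peak.
        while j < len(b) and b[j][1] <= start:
            j += 1
        if j < len(b) and b[j][0] < end:
            hits += 1
    return hits


def fraction_shared(peaks_a, peaks_b):
    """Fraction of A's peaks that coincide with a B peak (asymmetric)."""
    return count_overlapping(peaks_a, peaks_b) / len(peaks_a) if peaks_a else 0.0


# Hypothetical peak coordinates (bp) for two TFs.
tf_a = [(100, 200), (500, 600), (900, 1000)]
tf_b = [(150, 250), (580, 700), (2000, 2100)]
print(count_overlapping(tf_a, tf_b))          # 2
print(round(fraction_shared(tf_a, tf_b), 3))  # 0.667
```

      Merging B's peaks first keeps the two-pointer sweep correct even when peaks nest; a real analysis would also need a null model (e.g. shuffled peak positions) before calling an overlap significant.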

      Minor comments

      - Line 35 - "high-throughput systematic evolution of ligands by exponential enrichment" - no idea what this means. Is this related to the web-based database, or why is it mentioned in the same sentence?

      We apologize for the unclear presentation. To clarify: “High-throughput systematic evolution of ligands by exponential enrichment” (HT-SELEX) is an in vitro technique for determining TF DNA-binding motifs, which our group previously applied to a subset of P. aeruginosa TFs in a prior publication[11]. In the current study, we performed ChIP-seq for 172 TFs, which represent the majority of TFs not covered by the previous HT-SELEX study. Together, these two complementary approaches (HT-SELEX for in vitro binding motifs, ChIP-seq for in vivo genomic binding sites) provide near-complete coverage of the P. aeruginosa TF repertoire. Both datasets are integrated into our PATF_Net database.

      Due to space constraints in the abstract, we could not provide detailed explanation of HT-SELEX, but we have now improved the clarity in the Introduction to better explain the relationship between our previous HT-SELEX work and the current ChIP-seq study, and why both are mentioned together in the context of the database. Please see Lines 99-105.

      - Line 193 - Only 9 auto-regulating TFs seems like a low number, given the frequency of negative auto-regulation in other organisms like E. coli. Could the authors comment on their expectations based on well-curated TRNs?

      Thank you for this comment. We agree that 9 auto-regulating TFs is lower than might be expected based on E. coli, where auto-regulation is more prevalent. This likely reflects a technical limitation of our ChIP-seq approach: detection was limited to standard growth conditions rather than the diverse physiological states in which auto-regulation often occurs. Therefore, the 9 TFs we report represent a high-confidence subset, and the true frequency of auto-regulation in P. aeruginosa is likely higher. We have added this point to the revised manuscript. Please see Lines 193-196.

      “This number likely represents a conservative estimate, as experiments may not optimally capture auto-regulatory events that depend on native expression levels or specific physiological conditions.”

      - Line 230 - "This conservation suggests that TFs within the same cluster co-regulate similar sets of genes." - Why would clustering of TF binding site motifs need to be done to make this assessment? Couldn't the shared set of regulated genes be identified directly from the binding site data? Computing TF binding site motifs has obvious value, but I am struggling to understand the point of clustering the motifs. Is there some implied evolutionary or physiological connection here? No specific physiological roles or hypotheses are discussed in this section.

      Thank you for this important question. We agree that shared target genes can be identified directly from ChIP-seq binding data, which we also analyzed (co-association analysis). The motif clustering analysis serves a complementary and distinct purpose, providing information not directly obtainable from overlapping targets alone. Specifically, target overlap is inherently condition dependent, whereas motif clustering captures intrinsic binding specificity, which reflects the structural similarity of DBDs, evolutionary relationships, and the potential for functional redundancy or cooperativity under specific conditions. We have revised the related content in the manuscript; please see Lines 236-242.

      “Clustering of TF binding motifs identified groups of TFs with similar intrinsic DNA-binding specificities. As expected, many clusters contained TFs from the same DBD families, reflecting evolutionary conservation and potential functional redundancy or competitive binding at shared regulatory elements. Notably, the clustering also uncovered associations between TFs from different DBD families, suggesting convergent evolution of binding specificity or novel regulatory interactions that warrant further investigation.”

      - Line 284 - should "metabolomic" be "metabolic"? I didn't see metabolomic data

      Yes, we have revised. Please see Line 311.

      - Several of the figures are too small (e.g. Fig S4A) or complex (Fig 2A) to see clearly or glean information from.

      Thank you for this comment. We acknowledge that Figure 2A and Figure S4A contain dense information due to the comprehensive nature of the regulatory network and the large number of TFs analyzed. We believe these overview figures serve an important purpose in conveying the scale and organization of the regulatory network, while the tables (Table S6 for Fig. S4A and Table S3 for Fig. 2A) provide the granular data needed for specific inquiries. We have also made the figures available in higher resolution and increased font sizes where possible without compromising the overall layout.

      - I don't understand the organization of the "Ternary regulatory motifs" in Supplementary Data File 4 - A table of contents explaining the tabs and columns would be welcome (for this as well as other supplementary files, some of which are more straightforward than others).

      Thank you for this suggestion. We have now revised all supplementary data files to include headers and the necessary annotations in the first row. Specifically, for Supplementary Data File 4, the three columns (Top, Middle, Bottom) represent the left, middle, and right nodes, respectively, in each ternary regulatory motif.
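      The "13 ternary regulatory motifs" in the reviewer's summary match the number of distinct weakly connected three-node directed graphs without self-loops. Assuming that is how the motif classes were defined (the response does not say), the sketch below assigns any connected TF triple to one of those 13 classes by taking the smallest adjacency tuple over all six node orderings, and checks that the count comes out to 13. The function names and the toy network are hypothetical; mapping each triple onto the Top, Middle, and Bottom columns of Supplementary Data File 4 would additionally require the hierarchy levels.

```python
from itertools import combinations, permutations, product


def _linked(x, y, edges):
    """True if x and y are joined by an edge in either direction."""
    return (x, y) in edges or (y, x) in edges


def is_connected(trio, edges):
    """A 3-node induced subgraph is weakly connected iff >= 2 pairs are linked."""
    a, b, c = trio
    return _linked(a, b, edges) + _linked(a, c, edges) + _linked(b, c, edges) >= 2


def triad_signature(trio, edges):
    """Canonical 6-bit adjacency tuple of a triad.

    The smallest tuple over all six node orderings is taken, so any two
    isomorphic triads receive the same signature.
    """
    best = None
    for order in permutations(trio):
        bits = tuple(
            int((order[i], order[j]) in edges)
            for i in range(3)
            for j in range(3)
            if i != j
        )
        if best is None or bits < best:
            best = bits
    return best


def count_triads(edge_list):
    """Count connected TF triples per motif class; self-loops are ignored."""
    edges = {(s, t) for s, t in edge_list if s != t}
    nodes = sorted({n for e in edges for n in e})
    counts = {}
    for trio in combinations(nodes, 3):
        if is_connected(trio, edges):
            sig = triad_signature(trio, edges)
            counts[sig] = counts.get(sig, 0) + 1
    return counts


# Sanity check: the 2**6 possible edge sets on three labelled nodes
# collapse to 13 connected isomorphism classes.
labels = ("x", "y", "z")
slots = [(u, v) for u in labels for v in labels if u != v]
classes = set()
for mask in product((0, 1), repeat=len(slots)):
    edge_set = {slot for slot, on in zip(slots, mask) if on}
    if is_connected(labels, edge_set):
        classes.add(triad_signature(labels, edge_set))
print(len(classes))  # 13

# Hypothetical feed-forward loop: A -> B, A -> C, B -> C.
print(sum(count_triads([("A", "B"), ("A", "C"), ("B", "C")]).values()))  # 1
```

      Brute-force canonicalization is adequate here because each triad has only six orderings; for larger motifs, a dedicated graph-isomorphism routine would be the usual choice.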

      - I would have expected genomic locations of TF binding sites would have been one of the Supplementary Tables, to increase the accessibility of the data. However, the data is made available through their website, https://jiadhuang0417.shinyapps.io/PATF_Net/, which was easy to access and download the full dataset, so this is a minor issue.

      Thank you for accessing our PATF_Net database and for the positive feedback on data accessibility. We agree that providing genomic locations of TF binding sites is crucial. These data are fully available and downloadable through the web interface, which allows flexible searching, filtering, and batch download of binding sites. We felt that the interactive database format provides more functionality than static supplementary tables (e.g., dynamic filtering by TF, genomic region, or binding strength), given the large scale of this dataset.

      References

      (1) Hua, C., Huang, J., Wang, T., Sun, Y., Liu, J., Huang, L. et al. Bacterial Transcription Factors Bind to Coding Regions and Regulate Internal Cryptic Promoters. mBio 13, e0164322 (2022).

      (2) Chemani, C., Imberty, A., de Bentzmann, S., Pierre, M., Wimmerová, M., Guery, B. P. et al. Role of LecA and LecB lectins in Pseudomonas aeruginosa-induced lung injury and effect of carbohydrate ligands. Infect Immun 77, 2065-2075 (2009).

      (3) Novichkov, P. S., Kazakov, A. E., Ravcheev, D. A., Leyn, S. A., Kovaleva, G. Y., Sutormin, R. A. et al. RegPrecise 3.0–a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745 (2013).

      (4) Cui, G. Y., Zhang, Y. X., Xu, X. J., Liu, Y. Y., Li, Z., Wu, M. et al. PmiR senses 2-methylisocitrate levels to regulate bacterial virulence in Pseudomonas aeruginosa. Sci Adv 8 (2022).

      (5) Hwang, W., Yong, J. H., Min, K. B., Lee, K.-M., Pascoe, B., Sheppard, S. K. et al. Genome-wide association study of signature genetic alterations among Pseudomonas aeruginosa cystic fibrosis isolates. PLoS Pathog 17, e1009681 (2021).

      (6) Gutu, A. D., Sgambati, N., Strasbourger, P., Brannon, M. K., Jacobs, M. A., Haugen, E. et al. Polymyxin resistance of Pseudomonas aeruginosa phoQ mutants is dependent on additional two-component regulatory systems. Antimicrob Agents Chemother 57, 2204-2215 (2013).

      (7) Zhang, L., Fritsch, M., Hammond, L., Landreville, R., Slatculescu, C., Colavita, A. et al. Identification of genes involved in Pseudomonas aeruginosa biofilm-specific resistance to antibiotics. PLoS One 8, e61625 (2013).

      (8) Galan-Vasquez, E., Luna, B. & Martinez-Antonio, A. The Regulatory Network of Pseudomonas aeruginosa. Microb Inform Exp 1, 3 (2011).

      (9) Fan, L. G., Wang, T. T., Hua, C. F., Sun, W. J., Li, X. Y., Grunwald, L. et al. A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11 (2020).

      (10) Chan, S. S.-K. & Kyba, M. What is a master regulator? Journal of stem cell research & therapy 3, 114 (2013).

      (11) Wang, T. T., Sun, W. J., Fan, L. G., Hua, C. F., Wu, N., Fan, S. R. et al. An atlas of the binding specificities of transcription factors in Pseudomonas aeruginosa directs prediction of novel regulators in virulence. eLife 10 (2021).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      There are a few remaining issues:

      (1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remain stable throughout learning, and are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract and generalized information, that enable adapting to novel context and inference by enabling retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly will have a role in organizing task representations, but to this reviewer, denoting them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, schema notation cannot just be based on stability, it requires further evidence of abstraction such as cross-condition generalization.

      We agree with the reviewer that task-phase-selective cells (“non-splitter cells”) alone do not fulfill the “relationality” criterion of schemas. We found only a few of them, so we cannot say much about how they covary. We would, however, like to stress that our finding that task-phase-selective cells have stable firing fields across the learned (task) and habituation (no-task) conditions can be considered “cross-condition generalization.” We have further specified our discussion of schemas, with particular emphasis on a potential interpretation of the generalizing task-phase cells as “potential building blocks of schemas.”

      (2) The quantification of prefrontal replay sequences during reward is useful, but it is still unconvincing that the distinction between the existence of sequences in the odor sampling phase and the reward phase is not trivially expected based on prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well established (as noted in citations in the previous review) that during odor sampling, animals will sniff in an exploratory stage, resulting in strong beta and respiratory rhythms in the prefrontal cortex. Not having LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling results in a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from a reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although they also argue the opposite in response to a question from Reviewer 2 about shuffling controls for sequences, namely that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

      We agree with the reviewer that the sampling and reward phases represent two very different behavioral states. Nevertheless, correlations on short time scales could in principle be similar; we show that this is not the case, and we therefore do not consider this result trivial. Regarding the interpretation of sequences, we apologize for not having distinguished sufficiently clearly between replay and pure sequences. While we find such sequences in the sampling phase (indicative of a fast temporal correlation structure beyond the cofiring quantified in Figure 3), they do NOT pre-/replay any task-related information. Otherwise, our results are fully in line with the previous literature on oscillations that we included in the previous round of revisions. We added a similar explanation at multiple places in the Results and Discussion sections.

      Reviewer #2 (Public review):

      Comments on revisions:

      Further changes are needed to improve the description of the methods and the discussion needs to be extended to contrast the results with previously published results of the group. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomised data with the current method would better convince readers of the method's validity.

      We have added control quantifications based on time-randomized sequences, which yield substantially fewer detected sequences. See response below.

      Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence of results) is not sufficiently acknowledged - e.g., at the beginning of the discussion.

      We have added a paragraph at the beginning of the Discussion on how our results align with the Muysers et al. 2025 paper.

      Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the method. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification that shows that the detected sequences are significantly better than the sequence quality measure on randomized events. Or that randomized data do not lead to sequence clusters.

      In response to this question, we have added the requested shuffling control (Supplement 1B to Figure 4). In the shuffled data, the number of detected recurring sequence clusters is only about half that in the original data. On average, the number of bursts assigned to clusters in the shuffled data is only 46% of the number originally assigned, clearly indicating that the detected sequences in the non-randomized data cannot be explained without assuming a stable temporal order.
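      One way to picture a time randomization of this kind is to shuffle activation times among the cells participating in each burst, so that participation (and thus cofiring) is preserved while temporal order is destroyed. The sketch below is a hypothetical illustration only: the (neuron_id, activation_time) burst format and the function names are assumptions, not the pipeline used in the manuscript. Running the same sequence detection on `original` and `control` and comparing the number of recurring clusters gives the kind of comparison described above.

```python
import random


def burst_to_sequence(burst):
    """Return neuron ids ordered by activation time within one burst.

    `burst` is a list of (neuron_id, activation_time) pairs (hypothetical format).
    """
    return [neuron for neuron, _ in sorted(burst, key=lambda pair: pair[1])]


def time_randomize(bursts, seed=0):
    """Shuffle activation times among the neurons of each burst.

    Which neurons participate in a burst is preserved; their temporal order is not.
    """
    rng = random.Random(seed)
    shuffled = []
    for burst in bursts:
        neurons = [neuron for neuron, _ in burst]
        times = [t for _, t in burst]
        rng.shuffle(times)
        shuffled.append(list(zip(neurons, times)))
    return shuffled


bursts = [[("a", 0.01), ("b", 0.02), ("c", 0.03)],
          [("a", 0.11), ("c", 0.12), ("b", 0.15)]]
original = [burst_to_sequence(b) for b in bursts]                  # temporal order kept
control = [burst_to_sequence(b) for b in time_randomize(bursts)]   # order randomized
```

      Shuffling within bursts, rather than across the whole recording, keeps the burst boundaries and cell participation fixed, so any loss of recurring clusters can be attributed to the loss of temporal order.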

      Some clusters are nevertheless still detected in randomized data. This is expected if the participation of cells is heterogeneous, with some highly active cells occurring in more than half of the bursts. Random sequences then spuriously occur above chance level, representing clusters formed by the random order of a few highly active cells. In line with this interpretation, we see that

      (1) Bursts that were removed after shuffling have exactly 0 high-firing cells

      (2) Clusters derived from shuffled sequences have a less sparse contribution of high-firing cells, i.e., high-firing cells contribute to significantly more clusters in randomized data than in non-randomized data.

      The difference in the distribution of high-firing cells further indicates that sequences obtained with and without randomization are of different quality.

      The spurious (false-positive) clusters detected after randomization may nevertheless have physiological meaning, as they identify rate coactivation patterns that were also picked up by the analysis in Figure 3.

      Also, it is still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions; these should be explained here as well.

      The Methods section states: “The [cluster merging] procedure was repeated until no pair [of clusters] satisfied the merging criterion.”
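      With this stopping rule, the number of clusters is not fixed in advance but emerges from the merging loop. A minimal sketch, assuming greedy merging of the most similar qualifying pair at each step; `cluster_similarity`, `threshold`, and the toy `mean_gap` criterion are hypothetical placeholders for the manuscript's merging criterion, which is not restated here.

```python
from itertools import combinations
from statistics import mean


def merge_until_stable(clusters, cluster_similarity, threshold):
    """Greedily merge the most similar pair of clusters until no pair meets the criterion.

    `cluster_similarity` and `threshold` stand in for the actual merging criterion.
    """
    clusters = [list(c) for c in clusters]
    while True:
        best = None  # (similarity, i, j) of the best pair satisfying the criterion
        for i, j in combinations(range(len(clusters)), 2):
            s = cluster_similarity(clusters[i], clusters[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:  # no pair satisfies the criterion: stop
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


def mean_gap(a, b):
    """Toy criterion: clusters count as similar if their means are close."""
    return -abs(mean(a) - mean(b))


result = merge_until_stable([[0], [1], [10], [10.5]], mean_gap, threshold=-1.5)
# result == [[0, 1], [10, 10.5]]
```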

      Also, the sequence similarity description is still confusing in the method; please correct this sentence "Only the l neurons active in both sequences of a pair were taken into account."

      We do not see what is wrong with this sentence. To avoid confusion, we have replaced the lowercase “l” with an uppercase “L” to denote sequence length.
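      To make the phrase "only the L neurons active in both sequences" concrete, the sketch below restricts a pairwise comparison to the shared neurons and re-ranks them within each sequence. It assumes, for illustration only, that order similarity is a Spearman rank correlation; the measure actually used is described in the manuscript's Methods and may differ. Each neuron is assumed to appear at most once per sequence, and `shared_order_similarity` is a hypothetical name.

```python
def shared_order_similarity(seq_a, seq_b):
    """Spearman rank correlation of firing order, restricted to neurons active in both.

    Returns (rho, L), where L is the number of shared neurons; rho is None if L < 2.
    Assumes each neuron occurs at most once per sequence.
    """
    shared = set(seq_a) & set(seq_b)
    L = len(shared)
    if L < 2:
        return None, L
    # Re-rank the shared neurons within each sequence (0..L-1), ignoring all others.
    rank_a = {n: r for r, n in enumerate(x for x in seq_a if x in shared)}
    rank_b = {n: r for r, n in enumerate(x for x in seq_b if x in shared)}
    d2 = sum((rank_a[n] - rank_b[n]) ** 2 for n in shared)
    return 1 - 6 * d2 / (L * (L ** 2 - 1)), L


print(shared_order_similarity([1, 2, 3, 4], [9, 1, 2, 3]))  # (1.0, 3)
print(shared_order_similarity([3, 2, 1], [1, 2, 3]))        # (-1.0, 3)
```

      Neurons 4 and 9 in the first example are ignored because each is active in only one of the two sequences, so L = 3.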

      Reviewer #3 (Public review):

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      Following the Reviewer’s suggestion, we have specified this further and now mention that the threshold is permissive, “capturing a large amount of cofiring structure.”
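      For readers less familiar with this kind of criterion, a minimal sketch: bins in which population activity exceeds its mean by more than 0.5 standard deviations are grouped into contiguous bursts, and the fraction of supra-threshold bins is the kind of quantity behind the ~20-40% coverage mentioned above. Binning, smoothing, any minimum burst duration, and the SD convention used in the manuscript are not reproduced; `detect_bursts` is a hypothetical name.

```python
import numpy as np


def detect_bursts(population_activity, k=0.5):
    """Find contiguous runs of bins where activity exceeds mean + k * SD.

    Returns a list of (start, end) bin indices (end exclusive) and the fraction
    of bins covered by bursts. Uses the population SD (ddof=0), an assumption.
    """
    activity = np.asarray(population_activity, dtype=float)
    threshold = activity.mean() + k * activity.std()
    above = activity > threshold
    # Pad with False so every run has both a rising (+1) and a falling (-1) edge.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist())), float(above.mean())


bursts, coverage = detect_bursts([0, 0, 5, 5, 0, 0, 5, 0])
print(bursts, coverage)  # [(2, 4), (6, 7)] 0.375
```

      Lowering `k` makes the criterion more permissive, which increases the fraction of the recording assigned to bursts.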

    1. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new section and its accompanying Methods part, we mainly edited the language and clarified methods and assumptions according to the Reviewers' suggestions.

      The main concern of Reviewer 2 was our use of the Liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate it into our manuscript because, even though this is an important subject, it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al. examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al. ask whether the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell-type-specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant imbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine), and individual. Unfortunately, for the orangutan we have only one individual and one sample origin, and thus cannot say whether this germ-layer preference reflects the species or our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with appreciable overlap between species was a limiting factor for our analyses. To derive meaningful conclusions from intra-species analyses and about the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we indeed knowingly omitted an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types; hence, even if all EBs had an identical composition, the sampling variance would be higher, given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8, and it is unclear how much variance this extra handling step adds.

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that, within our relatively broadly defined cell types, species are not well mixed, and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate it require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the data to the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have highly transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well-integrated cells is lowest for cardiac fibroblasts (Author response image 3A). At the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is worse than that of cardiac fibroblasts (Supplementary Figure 9). The rank-biased overlap scores that we use to quantify marker gene conservation, calculated per cell type, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset, we see no indication that the low marker gene conservation results from too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
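      Rank-biased overlap (Webber, Moffat & Zobel, 2010) compares two ranked lists with top-weighted agreement, so that concordance among the highest-ranked markers counts most. A minimal sketch of the extrapolated form for two marker rankings truncated to a common depth k, assuming no duplicates within a ranking; the persistence parameter and depth used for the scores above are not stated here, so `p = 0.9` is only a placeholder, and `rbo_ext` is a hypothetical name.

```python
def rbo_ext(ranking_a, ranking_b, p=0.9):
    """Extrapolated rank-biased overlap of two rankings, truncated to a common depth k.

    RBO_ext = (1 - p) * sum_{d=1..k} p**(d-1) * A_d + A_k * p**k,
    where A_d is the fraction of items shared by the top-d prefixes.
    Assumes no item occurs twice within the same ranking.
    """
    k = min(len(ranking_a), len(ranking_b))
    if k == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    overlap = 0          # size of the intersection of the two top-d prefixes
    weighted_sum = 0.0
    agreement = 0.0
    for d in range(1, k + 1):
        x, y = ranking_a[d - 1], ranking_b[d - 1]
        if x == y:
            overlap += 1
        else:
            # x is newly in a's prefix, y newly in b's prefix
            overlap += (x in seen_b) + (y in seen_a)
        seen_a.add(x)
        seen_b.add(y)
        agreement = overlap / d
        weighted_sum += p ** (d - 1) * agreement
    return (1 - p) * weighted_sum + agreement * p ** k


print(rbo_ext(["a", "b"], ["b", "a"], p=0.5))  # 0.5
```

      Identical rankings score 1 and disjoint rankings score 0; a swap at the top is penalized more than the same swap further down the list.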

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species (humans, orangutans, cynomolgus macaques, and rhesus macaques), providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that annotations of non-human primate genomes often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used human and cynomolgus iPSC bulk RNA-seq transcriptomes (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed (DE) genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the excess DE genes are artifacts of mismatched annotations between human and cynomolgus macaque. We illustrate this for the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. In this case, the truncation of the 3’UTR leads to an underestimation of SALL4 expression in macaques, and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC = 1.34, p-adj < 2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC = 0.33, p-adj = 0.20).

      Author response image 4.

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. Counts on the x-axis were obtained with the GTF file from Ensembl Macaca_fascicularis_6.0.111; counts on the y-axis with our filtered Liftoff annotation, which transferred the human gene models from Gencode v32. (B) Number of DE genes between human and cynomolgus iPSCs detected with DESeq2. Liftoff: human samples were counted using Gencode v32 and compared to the Liftoff annotation of the same human gene models on macFas6. Ensembl: Gencode v32 was used for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, genes were subset to one-to-one orthologs as annotated in BioMart. Up- and down-regulation are relative to human expression. (C) Read counts for one example gene, SALL4. In addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in macFas6 coordinates and coverage from iPSC Prime-seq bulk RNA sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from analysis in both species because they conflate sequence-annotation differences with expression differences. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that arise because a marker has been mutated away. We will make this clearer in the text of a revised version.
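      The exclusion rules quoted in the reviewer's question can be written as a simple predicate. A minimal sketch, assuming per-transcript coverage and sequence identity (as fractions) and transcript lengths in both genomes are already available; the field and function names are hypothetical, and the "excessive length" rule is read as requiring both conditions (>100 bp difference and >2× length ratio), as phrased in the question.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    transcript_id: str
    coverage: float      # fraction of the reference transcript that was mapped (0-1)
    identity: float      # sequence identity of the lifted transcript (0-1)
    ref_length: int      # transcript length in the reference genome (bp), > 0
    target_length: int   # transcript length in the target genome (bp), > 0


def keep_transcript(t, min_cov=0.5, min_id=0.5, max_diff_bp=100, max_ratio=2.0):
    """Return False for partially mapped, low-identity, or strongly length-changed transcripts."""
    if t.coverage < min_cov or t.identity < min_id:
        return False
    diff = abs(t.target_length - t.ref_length)
    ratio = max(t.target_length, t.ref_length) / min(t.target_length, t.ref_length)
    # A length change only counts as excessive if BOTH conditions hold.
    if diff > max_diff_bp and ratio > max_ratio:
        return False
    return True
```

      Requiring both length conditions avoids discarding short transcripts whose length merely doubles by a few bases, and long transcripts that differ by more than 100 bp but only by a small proportion.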

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony-integrated data for visualisation in Figure 1 and for the rough germ-layer check in Supplementary Figure S3. We will add a better description in the revised version.

      Reference

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree; unfortunately, this also reflects the findings from the scRNA-seq analysis, in which we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal-to-noise ratio, which is a property specific to each dataset. Here we just wanted to state that our approach seems to work better for our developmental data, but we did not test other datasets and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this is a bit misleading. The colors on the left represent species identity and illustrate the mixing of species within each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      We agree that this would be a good idea, and we are working on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize differences in statistical power caused by differing cell numbers, and thus to make the results more comparable across species. We added this explanation to the Marker gene detection section of the Methods.
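      The cap on cluster size can be illustrated with a minimal sketch. It assumes that clusters larger than the cap are randomly subsampled without replacement and that smaller clusters are kept whole; the function name, the fixed seed, and the sampling scheme are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict


def cap_cells_per_cluster(cell_ids, cluster_labels, max_cells=250, seed=0):
    """Return cell IDs with at most `max_cells` per cluster (hypothetical helper).

    Clusters with at most `max_cells` cells are kept whole; larger clusters
    are randomly subsampled without replacement, so every cluster enters
    the pairwise comparisons with a comparable number of cells.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    by_cluster = defaultdict(list)
    for cell, cluster in zip(cell_ids, cluster_labels):
        by_cluster[cluster].append(cell)

    kept = []
    for cluster in sorted(by_cluster):
        cells = by_cluster[cluster]
        if len(cells) > max_cells:
            cells = rng.sample(cells, max_cells)
        kept.extend(cells)
    return kept


# Made-up example: one large cluster (1,000 cells) and one small cluster (100 cells).
cells = [f"c{i}" for i in range(1100)]
labels = ["A"] * 1000 + ["B"] * 100
print(len(cap_cells_per_cluster(cells, labels)))  # 350
```

      With the cap at 250, the large cluster contributes 250 cells and the small one all 100, so cluster size no longer dominates the power of each marker test.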

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was used only for the initial rough check of the germ layers reported in Figure 1 and Supplementary Figure S3. We chose it because it yielded roughly the same number of clusters as the number of cell types obtained by classification with the Rhodes et al. data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially for the main observations, but there are some limitations in the description of the statistical methods, and the support for two separate cerebello-thalamic pathways is incomplete with respect to the overall claim.

      We have completed the manuscript by adding a double retrograde labelling study showing that the two pathways have limited overlap, and we have addressed the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness in our description. The Methods have been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that comparisons between Deming regressions would be fragile because these regressions are weak in the treatment groups (whereas they are quite robust in the control groups); such comparisons are therefore not included in the manuscript, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now explained more clearly in the Methods, the comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
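      To make the residual-based comparison concrete, here is a hedged sketch: a Deming fit to a control group, followed by the residuals of another group's points around that control line. The error-variance ratio `delta` (1 here, i.e. orthogonal regression), the use of vertical residuals, and the function names are assumptions; the statistical test subsequently applied to the residual distributions is not specified in this reply and is left out.

```python
import math
from statistics import mean


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x; returns (intercept, slope).

    delta is the assumed ratio of the error variance in y to the error
    variance in x (delta = 1 gives orthogonal regression).
    """
    mx, my = mean(x), mean(y)
    n = len(x)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("Deming slope is undefined when the covariance is zero")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope


def residuals_around(line, x, y):
    """Vertical residuals of (x, y) points around a fitted (intercept, slope) line."""
    intercept, slope = line
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up data: the control points lie exactly on y = 2 + 3x.
control_line = deming_fit([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
print(residuals_around(control_line, [1, 2], [4, 7]))  # [-1.0, -1.0]
```

      A group whose residuals sit systematically below zero performs worse than the control relationship predicts, which is the kind of shift such a comparison is meant to detect.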

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex is functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals from the DCN but in the context of the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. (2014) we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what “They” refers to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL; see refs cited above), but, as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments supporting this conclusion, using retrograde infections with two rAAVs expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, retrograde infection strategies differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not reveal major changes in the population firing rate (this is now shown in a new Supplementary Figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range " Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems like it is unlikely uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Because these time windows are referred to repeatedly in the text, we have switched to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the fixed-speed rotarod test, which provides the conditions closest to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed on the fixed-speed rotarod in the CN-VAL group, while there was no measurable effect in the CN-CL group, which nevertheless shows lower performance from the second day of learning; we believe this supports our claim that CN-CL inhibition affected the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay on the rod for more than a minute and a half at 20 rpm, whereas on average they fell from the accelerating rotarod as soon as it reached ~19 rpm (130 s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We have clarified the Discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL; see refs cited above), but it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments supporting this conclusion, using retrograde infections with two rAAVs expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, retrograde infection strategies differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the Discussion, in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest learning deficit is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100 s on the rotarod; this indicates that these mice fall as soon as the accelerating rotarod reaches about 16 rpm. On the fixed-speed rotarod, inhibition of CN->CL neurons does not produce even a trend toward a difference from control mice at 15 rpm, and the animals run for 2 minutes without falling at this speed. This makes us confident that the CN->CL pathway interferes more with learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is addressed in the Discussion (see also our replies to Reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL, and the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations of neurons, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments answer the reviewer’s concern.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying the CN inhibition impaired performance to a similar extent as mice in Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speed (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument for a learning deficit rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the fixed-speed rotarod, on which no deficit is observed. The difference between the fixed-speed and accelerating rotarod is minimal, since the acceleration of the rotarod is small (0.12 rpm/s, for speeds up to >20 rpm).

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO were a motor deficit, the animals that learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part of the effect is due to a crude motor deficit (which would not show on the fixed-speed rotarod) and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.
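      The rod speeds quoted in these replies follow from a linear ramp, speed(t) = v0 + a·t, with a = 0.12 rpm/s as stated above. The sketch below assumes a starting speed v0 = 4 rpm (the common 4 to 40 rpm over 300 s protocol, which gives exactly 0.12 rpm/s); the starting speed is not stated in this reply and is an assumption, as are the function names.

```python
START_RPM = 4.0          # assumed starting speed (not stated in the reply)
ACCEL_RPM_PER_S = 0.12   # acceleration stated in the reply


def rod_speed_rpm(t_s: float) -> float:
    """Rod speed (rpm) t_s seconds after the start of an accelerating trial."""
    return START_RPM + ACCEL_RPM_PER_S * t_s


def time_to_reach_s(speed_rpm: float) -> float:
    """Seconds needed for the accelerating rod to reach speed_rpm."""
    return (speed_rpm - START_RPM) / ACCEL_RPM_PER_S


print(round(rod_speed_rpm(100), 1))  # 16.0
print(round(rod_speed_rpm(130), 1))  # 19.6
print(round(time_to_reach_s(20)))    # 133
```

      Under this assumption, latencies of about 100 s and 130 s correspond to roughly 16 rpm and 19 to 20 rpm, matching the speeds given in the replies above.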

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now show, using retrograde labelling experiments (new Suppl Fig 5), that the CN populations are different. The differences between IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that despite some overlap of their targets, the profiles of CL and VAL differ substantially.

      (3) The pattern of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Because learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from average performance alone: performance on day 3 may reflect slow learning with perfect consolidation, or good learning with imperfect consolidation. That is why we designed an analysis that takes into account the observed relationships between initial performance, within-session gain of performance, and across-session carry-over of this gain (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO groups, despite their weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon (J Neurosci 2004, 24(8):1987-1995), which is now cited, and their descriptions are now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since there was no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each daily session correspond to the values of the regression line at trial 1 and trial 7 on the abscissa (green points). This has been added to the figure legend.
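      Reading start and end skill levels off a within-session regression line can be sketched as follows. This assumes an ordinary least-squares fit of latency to fall against trial number within one session of 7 trials; the reply does not say which regression was used for this figure, and the function name and example latencies are hypothetical.

```python
from statistics import mean


def session_start_end(latencies, first_trial=1, last_trial=7):
    """Return (start, end) skill levels from an OLS line fitted to one session.

    latencies[i] is the latency to fall (s) on trial i + 1.
    """
    trials = range(1, len(latencies) + 1)
    mt, ml = mean(trials), mean(latencies)
    sxy = sum((t - mt) * (y - ml) for t, y in zip(trials, latencies))
    sxx = sum((t - mt) ** 2 for t in trials)
    slope = sxy / sxx
    intercept = ml - slope * mt
    # Evaluate the fitted line at the first and last trial of the session.
    return intercept + slope * first_trial, intercept + slope * last_trial


# Made-up latencies for one mouse on one day (7 trials).
start, end = session_start_end([62, 75, 71, 90, 96, 104, 118])
print(round(end - start, 1))  # 53.8  (within-session gain)
```

      Using the fitted values rather than the raw first and last trials makes the start and end estimates less sensitive to a single unusually good or bad trial.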

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral anterior lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions, but it is important for potential mechanisms by which differential learning could occur and may necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E, or are they just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto- and/or cross-correlograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations below 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring that there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.

      Cell numbers are provided with the statistics in the supplementary table for Figure 1g. The panels show the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.
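      The numeric inclusion criteria listed above can be expressed as a simple filter. This sketch assumes the metrics have already been computed for each sorted unit after MountainSort 4; the class and field names are hypothetical, ISI violations are assumed to be expressed as a fraction, the example values are made up, and the visual autocorrelogram and stability checks are not modeled.

```python
from dataclasses import dataclass


@dataclass
class UnitMetrics:
    """Quality metrics for one sorted unit (hypothetical container)."""
    snr: float
    isi_violation_fraction: float  # assumed fraction: 0.01 == 1 %
    amplitude_cutoff: float
    presence_ratio: float
    firing_rate_hz: float
    n_spikes: int


def passes_quality(m: UnitMetrics) -> bool:
    """Apply the numeric thresholds listed in the reply."""
    return (
        m.snr > 5
        and m.isi_violation_fraction < 0.01
        and m.amplitude_cutoff < 0.1
        and m.presence_ratio > 0.9
        and m.firing_rate_hz > 0.1
        and m.n_spikes >= 50
    )


units = {
    "u1": UnitMetrics(8.2, 0.002, 0.03, 0.98, 12.5, 4200),
    "u2": UnitMetrics(4.1, 0.001, 0.02, 0.99, 20.0, 8000),  # SNR too low
    "u3": UnitMetrics(6.0, 0.004, 0.05, 0.95, 0.08, 45),    # too sparse
}
good = [name for name, m in units.items() if passes_quality(m)]
print(good)  # ['u1']
```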

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods, and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for post-training consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Overall, our results indicate that overnight consolidation of the improvement in performance seems effective only in the early phase (as pointed out in the summary Figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of Figure 1; to make it more visible, Author response image 1 plots the difference between trial 7 on one day and trial 1 on the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.

      The decrement on the first trial in the first 3 days is visible for the majority of the mice. The plot requested by the reviewer is shown in Author response image 2.

      Author response image 2.
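      The overnight difference plotted in Author response image 1 (trial 7 on one day versus trial 1 on the next) amounts to a simple per-mouse computation. This sketch assumes that latencies are stored per day in trial order and that a negative value denotes an overnight drop; the function name, sign convention, and example latencies are illustrative.

```python
def overnight_changes(sessions):
    """First trial of day d+1 minus last trial of day d, for consecutive days.

    sessions: list of per-day latency lists in trial order (e.g. 7 trials/day).
    Negative values indicate an overnight drop in performance.
    """
    return [nxt[0] - prev[-1] for prev, nxt in zip(sessions, sessions[1:])]


# Made-up latencies (s) for one mouse over three days, trials shortened for brevity.
days = [[55, 80, 100], [90, 120, 150], [160, 165, 170]]
print(overnight_changes(days))  # [-10, 10]
```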

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this refers to a line in Fig 1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed.

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fecal virome transfer (FVT) has the potential to take advantage of microbiome-associated phages to treat diseases such as NEC. However, FVT is also associated with toxicity due to the presence of eukaryotic viruses in the mixture, which are difficult to filter out. The authors use a chemostat propagation system to reduce the presence of eukaryotic viruses (these become lost over time during culture). They show in pig models of NEC that chemostat propagation reduces the incidence of diarrhea induced by FVTs.

      Strengths:

      The authors report an innovative yet simple approach that has the potential to be useful for future applications. Most of the experiments are easy to follow and performed well.

      Weaknesses:

      The biggest weakness is that the authors show that their technique addresses safety, but they are unable to demonstrate that it retains efficacy in their NEC model. This could be due to technical issues, or perhaps the efficacy of FVT reported in the literature is not robust. If they cannot demonstrate efficacy of the chemostat-propagated virome mixture, the value of the study is compromised.

      We appreciate the reviewer’s assessment and fully acknowledge that our inability to demonstrate NEC protection by FVT is a limitation of the study. If "technical issues" refers to the variability in disease phenotype in our animal model, in which NEC develops spontaneously, then we fully agree. Issues with FVT preparation are, however, unlikely, as it was performed according to protocol. The effect of FVT on NEC has so far been demonstrated only by our research group, in two individual studies using separate donor fecal material, so it is indeed too early to speculate about the robustness of the FVT response. We briefly mentioned this in the results (lines 563-565) and discussion (lines 777-779), but agree that it needs further elaboration. We have now revised the discussion and conclusion to better emphasize the extent and consequences of this limitation (lines 793-797 and lines 817-818). Importantly, we show that inclusion of specific nutrients, such as milk oligosaccharides, affects the resulting propagated fecal-derived virome. One can argue that this is not surprising, but it has nevertheless not been shown before, and it opens up possibilities for future “tailor-made” fecal-derived viromes with predictable profiles and effects.

      Even though we do not demonstrate an effect of the chemostat-propagated virome, we still believe that the study provides valuable insights as a proof-of-concept. Specifically, we demonstrate that in vitro chemostat propagation can significantly modulate the safety profile of FVT, while still driving changes in the microbiome, e.g., by decreasing C. perfringens.

      The above issue is especially concerning because the chemostat propagation selected for bacteria that may not necessarily be the ones that harbor the beneficial phages. Without an understanding of exactly how FVT works, is it possible to make any conclusion about the usefulness of the chemostat approach?

      The chemostat work was based on the idea that if we culture a fecal inoculum under suitable conditions, the phageome would propagate alongside it and allow for a scalable production method for standardized, donor-independent FVT. We are aware that the chemostat end-culture diverged quite markedly from the fecal inoculum. In practice, such divergence is unavoidable when simulating intestinal growth conditions in vitro. On the positive side, we showed that we could drive an expansion of Bacteroides spp. by supplementing the media with human milk oligosaccharides. We have previously shown that Bacteroides spp. engraft in FMT recipients that are in turn protected from NEC. However, there is much room for refinement of the chemostat culture conditions, e.g., to preserve the rich repertoire of lactobacilli from the inoculum by lowering the pH. Moreover, the loss of viral diversity in the chemostat end-culture also needs to be addressed, potentially by lowering the chemostat dilution rate to allow time for phage propagation. Based on these insights, we will in the near future invest heavily in improving the chemostat procedure to obtain a propagated fecal virome that better resembles the fecal inoculum.

      Finally, can the authors rule out that their observations in THP-1 cells are driven by LPS or some other bacterial product in the media?

      We thank the reviewer for raising this point. To minimize the influence of bacterial contaminants such as LPS or other small bacterial products, we implemented several steps during sample preparation. Specifically, we performed ultrafiltration using a 300 kDa molecular weight cut-off, which should remove small molecules, including LPS, bacterial metabolites, and other potential soluble immunomodulators. Thereafter, all viral preparations underwent endotoxin removal prior to cell exposure. These precautions reduce the likelihood that the effects observed in THP-1 cells are attributable to bacterial products rather than viral components. This is explained in the referenced article (20), and we have now added the clarification to the Methods section of the revised manuscript (lines 222 and 227). The immune expression profile differs markedly between the viral preparations and the E. coli control (e.g., IFNG, TLR3, TLR8), making it highly likely that viral epitopes are the major drivers of the response to the viral preparations, with less impact from any potential bacterial contaminant. This is now mentioned in the results section (lines 541-543).

      Reviewer #2 (Public review):

      Major revision

      (1) The authors state that the aim of the research is: 'We hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as earlier onset of diarrhea'.

      (a) Regarding efficacy, in Fig. 5 there are no significant differences in stomach pathology or enterocolitis between groups, even between the control group and the experimental groups. Is this because of the low incidence of NEC? This may affect the statistical power of the conclusions. Therefore, it is unclear how one can conclude that chemostat propagation can reduce NEC lesions.

      Thank you for highlighting this important point. We fully agree and would like to clarify that it is not our intention to conclude that chemostat propagation reduces NEC lesions under the experimental settings in this paper. Rather, this was our initial hypothesis, which could not be confirmed. The unexpectedly low incidence of NEC across groups in Piglet Experiment 1 did not allow for a clear conclusion, whereas Piglet Experiment 2 failed to show a NEC-reducing effect. We have stated this important point in the following sections:

      - Abstract (line 42-44): “However, these signatures were lost in recipients of chemostat-propagated viromes, and only minor microbiome effects and no NEC prevention were observed.”

      - Results (line 699): “This highlights that while chemostat propagation effectively mitigates virus-associated diarrhea, the method needs further optimization to target NEC.”

      - Discussion (lines 773–775): “However, the MO-propagated chemostat virome did not increase Bacteroides or Parabacteroides spp. in the recipient’s gut, nor did it provide NEC protection.”

      - We have rephrased this to emphasize the importance of Experiment 2.

      - To avoid any potential misinterpretation, we have rephrased line 598 to reflect that we observed “a difference in the clinical side effect pattern” rather than implying efficacy.

      - Furthermore, we have updated the summary title for Figure 8 (line 704) to clearly state: “MO-propagated virome modestly exacerbates gastric injury and fails to improve NEC.”

      - Also, we have added the following section to the discussion (lines 793-797): “However, we acknowledge that the absence of demonstrated NEC prevention by the native donor virome is a significant limitation to conclusions regarding efficacy. Without a protective baseline, we cannot assess whether the virome efficacy was lost during chemostat propagation. Consequently, we cannot confirm or dismiss the hypothesis that chemostats can preserve a phage community capable of preventing NEC.”

      - Lastly, we have updated the conclusion (lines 817-818): “However, as neither the chemostat-propagated viromes nor the native donor virome demonstrated NEC prevention, the efficacy of the chemostat approach remains inconclusive.”

      - These changes should clarify that while the study demonstrates improved safety via reduced diarrhea, NEC efficacy was not obtained.

      (b) More convincing pathology images would be helpful.

      Since we did not observe a protective effect against NEC with either of the treatments, we opted not to include pathology images. However, extensive examples can be found in the cited paper (reference 37), which describes our NEC scoring methodology in the Methods section (lines 268-271): https://doi.org/10.1016/j.yexmp.2024.104936.

      (c) Regarding safety, e.g. body weight development, FVT did not differ significantly from control, CVT, and CVT-MO, so how can one conclude that chemostat propagation avoids potential side effects?

      We appreciate the reviewer’s observation. To clarify, we do not claim that chemostat propagation completely avoids all potential side effects, but rather that it mitigates them. As shown in Fig. 5G, FVT recipients exhibited significantly reduced body weight gain compared to controls, CVT, and CVT-MO specifically on day 4, but not on day 5. This transient effect suggests that side effects such as reduced growth and early-onset diarrhea are delayed, not entirely prevented, by chemostat propagation. This is stated in the results section in lines 593-595. We also believe that this is consistent with the paper title and the conclusion that the chemostat process minimizes the adverse effects associated with native FVT (line 813).

      (d) There is a lack of evidence to convince the reader that there is a decrease in eukaryotic viruses. More quantitative data here would be useful.

      Apart from the fact that eukaryotic viruses cannot replicate in a system devoid of eukaryotic cells, and that the chemostat continuously exchanges the culture, thereby diluting any entity incapable of propagation, we agree that quantitative data demonstrating a reduction of eukaryotic virus load are lacking.

      However, in this case we believe the relative viral abundance data are almost as convincing. To make this even clearer, we have produced new graphs showing 1) the eukaryotic viral abundance relative to total viral abundance and 2) the number of observed eukaryotic viral species, both after medium subtraction. Eukaryotic viral relative abundance decreases from around 0.4% to near zero already in the batch phase, and the number of eukaryotic viral species similarly decreases from around 10 in the fecal inoculum to zero midway through the chemostat phase. These new graphs are now part of Supplementary Figure S3 B-C. Moreover, we have corrected an error in the eukaryotic viral heatmaps presented in Figure 3F, so that the relative abundances of each sample (column) now sum to 100%. Please also note from the lower heatmap (where the virome signature of the medium is subtracted) that no eukaryotic viruses are identified in the sequencing data of the chemostat samples from 50 hours onwards.

      However, for future experiments we will consider adding a known quantity of a marker virus to the inoculum and monitoring its concentration (e.g., by qPCR) throughout the culture process. Importantly, if the resulting virome is meant for in vivo testing, this marker virus should be inert to the receiving organism.
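      The marker-virus idea can be made quantitative with the standard chemostat washout relation: a particle that neither propagates nor decays is diluted as C(t) = C0·exp(−D·t), where D is the dilution rate (medium flow rate divided by culture volume). The sketch below is hypothetical. The function names and the numbers in the usage lines are illustrative, and no dilution rate from this study is implied. It compares the expected washout with a loss rate fitted to log-transformed marker concentrations (e.g., qPCR copies per mL).

```python
import math


def expected_marker_fraction(dilution_rate_per_h, hours):
    """Fraction of a non-propagating, non-decaying marker left after pure washout: exp(-D*t)."""
    return math.exp(-dilution_rate_per_h * hours)


def fitted_loss_rate(times_h, concentrations):
    """Least-squares slope of ln(concentration) vs. time, returned as a positive loss rate (1/h)."""
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive to take logarithms")
    logs = [math.log(c) for c in concentrations]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    return -sxy / sxx


# Illustrative only: with D = 0.1 per hour, about 37 % of the marker remains after 10 h.
print(expected_marker_fraction(0.1, 10))
```

      If the fitted rate is close to D, the marker is simply being diluted; a markedly lower rate would suggest propagation, and a markedly higher rate would point to additional losses such as decay or adsorption. The same comparison would give the magnitude of depletion for the eukaryotic viral fraction if it could be quantified directly.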

      (2) Questions regarding Fig 3F,

      (a) How can the medium have 'the baseline viral content' ?

      As we have previously seen persistent eukaryotic viral signals in metagenomic sequencing data from chemostat experiments, we sampled and sequenced the culture medium. As seen in Figure 3F, this only concerns Dicistroviridae, as the patterns of the remaining eukaryotic viral signals before and after medium subtraction are virtually identical. For some reason, a component of the culture medium contains a genetic signal from this entity. Since all culture components are sterilized, these are most likely genomic traces that are continuously supplied with the medium and therefore appear in all culture samples. As they are unlikely to derive from intact viruses, the in vivo implications are deemed minimal.

      (b) What is the statistical significance of relative abundance of specific eukaryotic viruses?

      It is the same as for any statistical comparison at the single-OTU level in a nucleotide sequencing dataset. As commented above, it does not prove a quantitative depletion of eukaryotic viruses throughout the chemostat process, but given the context, a reduction in relative abundance supports the notion that eukaryotic viruses are indeed depleted as the culture medium is exchanged. The relevant question for us is the magnitude of depletion, which is particularly important since the clinical data indicate a delay rather than a prevention of side effects after transplantation. Hence, as proposed above, the use of a marker virus would provide that answer.

      (c) The hosts for some of the listed eukaryotic viruses are neither pigs nor humans; as such, the significance of a decrease in these viruses to humans is unclear.

      Dicistroviridae is not present in the inoculum and shows up only when medium is added. Picobirnavirus and Astrovirus are relevant mammalian intestinal viruses, whereas Smacoviridae is less well described (dois: 10.3389/fvets.2020.615293 and 10.3390/v8020042). Genomoviridae, as fungal viruses, indeed appear less relevant to the mammalian intestine. At any given time point, any individual, be it a pig or a human, carries several viral species that are incapable of infecting it, most likely transiting after being ingested with food or, in the case of pigs, through rooting. We have been searching for a causative agent responsible for the clinical side-effect patterns associated with FVT, but there seems to be no consistent viral agent that is overabundant in diarrheic piglets. Hence, in this study we are mostly interested in the proof-of-concept for an overall reduction of eukaryotic viruses through chemostat propagation, and we believe we have presented data in support of this.

      (3) In this study, pH 6.5 was selected for chemostat cultivation, but considering the different pH tolerances of different bacteria, it is recommended to further explore the effect of pH on bacterial and viral communities. In particular, the pH should be optimized to maintain the growth of beneficial bacteria such as Lactobacillaceae and Bacteroides in order to improve the effect of chemostat cultivation.

      We agree that pH is a key parameter in shaping microbial communities during chemostat cultivation. As noted, we selected pH 6.5 to balance physiological relevance and bacterial viability, but we acknowledge that this pH may not be optimal for supporting the growth of certain potentially beneficial taxa such as Lactobacillaceae. We explicitly address this in the discussion (lines 736–741), where we state that the selected pH may have limited engraftment and that future studies should investigate pH optimization to better support bacterial groups and improve the overall effectiveness of the cultivation system.

      (4) Please improve the quality of the images, charts, error bars and statistical significance markers throughout, and indicate the n used in each experiment.

      We have carefully reviewed all figures and could not identify any general image quality issues. If some specific images or panels appear unclear or problematic, we would appreciate it if the reviewer could point them out so we can address them directly.

      Regarding sample sizes, the number of animals (n) is indicated in Fig. 5A and its legend, as well as in Fig. 8A. We have now also added this information to the legend of Fig. 8 for clarity.

      To improve the clarity of statistical findings, we have added asterisks to denote significance in panels 6A, 6F, and 7A, as requested.

      To improve the clarity of Fig. 3B, we have added a dashed line to separate LAC and LAC-MO.

      Reviewer #3 (Public review):

      Major revisions

      This study investigated the in vitro amplification of donor fecal virus using chemostat culturing technology, aiming to reduce eukaryotic virus load while preserving bacteriophage community diversity, thereby optimizing the safety and efficacy of FVT. The research employed a preterm pig model to evaluate the effects of chemostat-propagated viromes (CVT) in preventing necrotizing enterocolitis (NEC) and mitigating adverse effects such as diarrhea.

      Strengths:

      Enhanced Safety Profile: Chemostat cultivation effectively reduced eukaryotic virus load, thereby minimizing the potential infection risks associated with virome transplantation and offering a safer virome preparation method for clinical applications.

      Process Reproducibility: The chemostat system achieved stable amplification of bacteriophage communities (Bray-Curtis similarity >70%), mitigating the impact of donor fecal variability on therapeutic efficacy.
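      As a reference point for the ">70%" figure, Bray-Curtis similarity between two abundance profiles x and y is 2·Σ min(x_i, y_i) / (Σ x_i + Σ y_i), i.e. one minus the Bray-Curtis dissimilarity. The sketch below assumes the two profiles are aligned vectors over the same viral OTUs; whether the study used raw counts or relative abundances is not stated here, and the function name is hypothetical.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (1 - dissimilarity) between two aligned abundance vectors."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same OTUs in the same order")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("at least one profile must be non-empty")
    shared = sum(min(a, b) for a, b in zip(x, y))  # abundance the two samples have in common
    return 2.0 * shared / total


# Identical profiles give 1.0; profiles with no shared OTUs give 0.0.
print(bray_curtis_similarity([10, 0, 5], [10, 0, 5]))
print(bray_curtis_similarity([1, 0], [0, 1]))
```

      Under this definition, a similarity above 0.7 between replicate chemostat runs means that at least 70% of the combined abundance is shared between them.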

      Weaknesses:

      Loss of Phage Functionality: The chemostat cultivation resulted in a reduction in phage diversity (e.g., the loss of Lactobacillaceae phages), which may compromise their protective effects against NEC (potentially linked to the immunomodulatory functions of Lactobacilli). The authors should explicitly address this limitation in the discussion section, particularly if additional experiments cannot be conducted to resolve it within the current study.

      We appreciate the reviewer’s concern and agree that the loss of phage diversity during chemostat cultivation, especially phages targeting Lactobacillaceae, is an important limitation with potential implications for NEC protection.

      We already described the depletion of Lactobacillaceae in the chemostat and its implications in the discussion (lines 742-751 + 787-793), along with our plans to address this in future work by adjusting culture pH. However, we acknowledge that the significance of losing phage diversity deserves more explicit attention. Accordingly, we have expanded the discussion to highlight the possible consequences of this loss and its impact on phage functionality (see lines 758–762), as suggested by the reviewer.

      Limitations in Experimental Design: The low incidence of NEC lesions in the control group reduced the statistical power of the study. This limitation undermines the ability to conclusively evaluate the efficacy and safety of the chemostat-propagated virome as a novel intervention for NEC. Future studies should optimize experimental conditions (e.g., using a more NEC-susceptible model or diet) to ensure adequate disease incidence for robust statistical comparisons.

      We agree that the low NEC incidence in Experiment 1 limited the statistical power to evaluate efficacy. To address this, we designed Experiment 2 using a more NEC-inducing diet (formula 2), which resulted in a higher level of baseline lesions. This allowed for a more conclusive assessment, demonstrating that the MO-propagated chemostat virome did not provide NEC protection when using the donor feces and culture conditions applied in this experiment.

      We acknowledge that this was insufficiently clear in the original manuscript. Please see the response to the first comment by Reviewer 2, where we have highlighted several revisions to improve clarity.

      However, we do believe the data are robust enough to conclude that the level of diarrhea — and thereby safety — was improved in the piglet model, which is why we chose to focus on this aspect in the paper’s title.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents a well-structured study investigating the feasibility of using chemostat-based culturing of the fecal virome to reduce the transfer of eukaryotic viruses during fecal virome transfer (FVT). Utilizing both in vitro fermentation systems and a preterm piglet model, the authors explore whether this method could be a safer and equally effective alternative to raw FVT for treating neonatal intestinal diseases, such as necrotizing enterocolitis (NEC). This study introduces a novel mitigation strategy for FVT through chemostat fermentation. However, a significant revision is recommended before the manuscript can be considered for publication.

      Major Changes:

      - A central aim of the study was to assess whether chemostat-cultured viromes maintain protective effects against NEC. However, this key outcome remains "unresolved" due to the low incidence of NEC in the control group. The discussion should address this limitation.

      We fully acknowledge this limitation and agree that our study cannot conclude whether the NEC effect of FVT was maintained without demonstrating an effect of this native virome. Please see our response to a similar concern raised by Reviewer 1, where we describe the revisions made to the discussion (lines 793-797) and conclusion (lines 817-818).

      - The section on viral particle enrichment should be expanded and discussed in more detail. It would be beneficial to examine its efficiency in separating bacteria from viral-like particles (VLPs) compared to findings from previously reported studies. The authors should clarify the rationale behind the selected dose of VLPs used in the experiments and their role in virus engraftment results.

      We selected the virome isolation method based on previous experiments in our lab demonstrating efficient separation of bacteria and virus particles using a 0.45 µm syringe filter. Filtrates were quality-assessed by fluorescence microscopy, showing an absence of intact bacteria. Using a diverse mock virus community, we also showed a high degree of preservation of infective viruses in the FVT following the isolation procedures. We have now expanded the description of the separation method in the results section with a reference to this work (lines 188-190). However, we chose to increase the molecular weight cut-off (MWCO) to enhance the exclusion of non-viral components.

      We acknowledge that the rationale and importance of the VLP dose were lacking in the discussion. This has now been added (lines 758-762).

      - The viral richness of chemostat viromes was significantly lower than that of native feces. The authors should discuss how this may impact microbiome and virome outcomes.

      We have included this point in the new section about VLP dose in the discussion. Please see lines 758-762.

      - The immune response was assessed through THP-1 cells and a limited piglet cytokine panel. These may not fully represent the intestinal epithelial or mucosal immune responses. Thus, authors should acknowledge these limitations in the discussion section.

      Thank you for the comment. The limitation of using THP-1 cells as an in vitro model is already acknowledged in the results section (line 545): “Since fecal-derived eukaryotic viruses mainly infect intestinal cells, an in vivo stimulation may reveal a different response pattern.”

      The limited panel of porcine cytokines was not intended as a comprehensive assessment of the mucosal immune response, but rather as supportive data for NEC-associated inflammation, as we have previously demonstrated (reference 37: https://doi.org/10.1016/j.yexmp.2024.104936). To obtain a comprehensive view of the immune response, a few days after diarrhoea onset, we additionally performed RNA-Seq analyses of the intestinal lymph node.

      - While the manuscript is comprehensive, it is also lengthy and text-heavy. Some sections could be condensed for clarity.

      The manuscript has been through multiple revisions by the authors. While it is indeed lengthy, we have removed non-essential information and redundancies and now feel that the balance between data, text, figures, and supplementary information is acceptable.

      - Several figures (e.g., Figs. 1-5) contain significant data but need clearer summaries in their captions.

      We appreciate the suggestion and have revised the captions for Figs. 1-8 to provide clearer, more informative summaries of the data they present.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewer for the thorough and constructive evaluation of our manuscript. We greatly appreciate the recognition of our work's strengths, particularly the integration of experiments and mathematical modeling, the stochastic framework for describing sloughing events, and the insights into pressure-driven detachment dynamics.

      We have carefully considered each point raised and provide detailed responses below. In response to the reviewer's comments, we have revised the Methods section to better clarify our approach to three-dimensional assessment. We believe these revisions have improved the clarity of the manuscript.

      Below, we address each of the specific concerns raised by the reviewer:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The study achieves its primary goal of integrating experiments and modeling to understand the coupling between flow and biofilm growth and detachment in a microfluidic channel, but it should have highlighted the weaknesses of the methods. I list the ones that, in my opinion, are the main ones:

      The study does not consider biofilm porosity, which could significantly affect the flow and forces exerted on the biofilm. Porosity could impact the boundary conditions, such as the no-slip condition, which should be validated experimentally.

      Porosity is indeed a key component of biofilm structures, resulting from the polymeric nature of the EPS matrix, mechanical forces, and biological processes such as cell death or predation. When considering flow-biofilm interactions, this porosity may allow fluid flow through the biofilm, with reported permeability values spanning an extremely broad range from 10⁻¹⁵ to 10⁻⁷ m² (Kurz et al., 2023).

      However, we argue that biofilm permeability is not the primary driver in our system:

      (1) In microscopy visualization, our biofilms form dense structures where flow around the biofilm through narrow channels dominates over flow through the porous biofilm matrix.

      (2) We performed microrheology experiments in these biofilms by imaging the Brownian motion of nanoparticles in the biofilm. Their trajectories indicate that, in our conditions, the viscoelastic flow of the biofilm itself largely dominates over the flow of culture medium through the biofilm matrix.

      (3) We argue that the extreme variability in reported permeability values (spanning several orders of magnitude, Kurz et al., 2023) reflects not only differences in experimental systems, but also fundamental challenges in defining and measuring permeability for viscoelastoplastic biofilms (the biofilm itself is actually flowing). Given this uncertainty, incorporating permeability into our model would introduce parameters that cannot be reliably constrained from literature or independently measured in our setup. Our approach (i.e. treating the biofilm as impermeable and focusing on flow obstruction) avoids this parametrization complexity while successfully capturing the observed dynamics.

      (4) Our model successfully predicts the observed scaling law (φmax ∝ Q^(1/2), Fig. 7f) and hydraulic resistance dynamics (Fig. 3) without invoking permeability, suggesting that flow obstruction rather than flow penetration is the dominant mechanism.
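      Checking a power-law prediction such as φmax ∝ Q^(1/2) amounts to fitting a straight line to log φmax versus log Q and comparing the slope with 1/2. The sketch below shows that fit on synthetic values generated to follow the law exactly; it does not use the study's measurements, and the function name is hypothetical.

```python
import math


def fit_power_law_exponent(flow_rates, phi_max):
    """Least-squares slope of log(phi_max) vs. log(Q), i.e. the exponent b in phi_max ~ Q**b."""
    xs = [math.log(q) for q in flow_rates]
    ys = [math.log(p) for p in phi_max]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data obeying phi_max = 0.1 * Q**0.5 (arbitrary units).
q = [1.0, 4.0, 9.0, 16.0]
phi = [0.1 * math.sqrt(v) for v in q]
print(fit_power_law_exponent(q, phi))
```

      With real replicates, the fitted slope and its uncertainty would be compared against 1/2; working in log space turns the multiplicative prefactor into an intercept, so only the exponent is tested.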

      Reference: Kurz, D. L.; Secchi, E.; Stocker, R.; Jimenez-Martinez, J. Morphogenesis of biofilms in porous media and control on hydrodynamics. Environ. Sci. Technol. 2023, 57 (14), 5666−5677.

      The research suggests EPS development as a stage in biofilm growth but does not probe it using lectin staining. This makes it impossible to accurately assess the role of EPS in biofilm development and detachment processes.

      We respectfully disagree that lectin staining is necessary to assess the role of EPS in our system, and we argue that our approach using genetic mutants is superior for the following reasons. Lectin staining has significant limitations. While widely used, lectin staining (e.g., concanavalin A) is non-specific (binding not only to EPS polysaccharides but also to bacterial cell surfaces) and is non-quantitative. It can confirm the presence of polysaccharides but cannot establish causal relationships between specific EPS components and mechanical properties or detachment dynamics. We performed preliminary experiments with ConA-rhodamine (data not shown), which showed widespread presence of polysaccharides. However, this provided limited insight beyond confirming EPS production, which is well-established for P. aeruginosa PAO1 biofilms. We employed a more rigorous genetic approach to directly assess the role of EPS composition. We used Δpel and Δpsl mutants (strains lacking key exopolysaccharides that are the primary structural components of the PAO1 matrix). Our results demonstrate that both mutants show significantly reduced maximum clogging compared to wild-type. The Δpsl mutant is particularly affected, with near-complete detachment at certain flow rates. These differences directly link EPS composition to mechanical stability and detachment dynamics. This genetic approach provides causal, quantitative evidence for the role of specific EPS components in biofilm development and detachment, information that lectin staining cannot provide. We believe this addresses the reviewer's concern more rigorously than lectin staining would.

      While the force and flow are three-dimensional, the images are taken in two dimensions. The paper does not clearly explain how the 2D images are extrapolated to make 3D assessments, which could lead to inaccuracies.

      We thank the reviewer for this important observation. We would like to clarify our methodological approach. Our primary three-dimensional measurement is the hydraulic resistance R(t), obtained from pressure drop measurements across the biofilm-containing channel section. This pressure-based measurement inherently captures the three-dimensional flow obstruction caused by the biofilm. We then employ a geometric model (uniform biofilm layer on all channel walls) to convert R(t) into volume fraction φ(t).

      The two-dimensional fluorescence imaging serves to validate this model-based approach rather than being the basis for three-dimensional extrapolation. The uniform layer assumption is supported by three independent lines of evidence: (i) the excellent quantitative agreement between predicted and measured scaling laws (φmax ∝ Q^(1/2), Fig. 7f), obtained without adjustable parameters; (ii) the high reproducibility of φmax values across different flow rates and replicates; and (iii) the strong correlation between model-derived φ(t) from pressure measurements and integrated fluorescence intensity (Fig. 3b-d).

      We have added clarifying text in the Methods section (subsection "Data analysis for the calculation of the hydraulic resistance and volume fraction") to better explain this approach and emphasize that pressure measurements provide the three-dimensional information, with the geometric model serving as the link to volume fraction.

      Although the findings are tested using polysaccharide-deficient mutants, the results could have been analyzed in greater detail. A more thorough analysis would help to better understand the role of matrix composition on the stochastic model of detachment.

      We thank the reviewer for this suggestion. Our mutant analysis demonstrates that Δpsl and Δpel strains have significantly reduced φmax and altered detachment dynamics compared to wild-type (Fig. 8), directly linking EPS composition to mechanical stability as predicted by our model. A rigorous quantitative connection between matrix composition and the stochastic parameters (interevent times, jump amplitudes) would require: (i) substantially more sloughing events for statistical power, (ii) independent mechanical characterization of each mutant, and (iii) a mechanistic model linking EPS composition to detachment parameters. We are currently developing microrheology approaches to characterize mutant mechanical properties, which could enable such refinement in future work.

      However, this represents a substantial study beyond the scope of the current manuscript, which establishes the self-sustained sloughing-regrowth cycle and its stochastic nature. The mutant results serve their intended purpose: demonstrating that EPS composition affects detachment, consistent with our model's framework.

      Reviewer #2 (Public review):

      This manuscript develops well-controlled microfluidic experiments and mathematical modelling to resolve how the temporal development of P. aeruginosa biofilms is shaped by ambient flow. The experiment considers a simple rectangular channel on which a constant flow rate is applied and UV LEDs are used to confine the biofilm to a relatively small length of device. While there is often considerable geometrical complexity in confined environments and feedback between biofilm/flow (e.g. in porous media), these simplified conditions are much more amenable to analysis. A non-dimensional mathematical model that considers nutrient transport, biofilm growth and detachment is developed and used to interpret experimental data. Regimes with both gradual detachment and catastrophic sloughing are considered. The concentration of nutrients in the media is altered to resolve the effect of nutrient limitation. In addition, the role of a couple of major polysaccharide EPS components is explored with mutants, which leads to results in line with previous studies.

      There has been a vast amount of experimental and modelling work done on biofilms, but relatively rarely are the two linked together so tightly as in this paper. Predictions of the influence of the non-dimensional Damköhler number on the longitudinal distribution of biofilm and of the functional dependence of the maximum amount of biofilm (𝜙max) on flow are demonstrated. The study reconfirms a number of previous works that showed the gradual detachment rate of biofilms scales with the square root of the shear stress. More challenging are the rapid biofilm detachment events where a large amount of biofilm is detached at once. These events are identified experimentally using an automated analysis pipeline and are fitted with probability distributions. The time between detachment events was fitted with a Gamma distribution and the amplitude of the detachment events was fitted with a log-normal distribution; however, it is not clear how good these fits are. Experimental data was then used as an input for a stochastic differential equation, but the output of this model is compared only qualitatively to that of the experiments. Overall, this paper does an admirable job of developing well-constrained experiments and a tightly integrated mathematical framework through which to interpret them. However, the new insights this provides the underlying physical/biological mechanisms are relatively limited.

      We thank the reviewer for the thorough evaluation of our work and for highlighting the tight integration between experiments and modeling. We appreciate the constructive feedback regarding the goodness-of-fit for the probability distributions.

      To address the concern that "it is not clear how good these fits are," we have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes.
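      The Q-Q comparison described above can be sketched with the Python standard library alone. The sketch assumes a method-of-moments Gamma fit (the fitting procedure behind Supplementary Figure S20 is not stated here, and maximum likelihood is a common alternative), evaluates the Gamma CDF through the power series of the regularized lower incomplete gamma function, and inverts it by bisection. All function names and the sample intervals are hypothetical.

```python
import math
import statistics

def fit_gamma_moments(samples):
    """Method-of-moments Gamma fit; returns (shape k, scale theta)."""
    m = statistics.fmean(samples)
    v = statistics.variance(samples)
    return m * m / v, v / m

def gamma_cdf(x, k, theta, tol=1e-14, max_terms=10_000):
    """Regularized lower incomplete gamma P(k, x/theta), via its power series."""
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0 / k  # n = 0 term of sum_n z**n / (k (k+1) ... (k+n))
    total = term
    for n in range(1, max_terms):
        term *= z / (k + n)
        total += term
        if term < tol * total:
            break
    # Multiply by z**k * exp(-z) / Gamma(k), evaluated in log space.
    return min(1.0, math.exp(k * math.log(z) - z - math.lgamma(k)) * total)

def gamma_quantile(p, k, theta):
    """Invert the Gamma CDF by bisection (0 < p < 1)."""
    lo, hi = 0.0, theta * max(1.0, k)
    while gamma_cdf(hi, k, theta) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_qq_pairs(samples):
    """(theoretical quantile, sample quantile) pairs for a fitted Gamma."""
    k, theta = fit_gamma_moments(samples)
    ordered = sorted(samples)
    n = len(ordered)
    return [(gamma_quantile((i + 0.5) / n, k, theta), x) for i, x in enumerate(ordered)]

intervals = [1.2, 0.8, 2.5, 1.9, 3.1, 0.6, 1.4]  # hypothetical inter-event times (h)
for theoretical, observed in gamma_qq_pairs(intervals):
    print(f"{theoretical:.3f}  {observed:.3f}")
```

      Points lying close to the identity line indicate that the fitted Gamma reproduces the empirical quantiles; systematic curvature in the tails is the usual sign of a poor fit.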

      Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
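      For the lognormal amplitudes, a one-sample Kolmogorov-Smirnov statistic can be computed on the log-transformed data with `statistics.NormalDist`, since the lognormal CDF of x equals the normal CDF of log x. A minimal sketch under stated assumptions: the parameters are estimated as the mean and sample standard deviation of the log-amplitudes, and the names and values are hypothetical. Because the parameters are estimated from the same data, the standard KS critical values are conservative (the Lilliefors correction addresses this).

```python
import math
from statistics import NormalDist, fmean, stdev

def fit_lognormal(samples):
    """Lognormal fit: mean and sample SD of the log-transformed data."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), stdev(logs)

def ks_statistic_lognormal(samples):
    """One-sample KS statistic D against the fitted lognormal."""
    mu, sigma = fit_lognormal(samples)
    dist = NormalDist(mu, sigma)  # lognormal CDF of x equals normal CDF of log(x)
    logs = sorted(math.log(x) for x in samples)
    n = len(logs)
    d = 0.0
    for i, y in enumerate(logs):
        f = dist.cdf(y)
        # Largest gap between the empirical step function and the fitted CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

amplitudes = [0.12, 0.35, 0.08, 0.51, 0.22, 0.17, 0.64]  # hypothetical relative amplitudes
print(f"D = {ks_statistic_lognormal(amplitudes):.3f}")
```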

      We respectfully disagree that “new insights this provides the underlying physical/biological mechanisms are relatively limited.” Beyond confirming previous findings (e.g., scaling for gradual detachment), we believe our work provides several novel mechanistic insights. First, the Pe/Da criterion enables quantitative prediction of nutrient limitation regimes, allowing systematic decoupling of nutrient effects from other phenomena in biofilm studies. Second, we demonstrate that pressure, not shear, drives sloughing detachment events, a mechanism overlooked in previous studies where the notion of “shear-induced detachment” clearly dominates. Third, we show that sloughing-regrowth cycles occur even in single channels, establishing pressure-driven fluctuations as a signature of confined biofilm growth, independent of geometric complexity. Finally, the stochastic description of sloughing demonstrates that, while instantaneous biofilm states are irreproducible, the underlying randomness is predictable, therefore addressing a fundamental challenge in biofilm research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the abstract, I suggest clarifying the term "bacteria development." It is unclear if it refers to bacterial growth, biofilm formation, or biofilm detachment. The concept is expressed more clearly at the end of the Introduction.

      We have modified the entire abstract to make it clearer. The abstract now explicitly establishes the key processes - growth ('nutrients necessary for growth', 'growing bacteria obstruct flow paths') and detachment ('mechanical stresses that cause detachment', 'flow-induced detachment', 'sloughing') - before using 'bacterial development' as a collective term to refer to these coupled spatiotemporal dynamics. We believe the abstract is now clear as written.

      (2) Findings from Sanfilippo et al. (2019) were slightly questioned by Padron et al. (PNAS, 2023), who discovered that H2O2 transport is responsible for fro operon upregulation.

      Thanks for the clarification, which is indeed significant. The new sentence now reads: Pseudomonas aeruginosa has been found to regulate the fro operon in response to flow-modulated H2O2 concentrations (Sanfilippo et al. 2019, Padron et al. 2023).

      (3) Additionally, Kurz et al. (2022) account for pressure buildup as the mechanism controlling sloughing.

      We respectfully disagree and note that Kurz et al. (2022) identify shear stress, not pressure buildup, as the primary mechanism controlling sloughing. Besides the title, key sentences include “opening was driven by a physical process and specifically by the shear forces associated with flow through the biofilm”, “The opening of the PFPs is driven by flow-induced shear stress, which increases as a PFP becomes narrower due to microbial growth, causing biofilm compression and rupture.” While pressure differences are measured as indicators of system state and do contribute to normal compression stresses, their mechanistic explanation emphasizes that narrowing PFPs experience increased shear rates that eventually exceed the biofilm's yield stress, triggering viscoplastic deformation and detachment. The pressure buildup is a hydraulic consequence of narrowing rather than the direct cause of sloughing. In contrast, our work demonstrates that in confined geometries, pressure differences generate tangential stresses at the biofilm-solid interface that directly drive detachment.

      (4) The flow control strategy represented in Fig. 1 is not explained and should be detailed in the Methods section.

      The Methods section reads as follows. "Inoculation and flow experiments. BHI suspensions were adjusted to an optical density of OD640nm = 0.2 (10^8 CFU/mL) and inoculated inside the microchannels from the outlet, up to approximately ¾ of the channel length in order to keep a clean inlet. The system was left at room temperature (25°C) for 3h under static conditions. Flow experiments were then performed at 0.02, 0.2, 2, 20 and 200 μL/min constant flow rates for 72h in the microchannels at room temperature. For the experiments at 0.2, 2, 20 and 200 μL/min, the fluidic system was based on a sterile culture medium reservoir pressurized by a pressure controller (Fluigent FlowEZ) and connected to a flow rate controller (Fluigent Flow unit). The flow rate was maintained constant by using a controller with a feedback loop adjusting the pressure in the liquid reservoir. The reservoir was connected to the chip using Tygon tubing (Saint Gobain Life Sciences Tygon™ ND 100-80) of 0.52 mm internal diameter and 1.52 mm external diameter, along with PEEK tubing (Cytiva Akta pure) with 0.25 mm inner diameter adapters for the flow rate controller. The waste container was also pressurized by another independent pressure controller to reduce air bubble formation in the inlet part. For the experiments at 0.02 μL/min, we used a Harvard PHD 2000 syringe pump for the flow."
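      The pressure-feedback principle described above can be illustrated with a minimal integral controller. This is a generic sketch, not the Fluigent controller's actual algorithm, which is not described here: it assumes a linear plant Q = ΔP/R, consistent units, and a hydraulic resistance R that rises as the biofilm grows. The function `integral_flow_control` and the gain `k_i` are hypothetical.

```python
def integral_flow_control(q_target, resistances, k_i=0.5, p_out=0.0):
    """Simulate an integral controller that adjusts reservoir pressure to hold flow.

    Returns a list of (pressure, flow) tuples, one per control step.
    """
    pressure = p_out
    history = []
    for r in resistances:
        flow = (pressure - p_out) / r  # assumed linear plant: Q = dP / R
        error = q_target - flow
        pressure += k_i * error        # integral action on the inlet pressure
        history.append((pressure, flow))
    return history

# Hypothetical run: resistance doubles slowly as the biofilm grows.
ramp = integral_flow_control(1.0, [1.0 + i / 1000 for i in range(1001)])
print(f"final flow = {ramp[-1][1]:.4f}, final pressure = {ramp[-1][0]:.4f}")
```

      For a constant R, the pressure error shrinks by a factor (1 - k_i/R) at each step, so this loop converges whenever 0 < k_i < 2R; as R grows, the controller raises the inlet pressure to keep Q at its set point.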

      (5) Including images of the actual biofilms formed in a portion of the channel would aid in understanding the analysis presented in Fig. 2.

      Images are introduced later on (e.g., Figure 5). Videos are also provided in the Supplementary Material.

      (6) The boundary conditions used to calculate the stress in the developed model should be discussed. The authors should specify why biofilm porosity is neglected.

      We have added a detailed discussion in the Supplementary Material (Section I.2).

      (7) In the first section of the Results, the authors hypothesize that heterogeneity in biofilm development could be due to oxygen limitation. However, given the high oxygen permeability of PDMS, this hypothesis is later denied by their data. It would be prudent to avoid this hypothesis initially to streamline the presentation. Additionally, the authors should specify how oxygen levels at the inlet and outlet are measured.

      We appreciate this comment and agree that streamlining would simplify the presentation. However, after careful consideration, we have chosen to retain the oxygen limitation hypothesis for the following reasons: (1) oxygen limitation is a frequently invoked mechanism in biofilm systems and deserves explicit consideration, (2) it is not immediately obvious that oxygen remains non-limiting in larger microchannels where transverse gradients could develop, and (3) systematically eliminating this plausible alternative hypothesis strengthens our mechanistic conclusion that BHI drives the observed heterogeneity. Regarding oxygen measurements: we did not directly measure dissolved oxygen concentrations, and our assessment of oxygen limitation is therefore indirect.

      (8) What is the standard deviation of the doubling time measured at different flows (page 9)?

      We have indicated the standard deviation in the text. Note that the graph shows the SEM.
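      Since the text reports the standard deviation while the graph shows the SEM, the relation SEM = SD/√N may help readers convert between the two. A minimal sketch; the doubling-time values are hypothetical, and N = 3 is assumed here to match the number of biological replicates.

```python
import math
import statistics

def mean_sd_sem(values):
    """Return mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample SD (N - 1 denominator)
    sem = sd / math.sqrt(len(values))  # SEM = SD / sqrt(N)
    return mean, sd, sem

mean, sd, sem = mean_sd_sem([100.0, 110.0, 120.0])  # hypothetical doubling times (min)
print(f"mean = {mean:.1f} min, SD = {sd:.1f} min, SEM = {sem:.1f} min")
```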

      (9) What is the "zone of interest" in the channel mentioned on page 9?

      We have added the following sentence to clarify: To further understand this effect, let us consider the mass balance of biofilm in the zone of interest in the channel (the zone where biofilm grows between the two UVC irradiation zones).

      (10) Minor and major detachment events should be classified based on a defined threshold or criteria, and their frequency should be measured.

      We appreciate the reviewer's concern about quantitative rigor. However, we respectfully disagree that imposing arbitrary thresholds to classify 'minor' vs. 'major' events would improve our analysis. Detachment events in our system span a continuum of magnitudes, and any threshold would be artificial and potentially misleading. Our quantitative characterization of detachment dynamics is provided through the statistical analysis of interevent times, which we show follow a gamma distribution. This stochastic framework captures the full spectrum of detachment behavior without requiring arbitrary binning. The terms 'minor' and 'major' in our manuscript are used qualitatively to illustrate the range of observed phenomena, not as formal classifications.

      (11) Have the authors identified a reason for the peaks in the volume fraction in the Δpsl mutants at the highest flow rate?

      The biofilm thickness following these sloughing events is below our detection limit, consistent with a residual layer of cells. However, these cells grow, leading to a time window where the fraction is measurable, before a new detachment event occurs. Our understanding is that the psl mutant forms a weaker matrix with a much lower threshold for sloughing.

      (12) The fit of the probability density function for the relative density function does not match the data well. The authors should comment on this.

      We have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes. Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.

      (13) Additionally, the simulated fraction appears very flat, with limited detachments compared to experiments. Why?

      The model captures the essential dynamics of growth-detachment cycles, including the characteristic timescales and volume fraction ranges. Some event-to-event variability in the experimental data likely reflects biological stochasticity not captured by our current approach—for example, variations in local biofilm mechanical properties or matrix composition that affect the precise stress at which sloughing occurs. While incorporating such biological variability as a stochastic parameter would improve detailed agreement, it would require extensive additional characterization beyond the scope of this study. The current model successfully reproduces the key qualitative and semi-quantitative features of the system.

      (14) The methods section should include a more detailed explanation of how the model was validated against experimental data.

      Model validation was performed by comparing predicted biofilm volume fraction time series and sloughing event statistics against experimental observations across multiple flow rates. The model reproduces the characteristic growth-sloughing cycles, timescales, and steady-state volume fractions without additional parameter fitting beyond the experimentally measured distributions.
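      To make the structure of such a growth-sloughing model concrete, here is a minimal jump-process sketch, not the authors' stochastic differential equation: between events φ follows an assumed logistic growth law, events are separated by Gamma-distributed intervals, and each event removes a lognormally distributed fraction of the biofilm (capped at 0.99). The growth law, the cap, the function name `simulate_sloughing`, and every parameter value are hypothetical.

```python
import random

def simulate_sloughing(t_end, dt, mu, phi_max, k_shape, theta_scale,
                       ln_mu, ln_sigma, phi0=0.05, seed=0):
    """Simulate phi(t): logistic regrowth interrupted by random sloughing events."""
    rng = random.Random(seed)
    phi = phi0
    next_event = rng.gammavariate(k_shape, theta_scale)  # Gamma inter-event time
    times, series = [], []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps + 1):
        t = step * dt
        while t >= next_event:
            amp = min(rng.lognormvariate(ln_mu, ln_sigma), 0.99)  # relative amplitude
            phi *= 1.0 - amp
            next_event += rng.gammavariate(k_shape, theta_scale)
        times.append(t)
        series.append(phi)
        # Explicit Euler step of logistic growth; stays below phi_max when mu * dt <= 1.
        phi += dt * mu * phi * (1.0 - phi / phi_max)
    return times, series

times, phi = simulate_sloughing(100.0, 0.1, mu=0.2, phi_max=0.6, k_shape=2.0,
                                theta_scale=10.0, ln_mu=-1.0, ln_sigma=0.5)
print(f"phi range: {min(phi):.3f} to {max(phi):.3f}")
```

      Feeding such a simulation with the fitted Gamma and lognormal parameters and comparing its φ(t) statistics with the measured series is one way to make the model-data comparison quantitative.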

      (15) It would be useful to include information on the reproducibility of the experiments and any variations observed between replicates.

      Experiments were performed in N=3 biological replicates. Individual time series for all replicates are shown in Supplementary Figures, demonstrating consistent behavior across replicates.

      (16) A discussion of the limitations of the study, particularly regarding the assumptions made in the modeling and their potential impact on the results, would strengthen the paper.

      We have added a discussion on why we chose to neglect the porosity of the biofilm, and strengthened parts on the uniform biofilm layer assumption.

      Reviewer #2 (Recommendations For The Authors):

      Page 2: "A vast" —> "The vast"

      Changed.

      The text and line widths on many of the figures are far too small. I printed it out at normal size, but had to look at a PDF and magnify to actually see what the graphs are showing. Fig. 9c is particularly illegible.

      Changed.

      Fig. 1 caption "photonic" —> "optical"?

      Changed

      Can you spell out the actual mathematical definition of 𝜙 on page 5 when it is introduced? Currently it just says the "cross section volume fraction of the biofilm", but that seems potentially ambiguous. It is valid to say that this is "fraction of the cross section occupied by the biofilm"?

      Changed

      Bottom of page 5: can you state the physical interpretation of the assumption that M is bounded between 0 and 1. i.e. that growth is larger than detachment?

      There is a comment on that in the paper. It reads “In assuming that M ∈ ]0, 1] and eliminating cases where M > 1, we have not considered situations of systematic detachment 𝜙_equ = 0 for any value of the concentration, since this is not a situation that we encountered experimentally.” This comes just after presenting the expression for the only non-trivial steady state, as it becomes easier to explain the consequences of the initial choice at this point.

      Currently the choice of detachment initially used in the model is a bit confusing. You say that you are going to assume a (1-𝜙)^{-1} model for simplicity (bottom of page 5), but then later you find that the (1-𝜙)^{-3/4} model is more accurate (page 16). Since the latter has already been confirmed in numerous other studies, why not start with that one from the beginning?

      We thank the reviewer for this important question, which highlights an area where our presentation could be clearer. We did not find that the (1-φ)^{-3/4} model is "more accurate." Rather, we deliberately chose the (1-φ)^{-1} scaling because it captures pressure-induced detachment, which we hypothesized would dominate in confined flows where biofilms clog a large portion of the channel. The (1-φ)^{-3/4} scaling, widely used in previous studies, describes shear stress at the biofilm/fluid interface and was developed primarily for reactor systems where pressure effects are negligible. Our analysis on page 16 validates this choice by demonstrating that pressure stress indeed exceeds shear stress when the volume fraction is large, which corresponds to late Stage I and all of Stage II, precisely where our model is applied. The excellent quantitative agreement between predicted and measured φmax values across flow rates (Fig. 7f, Table 1) further supports the (1-φ)^{-1} scaling. We recognize that our initial presentation may have suggested the (1-φ)^{-1} choice was merely for "simplicity." We have revised this section to emphasize that this scaling was chosen specifically to capture pressure-driven detachment in confined geometries, with the physical justification provided by the stress analysis that follows. We have also revised page 16 to state explicitly that the (1-φ)^{-3/4} scaling is never used. We could alternatively use a multi-modal detachment function combining both scalings, but the data do not require this additional complexity.

      In general, the models you derived in this study could be better contrasted with that from previous works. e.g. can you compare your Eqn (4) with the steady-state solutions obtained by other previous studies? Is this consistent with previous works or different? (aside from framing the biofilm thickness in terms of 𝜙)

      We are currently working on a paper dedicated to modeling biofilm development in confined flows, which will do a better job of comparing approaches.

      Top of page 6 - you assume K* = 0.1 - Does this assume that cells grow at half the rate in 0.1X BHI as they do in 1X BHI? Has this been confirmed experimentally or is this just a guess?

      This value was estimated rather than measured directly. Model predictions were much more sensitive to the Damköhler number than to the value of K*.

      "radial" is used widely in this paper, but you are using a square geometry. Is "transverse" a better choice?

      Yes it clearly is. It’s been changed.

      Fig 3. Are panels (a) and (b) showing different bioreps of the same condition? If so, please spell that out in the caption.

      There was an error in the caption of panel (a), which has been corrected. The correspondence is between panels (a) and (c), which show exactly the same data, not biological replicates.

      In multiple places it noted that the change in hydraulic resistance is correlated with the "change in biofilm colonization." Why not demonstrate this directly using a cross correlation analysis? How is the latter connected to the 𝜙 parameter? (e.g. is this d(𝜙)/dt?)

      We thank the reviewer for this suggestion. To clarify: φ(t) represents the volume fraction of biofilm in the channel. We measure this in two independent ways: (1) φ(t) from hydraulic resistance (black line in Fig. 3), i.e. calculated from pressure measurements using φ = 1 - √(R₀/R(t)), assuming uniform layer growth (see Methods section "Data analysis for the calculation of hydraulic resistance and volume fraction"), and (2) φ(t) from fluorescence (green squares in Fig. 3), i.e. estimated from integrated GFP intensity or image segmentation of the glass/liquid interface. The reviewer is correct that we should quantify this relationship directly. We have now added a correlation analysis between these two independent measurements of φ (new Supplementary Figure S21). The analysis shows a strong positive correlation, with r-values ranging from 0.68 to 0.77 across all flow rates. This validates two key aspects of our approach: (1) the uniform layer assumption used to convert R(t) to φ(t) is reasonable, and (2) the pressure-based measurements accurately capture the dynamics visible in fluorescence imaging, including both growth phases and sloughing events. The strong agreement is particularly notable given that these measurements probe different aspects of the biofilm: hydraulic resistance is sensitive to the three-dimensional obstruction of flow, while fluorescence captures primarily the biofilm attached to the glass surface within our focal plane. Their correlation supports the model assumptions. We have revised the manuscript to clarify this relationship and present the correlation analysis.

      Top of page 9 - a doubling time of 110 mins is reported in liquid culture - is this in shaken or static conditions? Can you provide some data on how this was calculated? (e.g. on a plate reader?) Do you think your measurements in the microfluidics could be affected by attachment/detachment of cells, rather than being solely driven by division? It is curious that your apparent growth rate varies by a factor of two across the different flow rates and there is not a monotonic dependency. Both attachment and detachment would depend on the flow rate (with some non-trivial dependencies), e.g. https://www.pnas.org/doi/10.1073/pnas.2307718120 and https://doi.org/10.1016/j.bpj.2010.11.078

      Given that your doubling time in the microfluidics is solely based on changes in cell number (rather than directly tracking cell divisions), it seems possible your results here are measuring the combined effect of growth, attachment and detachment, rather than just growth.

      We agree with those comments regarding the doubling time measurement. We have added a description of how we performed the doubling time measurement in the Methods section.

      Page 9 - you discuss the role of EPS here, but the effect of EPS is not demonstrated here and this is muddled with a discussion about the non-linearity of the putative dependency. Maybe this would be on a firmer footing if you save the discussion of EPS for the section on the Psl and Pel mutants?

      Changed.

      Middle of page 9: Please define what "smooth detachment" means and contrast it with catastrophic sloughing. Also, please define what you mean by "flow, seeding, and erosion" detachment are and how these three things differ from one another.

      We have clearly defined each term in the revised version.

      The results from wavelet scalograms seem to be underutilised and not well described. Can you clearly say what time series this analyses has been calculated on the caption? e.g. hydraulic resistance? Other than simply pointing out the "blue stripes", what can be gained from this analyses that could not be obtained with another method? It would be great if the basic features of this plot could more fully discussed (e.g. is the curved envelope at the bottom caused by edge effects?)

      We have improved the text, captions and method section following the reviewer’s comment.

      Fig. 5 a and b - please list the time at which each of these images were taken. Do these have the same dt between the two sets of images?

      Yes the dt is the same (30 minutes). It’s been indicated in the caption.

      Fig. 6: you have significant 2D variation in the biofilm width along the length of the channel. The relative contribution of pressure and shear based detachment will be different at different positions along the length. However, this variation is ignored in your model. Can you please comment on this in your manuscript and how it might affect the interpretation of your results? e.g. would the longitudinally averaged description yield the same result as one that takes the geometry into account (on average)?

      Our model indeed assumes longitudinally averaged properties. A more detailed spatially resolved model would be valuable for capturing heterogeneities and will be explored in future work.

      Bottom of page 11: you say standard deviations are in the range of 10^{-3}. How does this jibe with the error bars on the middle flow rate in Fig. 7e?

      This extremely low standard deviation only applies to the maximum value of 𝜙 and is a completely different measurement from the whisker boxes presented in fig7e.

      Fig. 7: You are calculating the "Fraction" here. Is this "𝜙"? If so, can you put that on the y-axis instead? You calculate the volume fraction two different ways e.g. with hydraulic resistance and with imaging. Is only one of these shown in (e)? Is the same powerlaw dependence shown in (f) conserved when the other measurement of the "fraction" is used? Can you include both in Fig. 7e?

      We have modified the axis and indicated 𝜙.

      (e) is calculated only from hydraulic resistance. This is the most precise measurement to evaluate 𝜙 quantitatively.

      Related to the previous comment: Some of the estimates of 𝜙max in Table 1 are obtained by fitting the model to integrated fluorescence data (Fig. 2b), while others are estimated from measurements of the hydraulic resistance. The former yields non-unique sets of parameters. Can the biofilm fraction instead actually be estimated directly from fluorescent imaging by segmenting biofilm and directly calculating how much of the cross section is occupied by cells on average across the length? This seems like a more direct measure of this quantity. Given there are multiple ways of estimating the same parameter, it would be better consistency checking to make sure that different methods actually yield the same result.

      We have now added in Fig S21 a direct comparison of these two measurement methods. These are strongly correlated. Microscopy is more direct but only provides 2D pictures. Hydraulic resistance provides a 3D measurement, but relies on a model of biofilm distribution. Both are imperfect, but correlate well. In particular, we see that the 2D measurement does capture sloughing.

      You cite a large number of supplemental figures (e.g. Fig. S21 on page 12), but the figures in your SI only go up to 11.

      We have revised references to supplementary figures.

      Bottom of page 11: Your data from liquid culture suggests that your psl mutant grows at half the rate of WT cells. Is that consistent with your microfluidic data (e.g. Fig. 8)? If not, might this be a sign that your growth rate analyses from the microfluidics might be affected by attachment/detachment? (see comment above) Psl cells should detach much more easily.

      The approach taken to measure doubling times in the microfluidic system does not rely on the macroscopic measurements presented in figure 8, but rather on the approach presented in fig 4. These measurements require specific imaging (different magnification and time stepping) and we did not perform such experiments for the mutants.

      In analyses of sloughing, you fit the times between the jumps and the relative amplitude. Are these two random variables correlated with one another? Might that influence your results? Your methods say that "jumps were identified through through the selection of local maxima" of the derivative. Do you mean to say "minima" here? Did you keep all local maxima/minima or did you have a threshold?

      We treat these as two uncorrelated random variables. This is an assumption, and it would be interesting to test whether they are in fact correlated. To perform this analysis, we believe that we would first need to acquire more data and more replicates to improve the statistical power.

      Yes, it was minima (in the code we make everything positive, hence the confusion).

      Yes, there is a threshold on the value of the jump itself. This value is extremely low and essentially filters out noise.
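      The detection step can be sketched as follows: compute the finite-difference derivative of φ(t), flag local minima of the derivative whose magnitude exceeds a small noise threshold, and derive inter-event times and relative amplitudes from the flagged indices. This is a hedged sketch; the threshold value, the relative-amplitude definition (drop divided by the pre-event φ), and the function names are assumptions. In the toy series below, floating-point noise makes one small positive step look like a local minimum, which the threshold filters out.

```python
def detect_jumps(times, phi, threshold):
    """Indices i where phi drops between samples i and i+1 (local minimum of dphi/dt)."""
    deriv = [(phi[i + 1] - phi[i]) / (times[i + 1] - times[i]) for i in range(len(phi) - 1)]
    events = []
    for i in range(1, len(deriv) - 1):
        is_local_min = deriv[i] < deriv[i - 1] and deriv[i] <= deriv[i + 1]
        if is_local_min and -deriv[i] > threshold:  # keep only drops above the noise level
            events.append(i)
    return events

def event_statistics(times, phi, events):
    """Inter-event times and relative amplitudes (fraction of phi removed)."""
    intervals = [times[b] - times[a] for a, b in zip(events, events[1:])]
    amplitudes = [(phi[i] - phi[i + 1]) / phi[i] for i in events]
    return intervals, amplitudes

times = list(range(10))
phi = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3]
events = detect_jumps(times, phi, threshold=0.05)
print(events)  # [2, 6]
```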

      Fig. 9 - can you make it clearer in the caption what timeseries you are analysing here? I understand from the methods that this is the "volume fraction." The data/fits are difficult to see in Fig. 9 b and impossible to see in Fig. 9c because the green bars get in the way of the other two data sets. Can this visualisation be improved? It is not clear to me how good of a job the Gamma and log-normal fits are actually doing.

      We have clarified that histograms are calculated from all experiments/replicates.

      We have slightly modified the graph to make it clearer. This comparison is intrinsically hard, partly because it compares discrete data with continuous PDFs.

      The text notes that the results from the stochastic sloughing model are 'strikingly similar to experimental data', which seems to be based on a qualitative analysis of the lines in Fig. 7 d, e, and f. However, the experimental data are not plotted in the same graph, nor are the experimental data to which these results should be compared cited in the text/caption.

      We have added a note in the caption to indicate which figure it can be compared to.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
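      To make the "single" versus "multiple" grouping concrete, here is a minimal sketch, assuming the standard genetic code, of classifying an amino-acid substitution by the fewest nucleotide changes needed to reach it from the parental codon. The helper names are hypothetical, and this is an illustration of the grouping rather than the code behind Figure S3.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nt_changes(codon: str, target_aa: str) -> int:
    """Fewest nucleotide substitutions turning `codon` into any codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, other))
        for other, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def substitution_class(codon: str, target_aa: str) -> str:
    """Label a non-synonymous substitution as 'single' or 'multiple' nucleotide."""
    if CODON_TABLE[codon] == target_aa:
        raise ValueError("synonymous substitution: no amino-acid change")
    return "single" if min_nt_changes(codon, target_aa) == 1 else "multiple"
```

      For example, from the tryptophan codon TGG, cysteine (TGT/TGC) is one nucleotide away and is "single", whereas phenylalanine (TTT/TTC) needs two changes and is "multiple". A model that has absorbed the codon table would score the second group systematically lower even when measured fitness does not differ.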

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758) and the results were middling despite our best efforts. Regarding the multihit correction specifically, multihit effects are minor compared to codon-table effects, and capturing the latter requires the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”
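      For readers less familiar with the "sinusoidal positional encodings" mentioned above, here is a minimal sketch of the standard form from the original Transformer paper, assuming the DASM follows that common convention (the base of 10000 and the interleaved sine/cosine layout are that convention, not details confirmed in this response; the function name is hypothetical). Each position gets a fixed vector that is added to the amino-acid embedding before the encoder layers.

```python
import math


def sinusoidal_positional_encoding(position: int, d_model: int) -> list[float]:
    """Standard fixed encoding: PE[2k] = sin(pos / 10000^(2k/d)), PE[2k+1] = cos(...)."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    enc = []
    for i in range(0, d_model, 2):  # i plays the role of 2k
        angle = position / (10000 ** (i / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc
```

      Because these vectors are fixed rather than learned, they add no parameters, which is in keeping with the deliberately "stock" architecture described here.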

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
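      A toy sketch of the branch-length half of each cycle, assuming a simplified per-site model in which site j substitutes with probability 1 - exp(-r_j t) given fixed positive rates r_j (in the DASM these rates would combine the neutral mutation model and the current selection factors). With the network held fixed, each parent-child pair's likelihood depends only on its own t, which is why the pairs can be optimized independently and concurrently. All names are hypothetical; this is not the actual implementation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def branch_log_likelihood(t, rates, mutated):
    """Toy log-likelihood of one parent-child pair given branch length t."""
    ll = 0.0
    for r, m in zip(rates, mutated):
        p_sub = -math.expm1(-r * t)  # 1 - exp(-r t), numerically stable
        ll += math.log(p_sub) if m else -r * t
    return ll


def optimize_branch_length(rates, mutated, lo=1e-6, hi=10.0, tol=1e-8):
    """Maximize the (concave) toy log-likelihood over t by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if branch_log_likelihood(c, rates, mutated) > branch_log_likelihood(d, rates, mutated):
            b, d = d, c  # optimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # optimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


def optimize_all_branch_lengths(pairs):
    """Pairs are independent, so their branch lengths are optimized concurrently.

    A thread pool shows the structure; CPU-bound pure-Python work would need a
    process pool (or vectorized code) for real speedups.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: optimize_branch_length(*p), pairs))
```

      In this toy model, one substituted site out of two sites with rate 1 gives the closed-form optimum t = ln 2, which is a convenient check on the search.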

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predictions of binding patterns for a specific target require information about that target to be specified in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM model does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder" in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWK and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. We now describe how other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion on shortcomings of existing antibody language models like AbLang2 that seems to associate all of the performance gap with the inability to separate non-synonymous mutations separated by 1 or 2+ substitutions.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is for human rather than mouse antibodies. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not functional data. Evaluation is then done on functional data without fine-tuning. These evaluation data are categorically different from the training data, and thus data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. In a similar way, other self-supervised models such as MLMs do not exclude seeing an antibody in the training data when they are doing functional prediction.

      We have clarified this in the manuscript with

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, as according to this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could have identifiability problems as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated to

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
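      A minimal sketch of that definition, assuming the model's probability of the observed amino acid at each site is already available (the function name is hypothetical). Taking the geometric mean through the average negative log-probability avoids underflow on long sequences.

```python
import math


def perplexity(probs_of_observed):
    """Across-site geometric mean of 1 / p(observed amino acid)."""
    if not probs_of_observed:
        raise ValueError("need at least one site")
    mean_nll = -sum(math.log(p) for p in probs_of_observed) / len(probs_of_observed)
    return math.exp(mean_nll)
```

      A model that is certain and correct at every site has perplexity 1; two sites with probabilities 0.5 and 0.25 give sqrt(2 * 4), about 2.83.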

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, we're curious how you managed to get 2 million PCPs from the data. Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent these phylogenetic inferences were run in parallel.”

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we weren't sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use number ranges we would cite a dozen or more, but we hesitate to add many of them in the eLife format, which has parenthetical citations. If there are others that you consider essential, don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture (thanks for the catch):

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component of this architecture is a custom “wiggle” activation function, applied to the output layer, that prevents extreme selection factors, as previously described.”

      The architecture is very “stock”: it is just the default PyTorch TransformerEncoder, so I don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This sits in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.
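
      The minimal, dependency-free Python sketch below illustrates one of the stock components named above, sinusoidal positional encodings. It assumes the standard formulation from the original transformer paper: sine on even dimensions, cosine on odd dimensions, and base 10000. Whether the DASM implementation uses exactly this base and interleaving is an assumption. In a PyTorch model, the same table would typically be stored as a tensor and added to the token embeddings. The function name is hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table (list of lists) of sinusoidal encodings.

    Assumed standard formulation: for dimension pair (2k, 2k+1),
    PE[pos][2k] = sin(pos / base**(2k/d_model)) and
    PE[pos][2k+1] = cos(pos / base**(2k/d_model)).
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # Each sine/cosine pair shares one frequency, which decays geometrically with i.
            angle = pos / (base ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table
```

      Each row depends only on its position, so the encoding for position p is the same whatever the total sequence length. This property is convenient when, as here, sequences differ in length.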

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASMs with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas for simpler approaches that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. After doing this, we realized that for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes $p(a \mid w_{\setminus i})$ for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving $\log p(x_j \mid w_{\setminus j}) - \log p(w_j \mid w_{\setminus j}) = -n \log \mathrm{PPPL}(x) + n \log \mathrm{PPPL}(w)$, where n is the sequence length. Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
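
      The minimal Python sketch below makes the distinction between the two scoring schemes concrete. `toy_conditional` is a hypothetical stand-in for a masked language model such as AbLang2; it is not AbLang2's API. All function names and sequences are illustrative. The script checks numerically that, for a single mutant, the masked-marginals log-odds equals −n log PPPL(x) + n log PPPL(w). It also shows that the per-sequence score generally differs, because positions near the mutation are re-scored in the variant's own context.

```python
import math

# Standard 20-letter amino-acid alphabet used by the toy model below.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_conditional(seq, i):
    """Hypothetical stand-in for a masked language model.

    Returns P(a | seq with position i masked) for every amino acid a. The value
    at seq[i] is never read; residues matching a neighbour get extra weight.
    """
    weights = {}
    for a in ALPHABET:
        weight = 1.0
        if i > 0 and seq[i - 1] == a:
            weight += 2.0
        if i + 1 < len(seq) and seq[i + 1] == a:
            weight += 2.0
        weights[a] = weight
    total = sum(weights.values())
    return {a: weight / total for a, weight in weights.items()}


def per_sequence_log_pppl(x, model):
    """Log pseudo-perplexity with every position scored in the variant's own context."""
    n = len(x)
    return -sum(math.log(model(x, i)[x[i]]) for i in range(n)) / n


def wild_type_tables(w, model):
    """Masked-marginals step 1: compute the conditionals once, from the wild type only."""
    return [model(w, i) for i in range(len(w))]


def masked_marginals_log_pppl(x, tables):
    """Masked-marginals step 2: score any same-length variant with the wild-type tables."""
    if len(x) != len(tables):
        raise ValueError("variant must have the same length as the wild type")
    n = len(x)
    return -sum(math.log(tables[i][x[i]]) for i in range(n)) / n


if __name__ == "__main__":
    w, x = "QVQLV", "QVVLV"  # illustrative single mutant: Q -> V at position j = 2
    n = len(w)
    tables = wild_type_tables(w, toy_conditional)
    log_odds = math.log(tables[2][x[2]]) - math.log(tables[2][w[2]])
    identity_rhs = -n * masked_marginals_log_pppl(x, tables) + n * masked_marginals_log_pppl(w, tables)
    print(log_odds, identity_rhs)  # equal up to floating-point rounding
    print(per_sequence_log_pppl(x, toy_conditional), masked_marginals_log_pppl(x, tables))
```

      The first printed line shows the log-odds and the right-hand side of the identity, which agree up to rounding. The second line shows the per-sequence and masked-marginals scores for the variant. These differ because the per-sequence approach re-scores the residue next to the mutation in the variant's context.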

    1. Author response:

      Updated Response, March 3, 2026

      In the midst of considering the thoughtful and insightful reviews of our manuscript and updating our work accordingly, we wanted to provide an interim update.

      In the reviews of our paper, each of the reviewers brought up questions about the specificity and sensitivity of a new "TFD-Seq" assay for protein-DNA specificity in vivo that we had developed for this work and applied here for the first time with a complex eukaryote (Figure 4). While we remain strong proponents of developing in vivo assays for protein-DNA interaction, we took to heart the concerns that the reviewers had expressed. We have therefore, in the past few weeks, done a rather "deep dive" into both the technical aspects of the TFD-Seq data and the conceptual and statistical aspects of how TFD mutation data can be interpreted. From this analysis, we find ourselves in agreement with the concerns. In particular, our "deep dive" has suggested that conclusions from TFD data (particularly negative conclusions on the presence of binding sites) will require a better understanding of signal and noise in the kind of assay used in Figure 4.

      As the work is currently at the submitted/preprint stage, we look forward to spending some time working (as appropriate) on both improvements to current protocols and alternative experiments to support the novel assay. An updated preprint, which (for now) conveys the body of work and conclusions (which are not substantially altered) while avoiding the complexities of the TFD-seq assay, is available at bioRxiv. We look forward to sending a version of record over the next few months, once we have had a chance to provide robust tests for the macromolecular targets/interactors of the ZNF-236 factor that was identified in this study.

      We again thank the reviewers (peer review is indeed really a good thing) and look forward to updating everyone soon.

      Updated bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2025.10.22.683740v3

      Original Response, January 5, 2026

      We thank the reviewers for their insights and suggestions. We appreciate that the reviewers were engaged by both the observations and their interpretation, and consider their interest in further analysis and clarified discussion to be the best possible compliment to this work.

      As noted by the reviewers, the working hypothesis of a nuclear organization role for ZNF-236 is just one model. Clarifying this model and potential alternatives will certainly add to the manuscript and this will be a key part of the revision.  Beyond this, several suggested analyses should explore extant models, while providing context for considering alternatives.  We look forward to carrying out such analyses as feasible and will report them in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which the internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in the downstream of NMU-expressing neurons in rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of a study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field.

      We thank the reviewer for the thoughtful comment. We agree that the behavioral effects of manipulating the Hugin–AstA system alone were considerably weaker than the pronounced PER shifts induced by starvation. We have revised our Discussion to address it by positioning our findings within the broader context of energy regulation.

      More specifically, we discuss that feeding behavior is controlled by two distinct, yet synergistic, types of mechanisms:

      (1) Hunger-driven 'accelerators': as the reviewer notes, pathways involving dopamine and NPF are powerful drivers of sweet sensitivity. These systems are strongly activated by hunger to promote food-seeking and consumption.

      (2) Satiety-driven 'brakes': our study identifies the counterpart to the systems above, namely a satiety-driven 'brake'. The Hugin–AstA pathway acts as a direct sensor of high internal energy (glucose), which is specifically engaged during satiety to actively suppress sweet sensation and prevent overconsumption.

      This framework explains the apparent discrepancy in effect size. The dramatic PER shift seen upon starvation is a combined result of engaging the 'accelerators' (hunger pathways like TH/NPF) while simultaneously releasing the 'brake' (our Hugin–AstA pathway being inactive).

      Our manipulations, which specifically target only the 'brake' system, are therefore expected to have a more modest effect than this combined physiological state. Thus, rather than being a "minor part," the Hugin–AstA pathway is a mechanistically defined, satiety-specific circuit that is essential for the precise "braking" required for energy homeostasis. We will update our Discussion to emphasize how these 'accelerator' and 'brake' circuits must work in concert to ensure precise energy regulation.

      In this context, authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

      We thank the reviewer for highlighting the critical issue of subtype-level specialization within the Hugin and AstA populations.

      We fully agree that the Hugin system is known for its functional heterogeneity (pleiotropy), with different Hugin neuron subclusters implicated in regulating a variety of behaviors, including feeding, aversion, and locomotion (e.g., Anna N. King, Curr Biol, 2017; Andreas, PLoS Biol; Sebastian et al., Nat Comm, 2016). Our finding that only a specific subcluster of Hugin neurons is responsive to glucose elevation provides a crucial first step in functionally dissecting this complexity.

      We have added a dedicated paragraph to elaborate on this functional partitioning in the Discussion. We propose that this subtype-level specialization allows the Hugin system to precisely link specific physiological states (like high circulating glucose) to appropriate behavioral outputs (like the suppression of sweet taste), demonstrating an elegant solution to coordinating multiple survival behaviors. Future work using high-resolution tools such as split-GAL4 and single-cell sequencing will be invaluable in fully mapping the specific functional roles corresponding to each Hugin and AstA subcluster.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

      We thank the reviewer for this insightful comment and agree that the modest behavioral effect of Glut1 knockdown is a critical finding that warrants further clarification. This observation strongly supports the idea that internal energy state is monitored by a sophisticated and robust network, not a single, fragile component. We believe the effect size is modest for two main reasons, which we have addressed in the revised Discussion.

      Firstly, the effect size is likely attenuated by technical and molecular redundancy. Specifically, the RNAi-mediated knockdown of Glut1 may be incomplete, leaving residual transporter function. Furthermore, Glut1 is likely only one part of the Hugin neuron's intrinsic sensing mechanism; other components, such as alternative glucose transporters or downstream K<sub>ATP</sub> channel signaling, may provide molecular redundancy, meaning that the full energy-sensing function is not easily abolished by a single manipulation.

      Secondly, and more importantly, the final feeding decision is an integrated output of competing circuits. While hunger-sensing pathways like the dopamine and NPF circuits act as powerful "accelerators" to drive sweet consumption, the Hugin–AstA pathway serves as a satiety-specific "brake." The modest effect of partially inhibiting just one component of this 'brake' system is the hallmark of a precisely regulated, multi-layered homeostatic system. We have clarified in the Discussion that the Hugin pathway represents one essential inhibitory circuit within this cooperative network that works together with the hunger-promoting systems to ensure precise control over energy intake.

      Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      The mammalian homolog of neuromedin U (NMU) was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, and therefore, it raises a question as to how this hugin-AstA pathway is unique.

      The reviewer is correct that several energy-sensing pathways are known. However, we now clarify that these previously established mechanisms, such as the dopaminergic and NPF pathways, primarily function as hunger-driven "accelerators." They are activated by low-energy states to promote sweet sensitivity and drive consumption.

      The crucial, missing piece of the puzzle—which our study provides—is the satiety-specific "brake" mechanism. We identify the Hugin–AstA circuit as one of the “brakes”: a dedicated, central sensor that responds directly to high circulating glucose (satiety) to suppress sweet sensation and prevent overconsumption.

      Thus, our work is unique because it defines the essential counterpart to the hunger pathways. In the revised Discussion, we have explained how these 'accelerator' (hunger) and 'brake' (satiety) systems work in concert to allow for the precise, bidirectional regulation of energy intake. Furthermore, by demonstrating that this Hugin/NMU 'brake' circuit is evolutionarily conserved in mice, our findings reveal a fundamental energy-sensing strategy and suggest that this pathway could represent a promising new therapeutic target for managing conditions of excessive food intake.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Considering the comments from all three reviewers, new experiments are not necessary, but the authors are welcome to provide new pieces of evidence that would strengthen their conclusions. To assist the authors with their revisions, the comments have been categorized from the highest to lowest priority based on the concerns raised by reviewers 1, 2, and 3.

      High priority:

      (1) Acknowledgement of partial phenotypes by the genetic manipulations, especially relative to other neuromodulators that are involved in the adjustment of sugar sensitivity after starvation (1, 2).

      Please see our responses to the Public Review 1 for details.

      (2) Detailed discussion on the novelty of the present work, also in light of previous studies both in flies and mammals (known Drosophila modulators, as well as NMU-rNST circuit on sugar sensation) (1, 2, 3).

      Please see our responses to the Public Review 3 for details.

      (3) Medium priority:

      • Discussions on the subtype-specific function of hugin neurons (1).

      Please see our responses to the Public Review 1 for details.

      • Discussions on the pleiotropic effect of changes in the level of circulating sugar (including release of other sugar types) (2, 3).

      We agree that circulating sugars represent a complex, systemic signal with broad, pleiotropic effects, and we have expanded our Discussion to address this.

      We will discuss the functional distinction between key hemolymph sugars, such as trehalose (the main circulating sugar, critical for stress/flight) and glucose (the primary, rapidly mobilized energy currency). While various sugars collectively influence metabolic status, our study’s unique focus is on the direct neural link between internal energy and sweet taste modulation. We clarify that our work precisely identifies glucose as the direct, key ligand for the Hugin satiety circuit, thus providing a concrete, mechanistically defined link from systemic energy complexity to the specific regulation of sweet sensation.

      • Illustration or clear explanations of sugar application methods in mouse experiments (ex. Figure 5F vs Figure 5M), as well as discussion on the concentration of sugar solutions used (3).

      We have added the relevant details to the figure legends and explain the rationale for using this concentration of sugar in the Results.

      • Less saturated image for Figure 5K (3).

      We have adjusted Figure 5K to reduce image saturation for clarity.

      • Discussions on the modest effect of NMU on rNST neurons (Figure 5M) (3).

      In the revised Results, we have discussed that the modest suppression of rNST activity likely reflects partial peptide diffusion and the heterogeneous composition of sweet-responsive rNST neurons.

      (4) Low priority:

      • Systematic quantification of multiple types of sugars after starvation (3).

      We agree that circulating sugars represent a complex metabolic milieu, and a fully systematic biochemical quantification of individual hemolymph sugars after starvation would be informative. While such analyses are beyond the scope of the present study, we have addressed this point at the functional level by systematically pre-feeding flies with different types of dietary sugars prior to PER assays.

      We find that multiple sugars are capable of suppressing PER, indicating that satiety-related behavioral inhibition is not unique to a single carbohydrate source. Notably, sucrose produces the strongest suppression, consistent with its rapid metabolic conversion and effectiveness in elevating internal glucose levels. These results support the notion that diverse dietary sugars converge on a common satiety-signaling mechanism, while our mechanistic analyses specifically identify glucose as the key ligand engaging the Hugin satiety circuit.

      We now clarify this distinction in the revised Discussion.

      • Testing Gr64f neurons or mutants (3).

      Our results indicate that energy sensing in the CNS suppresses sweet-sensing neuron activity (e.g., via hyperpolarization) rather than directly blocking sugar binding to receptors. Thus, sweet perception—not sugar detection—is inhibited. As evidence, in Figure supplementary 4 we measured the PER to fructose and trehalose. Although Gr5a and Gr64a differ in their sensitivity to these sugars, the CNS energy state consistently suppresses sweet perception for both. As Reviewer 3 noted, Gr5a and Gr64f are co-expressed in sweet neurons; while they respond to different sugars, their labeling of the neurons is largely equivalent.

      • Testing sugar preference (glucose vs. other sugars) (3)

      Since our primary goal was to identify a direct satiety-sensing and sensory-modulating circuit—the "brake" mechanism—PER served as the most suitable and mechanistically specific readout. While manipulation of the Hugin–AstA circuit influences internal state, and therefore likely alters long-term sugar preference, investigating the integration of this pathway with reward and post-ingestive signaling is a critical question that lies beyond the scope of the current study.

      • Cell type-specific knockout of NMU (3).

      Achieving a cell type-specific knockout of NMU using the Cre approach is not feasible in the short term. While previous studies have reported the role of NMU in the VMH region in regulating feeding, our contribution lies in revealing how these neurons sense energy. We also show that these neurons project to the vicinity of Calb2 neurons and that the neuropeptide can suppress Calb2 neuronal activity. This essentially demonstrates that the hugin–Gr5a pathway in Drosophila is conserved in mice. We believe that a detailed dissection of the precise circuitry in mice is more appropriate to address in a subsequent study.

      • Explanation of NMU detection in Figure 5K (3): this is GFP expressed by the Cre-dependent virus.

      We have revised the Figure 5K legend to clarify that NMU<sup>+</sup> neurons are labeled by GFP expression from a Cre-dependent AAV2/1-DIO-GFP, which undergoes anterograde trans-synaptic transfer. We further explain that GFP expression in rNST neurons requires local AAV-Cre injection, enabling identification of postsynaptic Calb2<sup>+</sup> target neurons.

      • Neuronal manipulation of NMU neurons by optogenetics or DREADD.

      Please see our responses to the question “Cell type-specific knockout of NMU.”

      Reviewer #1 (Recommendations for the authors):

      A major concern about the study is that the effect of genetic manipulations on Hugin/AstA system appears to account for only a small part of the dramatic shift of PER probability toward smaller concentrations of sucrose solutions among starved flies. In Figure 1B and E, PER probability is significantly higher among starved flies in response to 10-200mM of sucrose solutions than fed flies. Compared to this, RNAi knockdown of glucose transporter in hugin neurons (Figure 2C), PK2-R1 pan-neuronally (Figure 3C) or in AstA-releasing neurons (Figure 3G), AstA-R1 in Gr5a neurons (Figure 4E), systemic mutation of PK-R2 (Figure Supplement 10) and AstA-R1 (Figure Supplement 12) all produce relatively minor behavioral changes. Consistent with previous works, the mutation of TH causes a robust decrease of PER across the entire range of sucrose concentration tested (Figure Supplement 1).

      These discrepancies can be caused by many technical limitations that cannot be readily addressed. For instance, the large effect of TH can be confounded by the pleiotropic behavioral effect of the lack of dopamine. RNAi can suffer from incomplete elimination of targeted genes. However, the relatively small behavioral effect size of these manipulations cannot be entirely ignored in light of previous publications, which point to the importance of other neuromodulators such as dopamine, serotonin, Akh, and NPF, on sugar sensitivity (Marella et al., 2012; Inagaki et al., 2014; Yao et al., 2022), as well as other potentially parallel glucose-sensing systems, including Gr43a-expressing cells (Miyamoto et al., 2012) and sNPF-expressing CN neurons (Oh et al., 2019). While the neuropeptides initially tested (Figure 1) are not poor choices, it is a missed opportunity that so many other neuromodulators were excluded from the initial search.

      We appreciate the reviewer’s detailed analysis and agree that the magnitude of behavioral effects produced by manipulating the hugin–AstA pathway is smaller than the dramatic shift in PER observed under starvation conditions. This comparison is important and highlights a central conceptual point of our study.

      Starvation represents a compound physiological state that simultaneously engages multiple hunger-promoting neuromodulatory systems—most prominently dopaminergic and NPF pathways—while also releasing satiety-associated inhibitory signals. As shown previously and confirmed here (Figure supplementary 1), manipulation of dopamine synthesis produces a broad and robust reduction in PER across sucrose concentrations, consistent with its role as a powerful hunger-driven modulator.

      By contrast, our genetic manipulations specifically target a satiety-associated inhibitory circuit—the hugin–AstA pathway—that is selectively engaged by high internal glucose levels. Manipulating this pathway alone therefore isolates a single “brake” component of feeding regulation, rather than recapitulating the full physiological state of starvation, which combines both accelerator activation and brake release. Accordingly, the more modest behavioral effects we observe are an expected consequence of dissecting one defined regulatory module from a larger, cooperative network.

      We agree that multiple neuromodulators, including dopamine, serotonin, Akh, NPF, and others, as well as parallel glucose-sensing systems such as Gr43a-expressing cells and sNPF-expressing CN neurons, contribute to the regulation of sugar sensitivity. Rather than aiming to exhaustively screen all neuromodulators, our study was designed to identify and mechanistically define a central, glucose-responsive satiety sensor that directly links internal energy state to sweet taste modulation. In the revised discussion, we now explicitly position the hugin–AstA circuit as one essential, satiety-specific component within this broader regulatory landscape and discuss how it functionally complements previously characterized hunger-driven pathways.

      I am also confused by the results of Shibirets1-mediated silencing of Hugin and AstA neurons (Figure Supplement 13B, C). It is unclear to me why a feeding assay was used instead of PER, like the activation experiments. Feeding (ingestion) and PER are qualitatively different types of behavior, which cannot be directly compared. Moreover, the definition of "fold change" is not provided either in the figure legend or in the Materials and Methods section, making it difficult to understand what the figure means.

      We thank the reviewer for pointing out this important issue regarding the interpretation of the Shibire<sup>ts1</sup>-mediated silencing experiments. We agree that proboscis extension reflex (PER) and feeding/ingestion assays reflect qualitatively different behavioral processes and should not be directly compared.

      In the original submission, feeding assays were used to assess the effect of neuronal silencing, which led to ambiguity when comparing these results with PER-based activation experiments. To directly address this concern and ensure consistency across behavioral readouts, we have now performed additional PER experiments under the same Shibire<sup>ts1</sup>-mediated silencing conditions.

      These new data demonstrate that acute silencing of hugin neurons significantly enhances PER responses to sucrose (Figure supplementary 13B), indicating increased sweet sensitivity. This result is fully consistent with our activation experiments and supports the conclusion that the hugin–AstA pathway suppresses sweet taste perception under satiety conditions.

      In addition, we have revised the figure legend to explicitly define the “fold change” metric used in the behavioral analysis, clarifying how the values were calculated and normalized. Together, these changes resolve the ambiguity raised by the reviewer and strengthen the behavioral consistency of our conclusions.

      Of note, Marella et al. (2012) reported that silencing of Hugin-releasing neurons did not affect PER. It is therefore possible that the Hugin system is sufficient, but not necessary, for modulating PER under food deprivation.

      We agree that their observation—that silencing Hugin-releasing neurons does not alter PER in starved flies—is consistent with a state-dependent role of the Hugin system in feeding regulation.

      In starved animals, dopaminergic TH<sup>+</sup> neurons are strongly activated and promote high PER responsiveness, while circulating glucose levels are low, placing Hugin neurons in a relatively inactive state. Under such conditions, further silencing of Hugin neurons would be expected to produce minimal additional effects on PER, which likely explains the results reported by Marella et al.

      Importantly, our data show that preventing the starvation-associated reduction in Hugin neuronal activity—by thermogenetic activation of Hugin<sup>+</sup> neurons (Hugin–TrpA1; Figure 1D)—significantly suppresses the hunger-induced enhancement of PER. These results indicate that dynamic downregulation of Hugin neuronal activity is a critical component of the normal behavioral shift in sweet sensitivity in response to food deprivation. Thus, while Hugin neurons may not be required to further modulate PER once animals are already in a strongly starved state, their regulated activity change is essential for mediating state-dependent modulation of sweet taste behavior. We have added this point to the Discussion of the revised manuscript.

      While no new experiments are requested, it is important for authors to acknowledge the limited effect size of Hugin/AstA manipulation. In the current manuscript, the authors briefly mention the previous works (lines 460-462, 472-474), which is insufficient. Discussions must include how the Hugin/AstA system may "complement these established mechanisms (line 460)" (described in the references listed above), under what situations this novel Hugin/AstA system can be relevant for controlling PER, and why the fly is equipped with seemingly redundant systems for sensing internal glucose levels and controlling feeding behavior. Without these discussions, it is difficult to recognize the novelty of the presented work. The data appear to represent largely minor and incremental progress in an already mature field.

      In the revised manuscript, we have substantially expanded the Discussion to explicitly acknowledge this limited effect size and to clarify the functional role of the Hugin–AstA pathway within the broader energy-regulatory network. We now emphasize that this circuit represents a satiety-specific inhibitory branch that complements, rather than replaces, previously described hunger-promoting systems such as dopaminergic, NPF, and AKH circuits.

      Importantly, we discuss the specific physiological conditions under which the Hugin–AstA system is most relevant—namely, post-feeding and high-glucose states. Unlike hunger circuits that amplify sweet sensitivity during starvation, the Hugin–AstA pathway directly senses circulating glucose and rapidly suppresses sweet taste perception when energy is sufficient, thereby acting as a brake to prevent overconsumption.

      We further address the apparent redundancy among internal sugar-sensing systems. Rather than being redundant, these pathways form a coordinated and layered network with distinct sugar specificities, temporal dynamics, and functional roles. For example, Gr43a<sup>+</sup> neurons primarily detect fructose, whereas hemolymph glucose represents the principal energetic currency in Drosophila. The use of multiple internal sugar sensors allows flies to fine-tune feeding decisions across different nutritional contexts and timescales.

      Finally, we expand the Discussion to highlight that although the Hugin–AstA circuit constitutes only one branch of the energy-sensing network, its disruption leads to excessive energy intake (Figure supplementary 13C-E, G) and increased fat accumulation (Figure S13F), underscoring its physiological relevance. We also discuss how this pathway likely interacts with other neuromodulatory systems, including TH<sup>+</sup> dopaminergic and NPF<sup>+</sup> neurons, to collectively orchestrate adaptive feeding behavior and energy homeostasis.

      Together, these additions clarify that our work does not simply add another neuromodulator to an already mature field, but instead identifies a distinct glucose-sensing, satiety-linked mechanism that fills a conceptual gap between internal energy state detection and sensory modulation.

      Another perceived weakness is the lack of subtype-level dissection among Hugin- and AstA-releasing neurons. I do not request narrowing down the behaviorally relevant neuron to one (or one type), as such a request would rest on a widespread but unreasonable and dangerous assumption that every behavior must be controlled by one neuron. However, the authors present very interesting data that only a subset of Hugin- and AstA-releasing neurons responds to higher levels of sucrose (Figure 1H, Figure Supplement 7A, B), which leads to a hypothesis that a specific subtype within each peptidergic neuronal group is responsible for starvation-induced behavioral change. The authors only briefly touch upon this (lines 217-218), but this is an important hypothesis that requires further discussion.

      We thank the reviewer for highlighting the importance of neuronal heterogeneity within the Hugin- and AstA-releasing populations. We fully agree that the observation that only a subset of Hugin<sup>+</sup> and AstA<sup>+</sup> neurons responds to elevated sucrose levels (Figure 1H; Figure Supplement 7A, B) strongly suggests functional specialization within these peptidergic groups.

      In the revised Discussion, we now explicitly propose that distinct subtypes of Hugin and AstA neurons differentially contribute to energy sensing and feeding modulation. We suggest that glucose-responsive subpopulations may be specifically engaged in satiety signaling, whereas other neurons within the same genetic classes may participate in additional physiological or behavioral processes. This heterogeneity provides a plausible explanation for the partial behavioral effects observed following population-level manipulations. Although we did not perform subtype-specific perturbations in this study, our findings provide a foundation for identifying these subtypes in future work using split-GAL4 lines and connectomic datasets.

      These issues are more important than the sprawling and unfocused review of various hunger and satiety-controlling systems across species in the Introduction. Lines 53-108 contain only tangential information to the main conclusion of the paper. Both the Introduction and Discussion sections must be completely restructured so that readers understand what is already known about hunger-induced changes in feeding-related behavior, what is a missing gap of knowledge in neural mechanisms controlling behavioral adaptation under starvation, and why Hugin/NMU is an interesting target in this context.

      We thank the reviewer for this important structural critique. We agree that, in the original manuscript, the Introduction placed disproportionate emphasis on a broad survey of hunger- and satiety-regulating systems across species, which may have obscured the central conceptual advance of this study.

      In the revised manuscript, we have substantially restructured both the Introduction and the Discussion to sharpen the narrative focus and clarify the specific knowledge gap addressed by our work.

      First, the Introduction has been streamlined to focus on what is already known about hunger-induced modulation of feeding-related behaviors, particularly sweet taste sensitivity and PER in Drosophila. We now emphasize that prior studies have predominantly characterized hunger-activated, feeding-promoting pathways (e.g., dopaminergic, NPF, AKH systems) that act as accelerators of food-seeking behavior.

      Second, we explicitly define the missing gap in knowledge: while hunger-driven mechanisms are well studied, it remains unclear how satiety states—specifically elevated internal glucose levels—are directly sensed by central neurons and translated into suppression of sensory gain and feeding behavior.

      Third, we reposition Hugin/NMU as an attractive and conceptually distinct target because of its peptidergic nature, evolutionary conservation, and previously reported but mechanistically unresolved links to feeding regulation. This framing motivates our central question: whether Hugin/NMU neurons function as a direct internal energy sensor that actively implements a satiety-specific inhibitory control over taste perception.

      In parallel, the Discussion has been reorganized to avoid an unfocused review of feeding circuits across species and instead to interpret our findings within a clear conceptual framework. We now emphasize that the Hugin–AstA (and NMU) pathway represents a satiety-driven “brake” that complements, rather than duplicates, established hunger-driven “accelerator” circuits. This restructuring clarifies both the novelty of our findings and their relevance within the existing literature.

      Reviewer #2 (Recommendations for the authors):

      When discussing the results of Figure 1, such as lines 203-204, "These results demonstrate that sugar intake inhibits sweet sensation, probably via increasing circulating sugar levels" it may be worth discussing the known impact of sweet sensation experience on future sweet taste responses. With the data shown here, it is difficult to conclusively separate blood glucose levels from the sweet sensation that happens during the re-feeding. The "normal diet minus sucrose" does not blunt the starved PER effect, but that could potentially be impacted by either/both sugar intake or sweet taste.

      We thank the reviewer for this thoughtful and important point. We agree that sweet taste experience itself can influence subsequent sweet sensitivity, and that separating the contribution of sensory experience from nutrient-derived internal energy is non-trivial.

      In the revised manuscript, we have clarified the experimental timing by explicitly stating that PER was assessed 15 minutes after refeeding. At this time point, hemolymph glucose levels have already been restored to fed-state levels (Figure supplementary 5). This supports the physiological relevance of glucose-dependent activation of Hugin neurons under our experimental conditions.

      We also acknowledge that sweet taste exposure can induce sensory adaptation and modulate future taste responses. To directly address this potential confound, we performed additional control experiments during revision (Figure supplementary 4B) in which starved flies were refed with sorbitol (caloric but not sweet) or arabinose (sweet but non-nutritive). We found that both manipulations partially reduced PER, but neither recapitulated the full suppressive effect of sucrose refeeding.

      These results indicate that sweet taste experience and metabolic energy contribute in parallel to the regulation of sweet sensitivity. Importantly, the incomplete effects of sorbitol or arabinose alone suggest that neither sensory adaptation nor caloric value is sufficient by itself to fully account for the observed PER suppression.

      Accordingly, we have revised the Discussion to clarify that the Hugin–AstA pathway likely operates within a broader, multi-layered regulatory framework, integrating internal metabolic state with sensory experience, rather than acting as a sole determinant of post-feeding sweet sensitivity. This clarification avoids over-attribution of the behavioral effect to circulating glucose alone while preserving the central conclusion that internal energy state is a key modulator of sweet perception.

      Blocking cellular sugar intake or metabolism could be impacting the ability of neurons to function, distinct from any specific intracellular regulatory mechanism that glucose or its derivatives might be involved with. That may be a caveat worth mentioning in the results or discussion.

      We thank the reviewer for raising this important caveat. We agree that blocking cellular sugar uptake or metabolism could, in principle, impair neuronal function in a nonspecific manner, independent of any dedicated intracellular glucose-sensing mechanism.

      In the revised manuscript, we now explicitly acknowledge this possibility and clarify the scope of our interpretation. Several features of our data argue against a generalized loss of neuronal function as the primary explanation. First, the behavioral and physiological effects observed upon manipulation of glucose transport or K<sub>ATP</sub> channel activity are rapid and reversible, consistent with state-dependent modulation rather than chronic metabolic failure. Second, these manipulations selectively affect sweet sensitivity and feeding-related behaviors, without causing gross deficits in proboscis extension or neuronal responsiveness.

      Accordingly, we have revised the Results to emphasize that while intracellular glucose metabolism is required for normal neuronal activity, our findings specifically support a role for glucose-dependent modulation of neuronal excitability in satiety signaling, rather than a nonspecific energetic impairment.

      Minor suggestions:

      (1) Figure 2G: "Pryuvate" -> "Pyruvate."

      We have corrected “Pryuvate” to “Pyruvate”.

      (2) "Fly" methods section: it says that flies were kept on 2% agar for 12 hours for starvation, but in the Figure 1A description, it says 24 hours.

      We have corrected the description in Figure 1A.

      Reviewer #3 (Recommendations for the authors):

      (1) SEZ Hugin+ and AstA+ neurons were activated by glucose (Figures 1G, 1I), yet hemolymph also contains trehalose and fructose. For instance, DH44 neurons respond broadly to all hemolymph sugars (Dus et al., 2015), while Gr43a neurons specifically detect fructose (Miyamoto et al., 2012). The present study does not clarify whether Hugin+ or AstA+ neurons are similarly sugar-specific or more broadly tuned. A systematic analysis is needed to determine whether these circuits are selective for glucose.

      We thank the reviewer for raising this important question regarding sugar specificity. We agree that hemolymph contains multiple sugars, including trehalose and fructose, and that distinct neural systems have been shown to differ in their tuning breadth. To address this issue, we performed additional experiments during revision in which starved wild-type flies were refed with different sugars—including sucrose, fructose, trehalose, and sorbitol—followed by PER measurements. We found that sucrose refeeding produced the strongest suppression of PER, whereas fructose, trehalose, and sorbitol induced weaker effects (Figure supplementary 4A).

      We interpret these results as suggesting a preferential sensitivity of the Hugin/AstA pathway to glucose availability rather than a broad responsiveness to all circulating sugars. One plausible explanation is that fructose, trehalose, and sorbitol require peripheral metabolic conversion before contributing to intracellular glucose levels in neurons, whereas sucrose feeding rapidly restores hemolymph glucose within the 15-minute time window used in our experiments (Figure supplementary 5).

      Importantly, we now clarify in the revised Results and Discussion that our data support a functional preference for glucose under physiological conditions, rather than excluding the possibility that other sugars may influence this circuit indirectly or on longer timescales.

      (2) The authors state that SEZ, but not VNC, Hugin+ neurons regulate AstA activity (lines 318-319). However, comparison of Figure Supplement 8B with the severing sample in Figure Supplement 11B shows a more pronounced reduction of sweet sensation under hug>TrpA1 activation. Although the absolute response in Figure 3F (in vivo) is higher than that in the cut-off preparation (Figure S11), comparison of Figure S11C with Figure 3F indicates that hug+ neurons drive an AstA+ calcium transient more than fourfold greater in the presence of VNC neurons. Thus, the contribution of Hugin+ VNC neurons cannot be dismissed, and the conclusion should be revised accordingly.

      We thank the reviewer for this careful and quantitative comparison. We agree that our original wording overstated the exclusivity of SEZ Hugin<sup>+</sup> neurons in regulating AstA activity.

      Upon closer examination of the data, we now acknowledge that VNC Hugin<sup>+</sup> neurons likely contribute to AstA activation. As the reviewer points out, the AstA<sup>+</sup> calcium response evoked by Hugin activation is substantially larger when VNC neurons are intact (Figure 3F) than in the cut preparation (Figure supplementary 11C). This indicates that inputs from VNC Hugin<sup>+</sup> neurons can potentiate AstA neuronal activity.

      Accordingly, we have revised the manuscript to state that SEZ Hugin<sup>+</sup> neurons play a predominant role in driving AstA responses relevant to sweet sensation, while VNC Hugin<sup>+</sup> neurons provide additional modulatory input that enhances the overall magnitude of Hugin signaling. These revisions have been made in the Results to more accurately reflect the contributions of distinct Hugin subpopulations.

      (3) In Figure 4D, you show AstA-R1 co-localized with Gr5a-expressing cells. However, Gr5a-expressing cells also co-express Gr64f in the labellum (Fujii et al., 2015, Current Biology). Are the authors sure that the sweet sensation they described is Gr5a-specific? Testing Gr64f is essential. Moreover, Fujii et al. demonstrated that Gr5a loss-of-function mutation impairs not only sucrose but also maltose, fructose, and trehalose sensation. This raises a question of whether the Hug+ and AstA+ neurons identified in the current study contribute to sensing sugars beyond sucrose. Additional experiments are required to clarify this point.

      Please see our responses to the Reviewing Editor Comments (4).

      (4) While nutritive sugar sensors such as Dh44 neurons have been directly implicated in sugar preference (Dus et al., 2015, Neuron), this study examines the hug+, AstA+, Gr5a neuronal circuit only in the context of PER responses. Why is sugar preference not assessed here, especially given that in mice, the comparison was made using preference tests?

      We thank the reviewer for this insightful question. We agree that sugar preference assays provide important information about feeding decisions and reward-based behavior. In the present study, however, we deliberately focused on the proboscis extension reflex (PER) because it offers a direct, quantitative, and temporally precise readout of sweet sensory sensitivity at the sensory–motor level.

      PER allows us to isolate changes in taste perception itself, largely independent of post-ingestive reinforcement, learning, or motivational state, all of which strongly influence preference-based assays. This distinction is particularly important given our central goal of identifying a circuit that directly links internal energy sensing to modulation of peripheral sweet-sensing neurons.

      By contrast, sugar preference reflects an integrated behavioral outcome combining sensory input, internal state, and post-ingestive reward signals, including those mediated by DH44 neurons and other nutritive sensing pathways. We therefore chose PER as the most mechanistically specific assay to dissect the Hugin–AstA–Gr5a pathway. We now explicitly acknowledge in the revised Discussion that determining how this satiety-linked sensory modulation interacts with reward and post-ingestive circuits to shape long-term sugar preference will be an important direction for future studies.

      Several other concerns:

      (5) The intraperitoneal injection of NMU is interpreted as reflecting a brain-specific NMU effect, but such systemic delivery cannot exclude peripheral actions. In Figure 5D, the use of whole-body KO mice is insufficient; targeted manipulations (e.g., NMU-Cre-driven inactivation) are required to establish circuit-specific behavioral roles.

      Please see our responses to the Reviewing Editor Comments (Low priority)

      (6) In Figure 5F and 5M, neural activity is measured under different conditions: gastric glucose infusion in 5F versus glucose licking in 5M. To establish that NMU VMH neurons and Calb2 rNST neurons belong to the same circuit, this discrepancy in stimulation timing must be resolved to support the conclusions.

      We thank the reviewer for pointing out this important issue regarding stimulation paradigms in Figures 5F and 5M. We agree that the difference between gastric glucose infusion and glucose licking requires explicit clarification.

      In the revised manuscript, we now clearly state that these two paradigms were intentionally designed to probe complementary levels of the same NMU–Calb2 circuit. In Figure 5F, gastric glucose infusion was used to isolate the internal energy-sensing property of VMH NMU<sup>+</sup> neurons, independent of oral sensory input, motor behavior, or reward expectation. This experiment establishes that NMU<sup>+</sup> neurons are directly activated by elevated circulating glucose.

      By contrast, Figure 5M examines how activation of this NMU pathway modulates downstream Calb2<sup>+</sup> rNST neurons under physiologically relevant feeding conditions, in which sweet taste signals are naturally evoked by licking. This design allows us to test the functional consequence of NMU signaling on sweet-responsive rNST neurons during normal sensory processing.

      Although the route and timing of glucose delivery differ, both paradigms converge on a unified circuit model: internal glucose elevation activates VMH NMU<sup>+</sup> neurons, and NMU signaling suppresses sweet-driven activity in Calb2<sup>+</sup> rNST neurons. We have revised the Results and figure legends to explicitly describe this layered experimental logic and to clarify that Figures 5F and 5M together establish distinct but connected nodes of the same circuit.

      (7) Figure 5I-J. The glucose concentration used appears excessively high. In mammals, blood glucose in the sated state is ~7-8 mM. It is unclear whether the observed responses represent physiological effects or artifacts of supraphysiological stimulation. Additional experiments with lower glucose concentrations would strengthen the study.

      We thank the reviewer for raising this important concern regarding the glucose concentration used in Figure 5I–J. We agree that the concentration applied in ex vivo slice experiments exceeds the typical physiological range of circulating glucose.

      This higher concentration was intentionally chosen to ensure reliable neuronal activation in acute brain slices, where glucose diffusion, uptake, and metabolic access are substantially slower than in vivo. Similar approaches have been widely used in studies of glucose-sensitive hypothalamic neurons to overcome these technical limitations (e.g., Kim et al., 2025, Neuron).

      Importantly, the physiological relevance of our findings is supported by in vivo fiber photometry experiments, which demonstrate that VMH NMU⁺ neurons are robustly activated following normal sugar ingestion under physiological conditions. Thus, while supraphysiological glucose was used to establish glucose responsiveness ex vivo, our in vivo data confirm that NMU⁺ neurons respond to glucose elevations within the normal physiological range.

      (8) Figure 5K. The VMH images are inconsistently oriented compared with Figure 5E, lacking a 3v landmark. The NMU detection method (IHC or FISH) is not specified in the legend. The GFP-Calb2 signal is heavily saturated, making it difficult to distinguish true signals from artifacts. These issues undermine interpretability.

      We thank the reviewer for pointing out these issues. In the revised manuscript, VMH images in Figure 5K have been reoriented to match Figure 5E, and the third ventricle (3v) is now indicated as an anatomical landmark. The figure legend has been revised to clarify that NMU<sup>+</sup> neurons are identified by GFP expression from a Cre-dependent AAV2/1-DIO-GFP injected into NMU-Cre mice, rather than by NMU immunohistochemistry or FISH. In addition, GFP–Calb2 images have been reprocessed to clearly distinguish true signals from background and imaging artifacts.

      (9) Figure 5L-M. Details of the NMU injection method are absent (route, dose, delivery parameters). The number of animals (n) is also not reported. Furthermore, AUC reduction alone is not sufficient evidence of robust inhibition. To convincingly demonstrate causality, NMU-IRES-Cre mice should be combined with DREADD or optogenetic approaches to directly inhibit NMU neurons and test whether rNST Calb2 activity is reduced.

      We thank the reviewer for these helpful comments. We have revised the manuscript to include all missing methodological details, which are now described in the Methods section and figure legend.

      We fully acknowledge that cell-type–specific manipulations, such as DREADD or optogenetic inhibition of NMU neurons, would provide more definitive causal evidence. However, our main goal in the mouse experiments was to demonstrate that NMU<sup>+</sup> neurons can directly sense glucose and modulate sweet sensitivity, thereby supporting the evolutionary conservation of the Hugin mechanism identified in Drosophila. Detailed dissection of the downstream circuit architecture and behavioral consequences in mammals is indeed an important direction for future research, but it lies beyond the current study’s primary focus on cross-species conservation.

      (10) In Drosophila, hugin neurons respond selectively to nutritive glucose (Fig. 2H), but whether NMU neurons share this property is unknown. Notably, Calb2 neurons in the rNST respond to the artificial sweetener AceK (Hao Jin et al., 2021, Cell), leaving open whether the NMU-rNST circuit is calorie-dependent or calorie-independent.

      We have added a statement in the Discussion acknowledging this limitation and emphasizing that future work will be needed to test whether the NMU–Calb2 circuit is selectively engaged by metabolically active sugars or also by sweet taste signals independent of caloric value.

      Minor comments

      (11) All bar graphs should include individual data points.

      We have added individual data points to all bar graphs.

      (12) In Figures 3E, 4C, and 4D, it appears that a combination of GAL4 and LexA was used, but the information about the fly lines is missing.

      We have now included the complete list of fly lines used for these experiments, including their genotypes and sources.

      (13) The source for PK2-R1 KO, AstA-R1 KO fly lines and NMU-IRES-Cre, Calb2-IRES-Cre mice is missing.

      We have added the complete source information for all genetic lines mentioned.

      (14) Figure 5B-D, This is a sucrose preference test, so why is the y-axis labeled as glucose? Is this an error, or were the values converted to glucose equivalents?

      We thank the reviewer for catching this mistake. The assay shown in Figure 5B–D measured sucrose preference, not glucose preference. The inconsistency resulted from a typographical error in the Methods description. In the revised manuscript, we have corrected this error to clearly state that sucrose was used in the preference test.

      (15) Supplementary Figure 15. The NMU images are of poor quality and should be improved.

      The punctate appearance of NMU signals in Supplementary Figure 15 is not due to poor image quality but rather reflects the physiological distribution of the NMU neuropeptide. As NMU is stored in secretory vesicles within neuronal terminals and somata, its immunostaining typically appears as discrete puncta rather than diffuse cytoplasmic labeling.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. Readers would also benefit from a note that the mice were male and from a discussion of the exclusion of females.

      In the revised manuscript, we have included full statistical reporting for all key experiments in the source data. Regarding animal sex, we confirm that all mouse experiments were conducted using male mice. This choice was made to minimize variability caused by hormonal cycles in females, which can influence feeding behavior and glucose metabolism. We have now explicitly stated this information in the Methods section and included a brief discussion noting that sex-specific differences in NMU–Calb2 circuitry and feeding regulation represent an important question for future investigation.

    Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is highest at the lowest spatial frequencies. The application to data illustrates the solidity of the method and its potential for discovery.

      Comments on revised submission:

      The authors have provided responses to the previous recommendations.

      We thank the reviewer for reviewing our manuscript again, and for their positive evaluation.

      Reviewer #3 (Public review):

      Summary:

      The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).

      Strengths:

      A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.

      Weaknesses:

      Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.

      Comments on revised version:

      In my original review, I raised the following issue:

      "The proposed method of estimating wavelength from irregularly sampled three-dimensional iEEG data involves several steps (phase-extraction, singular value-decomposition, triangle definition, dimension reduction, etc.) and it is not at all clear that the concatenation of all these steps actually yields accurate estimates. Did the authors use more realistic simulations of cortical activity (i.e. on the convoluted cortical sheet) to verify that the method indeed yields accurate estimates of phase spectra?"

      And the authors' response was:

      "We now included detailed surrogate testing, in which varying combinations of sEEG phase data and veridical surrogate wavelengths are added together. See our reply from the public reviewer comments. We assess that real neurophysiological data (here, sEEG plus surrogate and MEG manipulated in various ways) is a more accurate way to address these issues. In our experience, large scale TWs appear spontaneously in realistic cortical simulations, and we now cite the relevant papers in the manuscript (line 53)."

      The point that I wanted to make is not that traveling waves appear in computational models of cortical activity, as the authors seem to think. My point was that the only direct way to evaluate the proposed method for estimating spatial spectra is to use simulated cortical activity with known spatial spectrum. In particular, with "realistic simulations" I refer to the iEEG volume-conductor model that describes the mapping from cortical current source density (CSD) to iEEG signals, and that incorporates the reference electrodes and the particular montage used.

      Although in the revised manuscript the authors have provided indirect evidence for the soundness of the proposed estimation method, the lack of a direct evaluation using realistic simulations with ground truth, as described above, means that I remain sceptical about the soundness of the method.

      We thank the reviewer for reviewing our manuscript again.

      We have reviewed the literature again on volume conduction effects in LFP measures of cortical activity. In all publications we reviewed, the conclusion is that the range of the effect is <1cm. We now mention the range of volume conduction in the Methods section dealing with the surrogate models (lines 1054-9) as well as added emphasis in the Discussion (lines 594-9).

      The highest spatial frequency we consider in the present research is 50 c/m, which corresponds to a cortical distance of 2 cm. This is well outside the range of volume conduction effects in LFPs. Mathematically speaking, blurring (e.g. Gaussian) acts as a low-pass filter that attenuates higher spatial frequency components, but only components within the spatial range of the blurring, i.e., for LFPs, frequencies higher than 100 c/m. There will therefore be negligible effects (mathematically speaking, zero effect) of volume conduction in the results reported by us. If the veracity of these studies on volume conduction with LFPs is accepted, then the reviewer’s requested simulation reduces to “estimating spatial spectra [using] simulated cortical activity with known spatial spectrum.” This is what we have done, in a direct and simple manner.
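      As a worked check of the conversion used in this paragraph, assuming spatial frequency k is given in cycles per metre so that the corresponding wavelength is simply its reciprocal:

```latex
\lambda = \frac{1}{k}, \qquad
\lambda_{50} = \frac{1}{50\ \mathrm{c/m}} = 0.02\ \mathrm{m} = 2\ \mathrm{cm}, \qquad
\lambda_{100} = \frac{1}{100\ \mathrm{c/m}} = 0.01\ \mathrm{m} = 1\ \mathrm{cm}.
```

      On this reading, the <1 cm range reported for volume conduction in LFPs corresponds to spatial frequencies above 100 c/m, twice the highest frequency analysed here.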

      If the ubiquity and importance of spatio-temporal dynamics in cortex are accepted, then it is insufficient to describe “the mapping from cortical current source density (CSD) to iEEG signals”, since this presumes a model of cortical activity that does not capture the correlations in space and time that we assume are critical to cortical function. We are aware the CSD approach has a long and successful history of unravelling brain mechanisms. However, an emphasis on traveling waves (and spatio-temporal dynamics in general) is in part a challenge to this approach (and the idea of localized sources in general). CSD approaches carry similar assumptions (but at a smaller scale, <1cm) as those elaborated in Zhigalov and Jensen (2023) for extra-cranial measures. In both cases, removal of volume conduction effects emphasizes standing wave activity (localized static, oscillatory sources) over traveling wave activity. In this manner, these methods tend to confirm their starting assumptions (as does our own approach, of course). What is required is external empirical validation to break any circular confirmation of initial theoretical choice of basis. All this is a way of saying that CSD approaches are not the unproblematic, direct methods that the reviewer asserts.

      We did understand the reviewer’s request to model the effects of volume conduction. Our own view of realistic cortical simulations differs from the reviewer’s, setting aside the final step in the forward modeling pipeline, which would add the effects of volume conduction in the grey matter. By simulating real-time dynamics, it should be possible to untangle the effects of volume conduction from true spatio-temporal correlations. This is because volume conduction effects are essentially instantaneous, compared to the relatively slow motion of traveling waves. So, the measurement of purely spatial phase vectors is prone to smearing artefacts, but following the trajectory of a wave over one cycle can more accurately determine the range of true interactions. One could, for example, compare the usual CSD forward modelling with TWs in simulations, see which is the best predictor of future activity, and compare these to empirical measurements. Here, the CSD analysis would remove the volume conduction effects but also emphasize standing activity over motion, even where the motion was veridical in the simulation.

      Even so, these tests are only relevant in the <1 cm range.

      Another issue is ephaptic coupling, which we mention in the discussion. This means that some of the local volume conduction effects are not merely artefacts from the point of view of cortical function, but have a real causal effect. How much is meant by ‘some’ has yet to be fully resolved in the literature, and it would be technically challenging to include these effects in any simulation.

      Finally, simulation should be an adjunct to empirical studies, or used when empirical studies are not possible. We do not think, in this case, they are the ‘only direct’ way to evaluate our method. We, rather, rely on the converging evidence from empirical studies of volume conduction in LFPs which show this effect is outside the range of our reported results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Suggestions:

      (1) The approach taken is very interesting, but it is not particularly well placed in the context of recent related work. MAVE-NN, LANTERN, and MoCHI are all approaches that different labs have developed for inferring and fitting global epistasis functions to DMS datasets. MoCHI can also be used to infer multidimensional global epistasis (for example, folding and binding energies) and also pairwise (and higher order) specific interaction terms (see 10.1186/s13059-024-03444-y and 10.1371/journal.pcbi.1012132). It doesn't distract from the current work to better introduce these recent approaches in the introduction. A comparison of the different capabilities of the methods may also be helpful. It may also be interesting to compare the contributions to variance of 1st, 2nd, and higher-order interaction terms estimated by the Epistatic transformer and MoCHI.

      We thank the reviewer for the very thoughtful suggestion.

      Although these methods are conceptually related to our method, none of them can realistically be used to perform the type of inference we have done in the paper on most of the datasets we used, as they all require explicitly enumerating the large number of interaction terms.

      We have included new text (Line 65-74) in the introduction to discuss the advantages and disadvantages of these models. We believe this has made our contribution better placed in the broader context of the field.

      (2) https://doi.org/10.1371/journal.pcbi.1004771 is another useful reference that relates different metrics of epistasis, including the useful distinction between biochemical/background-relative and background-averaged epistasis.

      We have included this very relevant reference in the introduction. We also point out that a limitation of this class of methods is that they typically require near-combinatorially complete datasets and often have to rely on regularized regression to infer the parameters, making the inferred model parameters disconnected from their theoretical expectations (Lines 49-56).
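      To illustrate why such background-averaged decompositions need a (near) combinatorially complete dataset, the following is a minimal sketch of partitioning phenotypic variance by interaction order with a Walsh-Hadamard transform. It assumes two states per site (wild type/mutant), that all 2^L genotypes are measured, and that genotypes are weighted uniformly; the function names are hypothetical, and this is not the epistatic transformer's procedure.

```python
def walsh_hadamard(values):
    """Unnormalized fast Walsh-Hadamard transform of a list of length 2**L."""
    h = list(values)
    n = len(h)
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                a, b = h[j], h[j + step]
                h[j], h[j + step] = a + b, a - b
        step *= 2
    return h


def variance_by_order(phenotypes, num_sites):
    """Fraction of phenotypic variance explained by each interaction order.

    phenotypes[g] is the phenotype of the genotype encoded by integer g
    (bit i set = site i mutated). Every one of the 2**num_sites genotypes
    must be present; missing genotypes make the coefficients undefined.
    """
    n = 2 ** num_sites
    if len(phenotypes) != n:
        raise ValueError("landscape must be combinatorially complete")
    # Background-averaged (Fourier/Walsh) coefficients of the landscape.
    coeffs = [c / n for c in walsh_hadamard(phenotypes)]
    per_order = [0.0] * (num_sites + 1)
    for s, c in enumerate(coeffs):
        # The order of coefficient s is the number of sites it involves.
        per_order[bin(s).count("1")] += c * c
    total = sum(per_order[1:])  # order 0 is the mean, not variance
    if total == 0:
        return [0.0] * num_sites
    return [v / total for v in per_order[1:]]


# A purely additive two-site landscape versus a purely pairwise (XOR) one.
print(variance_by_order([0, 1, 1, 2], 2))
print(variance_by_order([0, 1, 1, 0], 2))
```

      By Parseval's identity, the squared coefficients of each order sum to that order's share of the variance, which is the quantity the reference relates to background-averaged epistasis.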

      (3) Which higher-order interactions are more important? Are there any mechanistic/structural insights?

      We thank the reviewer for pointing out this potential improvement. We have now included a detailed analysis of the GRB2-SH3 abundance landscape in the final section of the results. In particular, we estimated the contribution of individual amino acid sites to different orders (pairwise, 3-4th order, 4-8th order) of epistasis and discuss our findings in the context of the 3D structure of this domain. We also analyzed the sparsity of specific interactions among subsets of sites.

      Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      We thank the reviewer for the positive feedback.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions." There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled nonadditive interaction discovery in machine learning models."

      We thank the reviewer for this very helpful comment. These references are indeed conceptually quite similar to our framework. Although they are not directly applicable to the types of analyses we performed in this paper (partitioning contribution of epistasis into different interaction orders in terms of variance components), we have included a discussion of these methods in the introduction (Line 70-74). We believe this helps better situate our method within the broader conceptual context of interpreting machine learning models for epistatic interactions.

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      Again, we thank the reviewer for the thoughtful comment. We have addressed this comment together with a related comment by Reviewer #1 by including a detailed analysis of the GRB2-SH3 landscape using a marginal epistasis framework, where we quantified the contribution of individual sites to different orders of epistasis as well as the sparsity of epistatic interactions. We also present these results in the context of the structure of this protein. Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      We agree that the under-parameterization of the simple sigmoid function could potentially be confounding. We did compare different choices of functional forms for modeling global epistasis. Overall, we found no difference between a simple sigmoid function with four trainable parameters and a more complex version (a sum of multiple sigmoid functions, used by popular methods such as MAVE-NN). Therefore, all results we presented in the paper were based on the model with a single scalable sigmoid function.

      We have added relevant text; line 153-158. We have also included side-by-side comparisons of the model performance for the GRB-abundance and the AAV2 dataset to corroborate this claim (Supplemental Figure 1).
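      To make the comparison concrete, the sketch below shows one plausible four-parameter form of a sigmoid global-epistasis link, alongside a sum-of-sigmoids alternative. The parameter names and exact parameterization are assumptions for illustration; they are not taken from the authors' code or from MAVE-NN's API.

```python
import math


def _logistic(x: float) -> float:
    # Numerically stable logistic: avoids overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scaled_sigmoid(phi: float, lower: float, span: float,
                   slope: float, midpoint: float) -> float:
    """Map a latent score phi (specific epistasis) to the measured phenotype.

    Hypothetical four-parameter form:
      lower    - phenotype floor
      span     - distance from floor to ceiling
      slope    - steepness of the transition
      midpoint - latent value giving a half-maximal response
    """
    return lower + span * _logistic(slope * (phi - midpoint))


def sum_of_sigmoids(phi: float,
                    params: list[tuple[float, float, float, float]]) -> float:
    """More flexible alternative: a sum of several scaled sigmoids.

    params holds one (lower, span, slope, midpoint) tuple per component.
    """
    return sum(scaled_sigmoid(phi, *p) for p in params)
```

      In a fitted model both functions would be applied to the latent output of the specific-epistasis layers, with their parameters trained jointly; the sum-of-sigmoids version adds four parameters per component, which is the extra flexibility the authors report did not improve performance.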

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.

      We thank the reviewer for the thoughtful suggestion. We have rewritten the description of our metrics for measuring the importance of "pairwise", "3-4-way", and ">4-way" interactions; Line 232-239.

      We have also added a table to improve clarity, as suggested; Table 2.
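      The layer-to-order relationship discussed in this exchange (M layers of multi-head attention capturing specific interactions up to order 2^M) can be tabulated with a small helper. This only reproduces the arithmetic behind such a table; the function names are hypothetical, and it assumes the pairwise, 3-4-way and >4-way bands correspond to the orders added by the first, second and third layer.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Highest order of specific epistasis representable with M MHA layers (2**M)."""
    if num_mha_layers < 1:
        raise ValueError("need at least one MHA layer")
    return 2 ** num_mha_layers


def orders_added(num_mha_layers: int) -> tuple[int, int]:
    """Range of interaction orders gained when going from M-1 to M layers."""
    high = max_interaction_order(num_mha_layers)
    low = 2 if num_mha_layers == 1 else max_interaction_order(num_mha_layers - 1) + 1
    return low, high


for m in (1, 2, 3):
    low, high = orders_added(m)
    print(f"{m} layer(s): up to {high}-way; adds orders {low}-{high}")
```

      Under this reading, a three-layer model places the ">4-way" band at orders 5 through 8.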

      Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the function of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4, or more amino acids. The study of 10 different protein families shows that there is variation among protein families.

      Weaknesses:

      The manuscript is good overall, but could have gone a bit deeper by comparing the new architecture to standard transformers, and by investigating whether differences between protein families explain some of the differences in the importance of interactions between amino acids. Finally, the GitHub repository needs some more information to be usable.

      We thank the reviewer for the thoughtful comments. We have listed our response below in the “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some of the dataset labels are confusing. For example, GRB is actually the protein GRB2 and more specifically just one of the two SH3 domains from GRB2 (called GRB2-SH3 in Faure et al.).

      We thank the reviewer for catching this. Our original naming of the datasets followed the designation of library number in the Faure et al. paper (which constructed 3 variant libraries and performed different assays on them). To avoid confusion (and also save space in the figure titles), we have now renamed the datasets using this mapping:

      Author response table 1.

      Reviewer #3 (Recommendations for the authors):

      (1) What is the cost of the interpretability of the model? It would be interesting to evaluate how a standard transformer, complete with its many non-linearities, performs on the simulated 13-position data, using the r2 metric. This is important as the last sentence of the discussion seems to suggest that the model proposed by the authors could be used in other contexts, where perhaps interpretability would be less important.

      We thank the reviewer for this suggestion. We have run a generic transformer model on the GRB-abundance and AAV2 datasets. Overall, we found minimal difference between the generic model and our interpretable model, suggesting that fitting the interpretable transformer does not incur significant cost in performance.

      We have included a side-by-side comparison of the performance of the generic transformer and our three-layer model in Supplemental Figure 5 and a discussion of this finding in Line 256-259.

      (2) The 10 data sets analyzed by the authors differ in their behaviour. I was wondering whether the proteins have different characteristics, beyond the number and distribution of mutants in the data sets. For instance, do high-order interactions play a bigger role in longer proteins, in proteins with more secondary structures, in more hydrophobic proteins?

      We fully agree that this is a highly relevant question. Unfortunately, the paucity of datasets suitable for the type of analyses we performed in the paper limits our ability to draw general conclusions. Furthermore, the differences in genotype distribution among the 10 datasets may be the main driving factor in the behaviors of the models.

      We included our thoughts on this issue in the discussion (Line 477-481).

      We will definitely revisit this question if this type of high-order combinatorial DMS data becomes more available in the (hopefully) near future.

      (3) Although the code appears to be available in the repository, there is no information about the content of the different folders, about what the different scripts do, or about how to reproduce the article's results. More work should be done to clarify it all.

      Thank you for pointing this out. We have substantially improved our GitHub repository and included many annotations for reproducibility.

      (4) Typos and minor comments:

      (a) p3 "a multi-peak fitness landscapes": landscape.

      (b) p3 "Here instead of directly fitting the the regression coefficients in Eq. 2": remove 'the'.

      (c) p3 "neural network architectures do not allow us to control the highest order of specific epistasis": a word is missing.

      (d) p6 "up to 1,926, 3,014, and 4,102 parameters, respectively-all smaller than the size of the training dataset": it's not very clear what size of the dataset means: number of example sequences?

      (e) p6 "This results confirm": This result confirms.

      (f) p6 "to the convergence of of the variance components of the model landscape to the ground truth.": remove 'of'.

      (g) p7 "to characterize the importance higher-order interactions": the importance of.

      (h) p7 "The improvement varies across datasets and range": and ranges.

      (i) p9 "over the pairwise model is due to the its ability": remove 'the'.

      (j) p13 "This results suggest that pairwise": result suggests.

      (k) p13 "although the role assessed by prediction for randomly sampled genotypes seems moderate": sampled. Also, I'm not sure I understand this part of the sentence: what results are used to support this claim? It's not 6b, which is only based on the mutational model.

      This is in Supplemental Figure 7.

      (l) p13 "potentially by modeling how the these local effects": remove the.

      (m) p13 "We first note that the the higher-order models": remove the.

      (n) p15 "M layers of MHA leads to a models that strictly": lead to a model.

      (o) Supp Figure 1: "Solid lines shows the inverse": show.

      (p) Supp p 10 "on 90% of randomly sample data": sampled.

      (q) Supp p11 "Next, assume that Eq. 5 is true for m > 0. We need to show that Eq. 5 is also true for m + 1.": shouldn't it be m>=0 ? It seems important to start the recursive argument.

      Good catch.

      (r) Supp p11 "Since the sum in line 9 run through subsets": runs.

      (s) Supp p11 "we can further simplify Eq. 11 it to": remove it.

      We have fixed all these problems. We very much appreciate the reviewer’s attention.

    1. Author response:

      eLife Assessment

      This study uses the yeast two-hybrid (Y2H) assay to identify proteins that may interact with yeast Set1 and other subunits of COMPASS/Set1C, the histone H3K4 methyltransferase, and also provides some evidence for Set1 sumoylation and for a role of Set1C in methylating other factors in vitro. The results are valuable and should contribute to understanding the functions of the conserved Set1C complex, as they suggest potential functional connections with RNA biogenesis, chromatin remodeling, and non-histone methylation, whose implications remain to be explored. Nevertheless, only a small subset of the Y2H interactions is examined further, and the validating experiments are partial or inconclusive, so the strength of evidence is at this point incomplete.

      We thank the reviewers for their thoughtful comments, which primarily raise three major concerns: the overinterpretation of the Y2H data, issues related to validation, and the manuscript’s structure. At the same time, the reviewers acknowledge that the dataset is extensive and that aspects of the validation work are valuable. Below, we provide point-by-point responses to the public reviews. We will prepare a revised version of the manuscript that carefully addresses the public comments and incorporates the referees’ recommendations.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Luciano et al is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybrigenics) to perform the Y2H screens.

      The primary data is not presented, and the outcomes are summarized using the Hybrigenics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.

      We agree that false positives contaminate the list of potential interactors. Some interactions may also be indirect, mediated by a common interactor, and thus not reflect a physiological interaction. Nevertheless, some positives reflect real interactions that can occur under specific physiological conditions. This is the case, for example, with the interaction between Spp1 and Mer2 (from this screen), which has led to major discoveries (Acquaviva et al. Science 2013; Sommermeyer et al. Mol Cell 2013). The publication of these 10 screens should be viewed as a valuable resource for the broader community.

      Hybrigenics brings extensive experience from conducting numerous screens, enabling the team to recognize recurring false positives that commonly arise in screening assays.

      (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, Swd2, Spp1 and Shg1, used as baits, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but with almost no additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.

      We agree with referee 1 that more caution is needed, and we will take this into account in the revised version. We agree that Y2H interaction is an indication of potential interaction and not proof of interaction. We have therefore made a significant effort to compile elements from the literature that may support the interaction. Once again, this study can be considered a resource.

      (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state, "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.

      When we say that we have refined the interaction region between Set1, Prp8, and Prp22, we mean that we have restricted the interaction regions according to Y2H criteria. Indeed, we have not shown the spots illustrating the results. This will be corrected in the revised version.

      The figure legends for Figure S2B and C (copied here in bold) do not correspond to the figure.

      We agree that the legend for Figure S2 is unclear and does not accurately describe the panels shown in the figure. We will revise the legend accordingly in the updated version to ensure it accurately reflects the content of all panels.

      (B) Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated 3 days at 30 °C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30 °C.

      (C) Solid and dotted arrows indicate that transformed TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.

      Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.

      Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.

      This is another example of the primary mistake repeatedly made by the authors: Y2H interactions are candidate results and not conclusive evidence.

      This statement is supported by our previous findings demonstrating that Set1 binds Ty1 mRNA independently of its dRRM and represses Ty1 mobility at a post-transcriptional stage (Luciano et al., Cell Discovery, 2017 PMID:29071121). Binding of Set1 to Ty1 mRNA could stem from the interaction between Set1 1-754 and the Gag capsid-like protein.

      To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.

      While we agree that the Nis1-Set1C interaction has not been demonstrated beyond doubt, we feel that our Y2H and in vitro binding experiments provide reasonable evidence that the interactions may be relevant. It is important to consider that any interaction assay can yield false negative (and false positive) results; this includes Y2H, in vitro binding and mass-spec analysis of complexes purified from cells. We feel that it is not appropriate to trust only protein interactions that are strong and stable enough to be demonstrated via purified complexes. Some protein interactions clearly occur in a transient and weak manner and are therefore not compatible with biochemical purification approaches. This is indeed the strength of alternative methods such as Y2H and in vitro binding assays: interactions can be identified and tested even if the physiological context of the interaction may be more complex.

      (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.

      We thank Referee 1 for their comment. While it is true that there is only a modest connection between Set1 interactors involved in direct or indirect sumoylation and the characterization of Set1 SUMOylation sites, we believe that this does not constitute a weakness of the manuscript.

      (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.

      In line with this comment, we will remove Fig. 7 panels A-D.

      (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences.

      This line of thinking is even less connected to the Y2H screens than the sumoylation work.

      However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).

      (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.

      We will take these remarks (points 5 and 6) into account in the revised version.

      (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.

      We respectfully disagree with Referee 1. We have integrated extensive in vitro reconstitution experiments with complementary in vivo studies, all conducted according to the rigorous standards expected by leading journals. These approaches have allowed us to reach the conclusions presented in this manuscript. While some of these findings are unexpected, they are supported by the data. We have carefully discussed the results and their limitations to provide a comprehensive interpretation.

      The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.

      We thank Referee 1 for these insightful comments. In the revised version, we will follow the advice to remove non-informative failures and peripheral distractions. Additionally, we will exercise greater caution to avoid overinterpreting the Y2H results.

      Reviewer #2 (Public review):

      Summary:

      This paper starts with a large-scale yeast two-hybrid (Y2H) screen using Set1 (full-length and smaller parts) and other Set1C/COMPASS subunits as bait. There are hundreds of possible interactions identified, but only a small number are given any follow-up. While it's useful to document all the possible interactions, the unfocused and preliminary nature of the results makes the paper feel scattered and incomplete.

      Strengths:

      The Y2H screen was very comprehensive, producing lots of interesting possible leads for further experiments.

      Weaknesses:

      The results are useful but incomplete because only a small subset of the Y2H interactions is further examined. Even in the case of those that were further tested, the validating experiments are only partial or inconclusive.

      Referee 2’s comments align in some respects with those of Referee 1. We will follow Referee 2’s detailed suggestions to reduce the scattered nature of the manuscript.

      We will follow his/her recommendations; in particular, we will provide an AlphaFold model of the interaction between the Set1 N-terminal region (residues 1-754) and the SID domain of Kap104 that involves the proposed Set1 PY-NLS sequence.

      Reviewer #3 (Public review):

      The SET1C/COMPASS complex is the histone H3K4 methyltransferase in Saccharomyces cerevisiae, where it plays pivotal roles in transcriptional regulation, DNA repair, and chromatin dynamics. While its canonical function in histone methylation is well-established, its full interactome remains poorly defined. Moreover, whether SET1C methylates non-histone substrates has been an open question. In this study, Luciano et al. employ systematic yeast two-hybrid (Y2H) screening to uncover novel interactors and functions of SET1C. Their findings reveal potential functional connections to RNA biogenesis, chromatin remodeling, and non-histone methylation.

      The authors performed multiple Y2H screens using Set1 (full-length, N-terminal, and C-terminal fragments) and each of its seven subunits as baits. They identified high-confidence interactors that link SET1C to diverse cellular processes, including chromatin regulation (e.g., the SWI/SNF complex via Snf2), DNA replication (e.g., Mcm2, Orc6), RNA biogenesis (e.g., spliceosome components Prp8 and Prp22; polyadenylation factors Pta1 and Ref2), tRNA processing (e.g., Trm1, Trm732), and nuclear import/export (e.g., importins Kap104 and Kap123). Some of these interactions were further validated by immunoprecipitation or in vitro assays.

      Given the interaction of Set1 with Slx5 and Wss1 - proteins involved in SUMO-dependent processes - the authors investigated and convincingly demonstrated that Set1 is sumoylated. This modification may influence the function and regulation of the SET1C complex.

      Finally, the authors provide evidence that SET1C methylates proteins beyond histone H3K4, notably Nrm1, a transcriptional corepressor, and Snf2, the catalytic subunit of the SWI/SNF chromatin remodeling complex. Although Nrm1 contains a domain resembling the H3K4-methylated sequence (H3K4-like domain), this region does not appear to be required for its methylation. The search for other proteins containing similar domains as potential methylation candidates (p.12, first paragraph) seems less justified, given the lack of evidence supporting the requirement for the H3K4-like domain in methylation.

      This study offers valuable insights into the interactome of SET1C, suggesting potential links between the complex and a wide range of cellular processes. However, the functional implications of the Y2H interactions remain to be explored further. Additionally, the study provides intriguing information on the possible regulation of Set1 by sumoylation. The discovery of Nrm1 and Snf2 as methylation substrates could significantly expand the known targets and functions of SET1C.

      The results are supported by high-quality data.

      We thank Referee 3 for his/her positive comments.

    1. Author response:

      We sincerely appreciate the constructive comments and valuable suggestions from the editors and reviewers. We highly value the feedback and will carefully address all concerns in our revised manuscript.

      (1) We will provide more details on the processing steps and key results of the sCCA and SVR analyses to improve the transparency and reproducibility of our methods.

      (2) Following the reviewers’ suggestions, we will revise our conclusions regarding clinical specificity and neuroplasticity reserve to be more conventional and cautious.

      (3) We will add the results for structural connections (termed “symptom-related network” in the manuscript) across the three subgroups to strengthen the interpretation of subgroup-specific neurobiological characteristics.

      (4) We will address all of the reviewers’ suggestions and carefully revise our manuscript to improve its clarity, rigor, and scientific quality.

      We believe these revisions will significantly improve the quality of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we have addressed each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we have included summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we now report both the empirical selection statistics and the corresponding converted p-values in either the main text or supplement, and both outputs are also provided in the full summary files. This dual approach will allow readers to fully interpret the results under both perspectives.

      Expanded discussion of admixture timing and population structure: We have carefully considered the reviewers' suggestions to incorporate additional descriptions of population structure or demographic analyses, and have done so in our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so we have now made the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed in our Data Availability statement.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S6 and S7. To better highlight this result, we now also include the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      We agree that more fine-scale demographic analyses would be informative. We now additionally provide an estimation of the admixture date in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes and discussion using the DATES software which is optimized for ancient genomes.

      We have encountered problems with several standard date-estimation programs, including DATES, which give very inconsistent and unstable results. As we note in our text, we suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands, low LD differentiation between the source populations, or multiple pulses of admixture, any of which may violate one or more of the assumptions of these methods. Assessing the limitations of these methods is beyond the scope of this current manuscript; however, we will continue working on this problem for future studies, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In revision, we clarify in the Main Text - Results - HLA-B Allele Frequencies and Discussion that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions were not originally specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S6 and S7. As stated in our response to Reviewer #1, in our revisions, we now more clearly state the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3.

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e, I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the number of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, following this reviewer's earlier reasoning, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations, given that on average 1 cM corresponds to roughly 1 Mb in humans, although we recognize that this is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1Mb. Although we group all ROHs longer than 1Mb into one category in Main Text Figure 2, we now additionally provide the distribution of ROH lengths across all individuals for each cohort in a new Supplemental Figure S3. As this plot shows, there certainly are ROHs longer than 1Mb in the Faroese cohort, and on average the Faroese cohort has a higher proportion of long ROH than the other cohorts, particularly in the 5-15 Mb range. As the reviewer points out, these longer ROHs are possibly indicative of a more recent or stronger bottleneck in the Faroes relative to the comparison cohorts. We highlight this result in Main Text - Results - Population Structure and Relatedness.
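      As a rough guide to the time scales discussed in this exchange, a standard approximation (our assumption here, not an analysis from the manuscript) is that an ROH whose two haplotypes descend from a common ancestor g generations ago is broken up by recombination over 2g meioses, so its length is approximately exponential with mean 100/(2g) cM. The sketch below, using a hypothetical helper name, evaluates this for the 40-60 generations suggested for the settlement of the Faroe Islands, with the approximate 1 cM to 1 Mb conversion mentioned above.

```python
def expected_roh_length_cm(generations: float) -> float:
    """Mean ROH length (cM) for a common ancestor `generations` ago (hypothetical helper).

    Assumes crossovers follow a Poisson process at 1 per 100 cM per meiosis and
    that the segment is broken along both lineages (2 * generations meioses),
    giving an exponential length distribution with mean 100 / (2 * generations) cM.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


CM_PER_MB = 1.0  # rough human genome-wide average; varies along the genome

for g in (40, 50, 60):
    length_cm = expected_roh_length_cm(g)
    print(f"{g} generations: ~{length_cm:.2f} cM (~{length_cm / CM_PER_MB:.2f} Mb)")
```

      Under these assumptions, ROH tracing to ancestors shared since settlement would average roughly 1 cM, so tracts several times longer, such as those in the 5-15 Mb range, point to more recent shared ancestry.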

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North and West European sources, dating at least as recently as the Medieval and Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale or even earlier, based on the DATES estimates. We see low variance in ancestry proportions estimated by HaploNet, both in the historical Faroese individuals (dated to 260 years BP) and in the modern samples. This indicates admixture predating the settlement of the Faroe Islands, such that recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. In the original manuscript, we mentioned this as a likely possibility in the Main Text - Discussion: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” In our revisions, we further included the DATES estimates of the timing of admixture in the modern and historical Faroese samples, which predate the timing of settlement in both cases. We highlight these points in the Discussion. As with the British population, the closely related ancestral sources of the Faroese founders were likely not diverged enough in allele frequencies and long-range haplotypes to disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split (Discussion and Results - Signals of Positive Selection). iHS is a commonly calculated statistic, and it is often calculated in a single population without comparison to others, so we feel it is important to show our result demonstrating these shared selection signals. In our revisions, we now clarify in the Discussion the limitations of the iHS statistic and the time scale at which it may detect selection. As for XP-EHH, it is designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time scale of selection that XP-EHH can detect therefore depends on the populations used for comparison. Because XP-EHH has the greatest power to detect such alleles, it is less likely to detect older selection events or incomplete sweeps in the source populations. We highlight this point in the Discussion.
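      To illustrate the genome-wide standardization described in point (1), the sketch below standardizes unstandardized iHS scores within bins of derived allele frequency, so that each score is expressed relative to the genome-wide background for alleles at similar frequency. Binning by frequency follows the conventional iHS procedure; the function name, the 20 equal-width bins and the use of NumPy are our illustrative assumptions rather than the exact pipeline used in the manuscript. XP-EHH is typically standardized against the genome-wide distribution without frequency bins.

```python
import numpy as np


def standardize_ihs(unstd_ihs, daf, n_bins=20):
    """Standardize unstandardized iHS scores within derived-allele-frequency bins.

    Each score becomes (x - mean_bin) / sd_bin, using the genome-wide mean and
    standard deviation of all SNPs in the same frequency bin. Bins with fewer
    than two SNPs or zero variance are left as NaN.
    """
    unstd_ihs = np.asarray(unstd_ihs, dtype=float)
    daf = np.asarray(daf, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each SNP to a bin index in [0, n_bins - 1]; DAF == 1.0 joins the last bin.
    bin_idx = np.clip(np.digitize(daf, edges) - 1, 0, n_bins - 1)
    out = np.full(unstd_ihs.shape, np.nan)
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        vals = unstd_ihs[in_bin]
        sd = vals.std()
        if vals.size > 1 and sd > 0:
            out[in_bin] = (vals - vals.mean()) / sd
    return out


# Simulated scores with a frequency-dependent bias that standardization removes.
rng = np.random.default_rng(0)
daf = rng.uniform(0.05, 0.95, size=5000)
raw = rng.normal(size=5000) + 3.0 * daf
z = standardize_ihs(raw, daf)
print(float(np.nanmean(z)), float(np.nanstd(z)))
```

      Because each score is compared only with SNPs of similar frequency across the whole genome, demographic effects that shift all haplotype lengths are largely absorbed by the bin means and variances, while locally extended haplotypes remain as outliers.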

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We originally did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation.

      However, given the reviewer’s comment, we have now included the frequencies as well as these caveats in the Discussion. We additionally calculated the LCT allele frequency in other ancient samples, and assuming that we had good proxies for the sources at the time of admixture, we calculated the expected allele frequency in the admixed ancestors of the Faroese founders (Discussion), but again note the limitations in using such a calculation in this context.
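      The expected-frequency calculation mentioned above can be written, under the simplifying assumptions of a single admixture pulse and no drift or selection afterwards, as a proportion-weighted average of the source-population frequencies. The sketch below uses a hypothetical helper and made-up inputs purely for illustration; it does not reproduce the frequencies or ancestry proportions reported in the manuscript.

```python
import math


def expected_admixed_frequency(source_freqs, proportions):
    """Expected allele frequency p_adm = sum_i alpha_i * p_i (hypothetical helper).

    Assumes one admixture pulse and no subsequent drift or selection.
    """
    if len(source_freqs) != len(proportions):
        raise ValueError("need one ancestry proportion per source population")
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-6):
        raise ValueError("ancestry proportions must sum to 1")
    if any(not 0.0 <= p <= 1.0 for p in source_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    return sum(a * p for a, p in zip(proportions, source_freqs))


# Hypothetical inputs: two sources contributing 60% and 40% of the ancestry.
print(expected_admixed_frequency([0.8, 0.5], [0.6, 0.4]))
```

      A comparison between an observed frequency and this expectation is only as informative as the source proxies and ancestry proportions, which is the limitation noted in our response.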

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that the reviewers disagree on the best approach to reporting selection scan results, in our revision we now supply both the standardized iHS / XP-EHH values in Supplementary Fig. S10 and these values transformed to p-values in Main Text Fig. 3. Both outputs are also provided in the publicly available selection scan results files. The method for obtaining p-values is described in the subsection “Selection scan” of the Methods section; we used a method developed earlier by Fariello et al.
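      For readers comparing the two reporting formats, the sketch below contrasts a Gaussian conversion of standardized scores to -log10 p-values with a purely empirical outlier rule. The Gaussian form, -log10(2[1 - Φ(|z|)]), is, as we understand it, the conversion used for example in the rehh R package; we have not verified that it is identical to the Fariello et al. procedure cited in the Methods, so it should be read as an assumption. The helper names and the cutoff fraction are illustrative only.

```python
import math


def gaussian_neg_log10_p(z):
    """Two-sided -log10 p-value, assuming the standardized score is N(0, 1)."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return -math.log10(math.erfc(abs(z) / math.sqrt(2.0)))


def empirical_outlier_flags(scores, top_fraction=0.01):
    """Flag scores whose |score| lies in the top `top_fraction` genome-wide.

    Makes no distributional assumption; ties at the cutoff are all flagged.
    """
    ranked = sorted((abs(s) for s in scores), reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    cutoff = ranked[k - 1]
    return [abs(s) >= cutoff for s in scores]


print(round(gaussian_neg_log10_p(1.959964), 3))
print(sum(empirical_outlier_flags(list(range(-50, 50)), top_fraction=0.05)))
```

      If the standardized statistics are heavier-tailed than a normal distribution, the Gaussian conversion will overstate significance in the tails, whereas the empirical rule only ranks loci and makes no such assumption; reporting both, as we now do, lets readers apply either interpretation.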

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in our revisions, we have updated the Main Text - Discussion to acknowledge this possibility.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please note that there was disagreement among the reviewers regarding the reporting of outliers.

      As stated in our response to the public reviews, given the disagreement, we include both the empirical selection statistics as well as the converted p-values in the main text, supplement and selection scan files.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2:

      Define labels / explain why they differ from 1000 Genomes populations / make them consistent throughout the manuscript.

      We apologize for the error in labels for Figure 2. These are the same populations used in other figures and analyses. We have fixed this in our revisions so that the labels are consistent with the rest of the manuscript.

      (2) Figure S2 label:

      "The matrix is rescaled after subsetting the individuals, so although the scales are different, the overall structure remains the same." I do not understand this sentence. The samples are different, the scale is different, the apparent pattern is different - what overall structure is supposed to be the same?

      We apologize that the language in the figure legend was not clear. The scales of panels A and B differ because popkin rescales the kinship values after subsetting so that the minimum kinship is zero. This is necessary when subsetting individuals from an already estimated kinship matrix, particularly when subsetting from global populations to a single region. From the popkin documentation: “This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent)” (https://rdrr.io/cran/popkin/man/rescale_popkin.html).

      We also described this in the Methods - Population Genetics - Kinship and runs of homozygosity section: “When calculating the kinship matrix for the Faroese WGS cohort only, we used the rescale_kinship() function, which will change the most recent common ancestor and give different absolute values, but the overall relationship structure in the subpopulation remains the same.”

      That is, the relative kinship within the Faroese cohort remains consistent, despite the different scale.

      It is difficult to see the kinship of Faroese individuals in the larger plot with all cohorts, which is why we subset and visualize the Faroese cohort alone. We have updated the Fig. S2 label language to make this more clear.
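      To make the point about rescaling concrete: as we understand popkin's rescaling (an assumption on our part; the package documentation linked above gives the exact definition), moving the reference to a more recent MRCA maps each kinship value φ to (φ - φ_min)/(1 - φ_min), where φ_min is the minimum kinship in the subset. Because this is a positive affine transformation, it changes absolute values but preserves the ordering and relative spacing of kinship values, which is what we mean by the overall relationship structure remaining the same. The helper below is hypothetical and is not popkin's API.

```python
import numpy as np


def rescale_to_new_mrca(kinship, min_kinship=None):
    """Rescale a kinship matrix so that `min_kinship` maps to zero (hypothetical helper)."""
    k = np.asarray(kinship, dtype=float)
    if min_kinship is None:
        min_kinship = k.min()
    if min_kinship >= 1.0:
        raise ValueError("min_kinship must be below 1")
    return (k - min_kinship) / (1.0 - min_kinship)


# Toy 3 x 3 kinship matrix (made-up values, not Faroese data).
K = np.array([[0.50, 0.10, 0.20],
              [0.10, 0.50, 0.15],
              [0.20, 0.15, 0.50]])
R = rescale_to_new_mrca(K)
print(np.round(R, 3))
```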

      (3) "Iron Age Wet Europe"

      We have corrected this typo to “Iron Age West Europe.”

      I'm confused if the ancient Faroese were part of the imputation panel: Figure 5 legend implies they are, methods imply they are not.

      The ancient samples are not imputed with the modern Faroese and reference samples, but they are the imputed data downloaded from Allentoft et al. and merged with the modern Faroese cohort. We specify that we downloaded imputed ancient samples in both the Methods - Fine-scale structure estimation using ancient genomes and in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes. The description of the imputation panel in the Methods - Bioinformatics - Variant calling and imputation refers only to the modern samples.

      (4) Kinship:

      The kinship of the Faroes is useful (and nice) as a QC analysis showing the genetic data matches the expectations from the pedigree. I don't know what I should learn from the kinship of the 1000 Genomes samples (I'd assume one could learn something about bottleneck strength from this), but it's not developed/discussed.

      The global kinship matrix provides complementary information to PCA and ROH, as another way to quantify and visualize the relationships within and between populations. Additionally, as the reviewer mentioned, bottlenecks increase kinship within populations. Given that popkin estimates kinship measured from a Most Recent Common Ancestor, we can best observe this increase in kinship when comparing to other global populations. We more clearly delineate what can be observed from Fig. S2A versus Fig. S2B in the Results - Population Structure and Relatedness.

      Reference

      (1) Gretzinger, J. et al. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119 (2022).

      (2) Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Henshall et al. delete the highly abundant merozoite surface protein PfMSP2 from two Plasmodium falciparum laboratory lines (3D7 and Dd2) using CRISPR-Cas9. Parasites lacking MSP2 replicate and invade red cells normally, opposing the experimental history that suggests MSP2 is essential. Unexpectedly, the knock-outs become more susceptible to several inhibitory antibodies, most strikingly those that target the apical antigen AMA1, while antibodies to other surface or secreted proteins are largely unaffected. Recombinant MSP2 added in vitro can dampen AMA1-antibody binding, supporting a "conformational masking" model. The reported data suggest that MSP2 helps shield key invasion ligands from host antibodies and may itself be a double-edged vaccine target.

      Reviewer 1 did not have any comments we needed to address.

      Reviewer #2 (Public review):

      (1) The section describing Laverania and avian Plasmodium MSP2 comparison is a lengthy section and could be told much more concisely for clarity in delivering the key message, i.e., that conservation in distantly related Plasmodium species could indicate an important function. The identification of MSP2-like genes in avian Plasmodium species was highlighted previously in the referenced Escalante paper, so it is not entirely novel, although this paper goes into more detailed characterisation of the extent of conservation. Overall, this section takes up much more space in the manuscript than is merited by the novelty and significance of the findings.

      As outlined in our response to point (1) for Reviewer 2 (Recommendations for the authors), we have cut back this section and focussed on the important comparisons rather than the general observation. We have also moved the elements of Table 1 to Supplementary Figures 2, 3 and 4 to streamline the manuscript. Further description of the changes is available in our response to Reviewer #2 (Recommendations for the authors).

      (2) Characterisation of the knockout strains is generally thorough, though relatively few interactions were followed by live microscopy (Figures 3E-H). A minimum of 30 merozoites were followed in each assay (although the precise number is not specified in the figure or legend), but there are intriguing trends in the data that could potentially have become significant if n was increased.

      In the Figure 3 legend, we now state the number of merozoite invasions followed:

      “(E-H) Key parameters of merozoite invasion were measured for both PfDd2 WT (n = 43) and PfDd2 ΔMSP2 (n = 35) parasites that had successfully invaded a RBC using live cell imaging of merozoite invasion.”

      We have also removed the more general description of ‘a minimum of 30 merozoites’ from the same Figure Legend.

      The number of schizont ruptures and subsequent merozoite invasions followed for each experiment is in line with previous studies that have investigated phenotypes with invasion inhibitors and gene knock-outs (e.g. Weiss et al. 2015, PLoS Pathogens). It is important to note that the data refer to merozoites that have completed invasion, not simply merozoites released from a schizont, which typically number 2-4 times more than those that invade. This means we are comparing the kinetics of invasion across a relatively large sample size compared with other studies of inhibitory phenotypes. While it is possible that filming more merozoites might lead to statistical significance for some of the trends, we note that there is a limited growth phenotype overall in both short- and long-term culture, which fits with the limited defect we are seeing. To better address this, as outlined in our response to point (5) for Reviewer 2 (Recommendations for the authors), we now discuss the trends seen in the data in more detail.

      (3) The comparative RNAseq data is interesting, but is not followed up to any significant degree. Multiple transcripts are up-regulated in the absence of PfMSP2, but they are largely dismissed because they are genes of unknown function, not previously linked to invasion, or lack an obvious membrane anchor. Having gone to the lengths of exploring potentially compensatory changes in gene expression, it is disappointing not to validate or explore the hits that result.

      While we understand the reviewer's comment, as outlined in the text we did not identify any upregulated transcripts encoding proteins that looked like strong candidates to compensate for loss of MSP2 and merited exploration in this manuscript. Instead, we chose to investigate further the loss-of-MSP2 phenotype that yielded our observations of improved potency of antibodies targeting some merozoite antigens. This will be explored in future studies as we try to understand the role of MSP2 in more detail, as well as the interactions between proteins and antibodies on the merozoite surface.

      (4) Given the abundance of PfMSP2 on the merozoite surface, it would have been interesting to see whether the knockout lines have any noticeable difference in surface composition, as viewed by electron microscopy, although, of course, this experiment relies on access to the appropriate facilities.

      We agree with the reviewer, but this lies outside the scope of this manuscript; based on feedback from researchers working with these techniques, optimising the imaging platform to gain biologically useful insights would take a considerable amount of work.

      (5) One of the key findings is that deletion of PfMSP2 increases inhibition by some antibodies/nanobodies (some anti-CSS2, some anti-AMA1) but not others (anti-EBA/RH, anti-EBA175, anti-Rh5, anti-TRAMP, some anti-CSS2, some anti-AMA1). The data supporting these changes in inhibition are solid, but the selectivity of the effect (only a few antibodies, and generally those targeting later stages in invasion) is not really discussed in any detail. Do the authors have a hypothesis for this selectivity? The authors make attempts to explore the mechanisms for this antibody-masking (Figure 7), but the data is less solid. Surface Plasmon Resonance was non-conclusive, while an ELISA approach co-incubating MSP2 and anti-AMA1 antibodies to wells coated with AMA1 lacks appropriate controls (eg, including other merozoite proteins in similar experiments).

      As outlined in our response to point (7) for Reviewer 2 (Recommendations for the authors), we have repeated the ELISA-based assessment of the impact of recombinant MSP2 on anti-AMA1 antibody binding. In addition, we have included two comparator control proteins, the intrinsically disordered MSP4 of P. falciparum and the globular domain of the neural cell adhesion molecule (NCAM, CD56, 16 kDa), and found that these proteins did not affect binding of anti-AMA1 antibodies. This strengthens the data linking the presence of MSP2 to reduced activity of anti-AMA1 antibodies.

      As covered in our response to point (8) for Reviewer 2 (Recommendations for the authors), we provide additional discussion of this phenotype. We note that the list of inhibitory antibodies tested is not exhaustive, and additional antibodies may be identified whose potency improves with loss of MSP2. So, although we see a consistent effect with a relatively small number of antibody targets, this does not rule out additional examples that may act earlier in invasion (for example, we noticed a small, but not statistically significant, trend for mildly inhibitory antibodies targeting MSP1-19 as well), and this makes it problematic to speculate at this stage on why these two initial antibody targets are affected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) If feasible, perform ex vivo assays to demonstrate that the masking effect operates with physiologically relevant antibodies.

      For this manuscript, we focussed on characterising the MSP2 knock-out parasites using the best reagents available. We remain interested in understanding whether these lines can be used to investigate the activity of functional antibodies from malaria exposed human serum and this will be the subject of future studies.

      Reviewer #2 (Recommendations for the authors):

      (1) As noted in the Public Review, the section describing MSP2 orthologues in other Laverania and avian Plasmodium species is overly long and not the most novel section of the manuscript. It could be really radically trimmed back.

      We have taken this suggestion from the reviewer on board and have significantly cut back our descriptions of the basic similarity properties of the conserved N- and C-terminal regions, as well as the description of the central variable region. In effect, we have reduced this section from 864 words across 3 paragraphs to 478 words across 2 paragraphs. While we have greatly economised our description of the N- and C-terminal conserved regions, we have retained much of the description of the similarities and differences in the central variable region, as we believe it is of interest that this variable region still maintains repeats across such evolutionary distances, although they differ in size, number and amino acid composition.

      Taking the reviewer's comment on board, we have also removed Table 1 (which showed the amino acid sequence properties of these regions) from the manuscript and instead have inserted the tables relevant to each alignment into Supplementary Figures 2, 3 and 4 as appropriate. This streamlines the main manuscript and brings the amino acid property and alignment data for each alignment together in one Figure. We thank the reviewer for this feedback and believe that it has helped focus the text on the most important observations.

      (2) Figure 2C - As MSP2 has stage-specific expression, it could be informative to incorporate an antibody targeting another gene with a similar stage-specific expression pattern, such as AMA1, into the blot. This would confirm that both protein samples were collected at a similar point during blood stage development.

      We have modified Figure 2C to include both the original comparison using PfAldolase as the loading control and the merozoite-expressed PfGAP45 as a loading/stage-specific control.

      (3) Figure 2D - Magenta and red are hard to distinguish in the merge channel. Is it possible to pseudocolour one of these channels a different colour? Also, it would be simpler to keep PfMSP2 a consistent colour in both rows.

      Thank you for this suggestion; we agree that the comparison could be made clearer. In this figure, we have now coloured DAPI-labelled nuclei cyan, antibodies targeting PfMSP2 magenta, and antibodies targeting PfAMA1 and PfMSP1-19 yellow. This is also reflected in the merged image. The Figure legend now reads:

      “(D) Distribution of key merozoite surface proteins in the presence or absence of PfMSP2 was visualised by immunofluorescence. PfMSP2 (magenta), the nucleus stained by DAPI (cyan) and PfAMA1 (yellow, top two rows) or PfMSP1-19 (yellow, bottom two rows), and the coloured merge of the preceding panels. Scale bar = 0.7 µm. Representative images shown from a minimum of 10 schizonts imaged per condition.”

      (4) Figure 2F - Static growth relative to shaking growth is plotted in this panel; perhaps this could be more clearly described in the legend or mentioned in the text that there was not a significant alteration in growth in static or shaking conditions.

      As suggested, we have clarified the result in the Figure legend text as follows:

      “(E-F) Growth of Pf3D7 WT compared to Pf3D7 ΔMSP2 P. falciparum parasites, measured as fold increase in parasitaemia, over one (48 hrs) or two (96 hrs) cycles in either standard (still; E) or shaking (F) conditions, with no measurable difference in parasite growth rates between standard and shaking conditions.”

      Please also describe the shaking conditions used (i.e., speed, culture size, and vessel) in the methods.

      We have updated the methods to provide information on the growth conditions used in the standard versus shaking growth assays:

      “The initial parasitemia of cultures was determined by flow cytometry and then measured again after the 50 µL cultures in 96-well plates were maintained under standard (still) or shaking (50 rpm) conditions for 48 hrs or 96 hrs of growth.”

      (5) Figure 3G - Annotate legend for strength of deformation to describe what 1,2, or 3 refers to.

      We have added the following to the Figure legend of Figure 3G:

      “Deformation scores are as defined by Weiss et al (Weiss et al., 2015), with 1 = weak deformation of the RBC membrane at the point of contact, 2 = strong deformation leading to the RBC membrane extending up the sides of the merozoite and changes in RBC membrane curvature beyond the point of contact and 3 = extreme deformation indicated by the merozoite being deeply embedded in the RBC membrane and strong deformation of the RBC well beyond the point of contact.”

      There is a small visible shift in the deformation event scores. Is this also not significant? Even if deformation is not significantly longer, could this small effect alter the exposure of epitopes on other proteins for antibody targeting?

      We did test the deformation event scores, and the differences were non-significant. We have considered the possibility raised by the reviewer, but we are cautious about over-interpreting these trends as contributing to the increased potency of certain antibodies in the absence of additional data. We note that, although deformation may take slightly longer and be more pronounced with PfMSP2 knock-out, this also seems to translate into a weak trend for faster overall entry for those merozoites that go on to invade. Therefore, although deformation may be longer and stronger, antibodies may have less time to block invasion overall. We are not confident that we can interpret what might be happening at the molecular scale based on these data and have chosen not to discuss this possibility in the manuscript. However, we have added the following to the results to better explain the phenotype we observed.

      “This analysis showed that, although there was a trend for PfDd2 ΔMSP2 knock-out parasites to have a higher mean time to attach to the RBC, as well as for the length and strength of RBC deformation, these trends did not reach significance. For those merozoites that did invade the RBC, on average it took less time for PfDd2 ΔMSP2 knock-out parasites to invade than PfDd2 WT, but this again did not reach significance (Figure 3 E-H). Together these data show PfMSP2 is not essential for blood-stage replication in vitro in two P. falciparum laboratory isolates from different geographical regions and knock-out of PfMSP2 does not seem to significantly impact parasite growth or merozoite invasion in vitro.”

      (6) Figure 4C - Legend refers to black lines, but on the figure, they are red? Is the horizontal red line in the correct place, or should some of the dots below it be black rather than blue if they fall outside the adjusted p-value significance cut-off? Were 4 schizont harvests performed in total, or 4 for each cell line?

      We thank the reviewer for pointing this out and we have now changed the text to say red lines. We have also provided more information in the Figure legend to more clearly define what data is represented. In short, 4 harvests were performed for each cell line (8 in total across the 2 cell lines) and the data represents the distribution from one of these harvests. The blue shaded genes are those that, on average, across the 4 Pf3D7 WT and Pf3D7 ΔMSP2 paired harvests show up or down-regulated expression. This is why some of the blue shaded genes lie near or below the cut-off values represented by the red line. The Figure legend text has now been modified as follows.

      “(C) Log2(fold change) for differentially expressed genes, including multigene families, between the transcriptome of Pf3D7 WT and Pf3D7 ΔMSP2 schizonts. Plot represents the results for one of four independent schizont RNA harvests for Pf3D7 WT and Pf3D7 ΔMSP2 parasites and red lines differentiate genes with a log2 (fold change) > 0.5 and < -0.5 with adjusted p-value < 0.01. Genes shaded blue represent those genes that were found to have an average log2 (fold change) > 0.5 (dark blue) or < -0.5 (light blue) across the four replicate samples compared. Significance determined as below p< 0.05 after correction for multiple testing.”

      (7) Figure 7D - ELISA results don't show a convincing concentration-dependent inhibition, and repeating with another recombinant protein is essential before inferring that the effect is specific to PfMSP2

      We have repeated the ELISA experiment using recombinant PfMSP2 to reduce variability across the assay and again found a dose dependent reduction of anti-PfAMA1 binding with increasing concentrations of recombinant PfMSP2. It should be noted that this is a completely new set of experiments that recapitulate the original findings. See updated Figure 7D.

      We agree with the reviewer that the experiment and the interpretation of the data would be strengthened by comparing any potential inhibitory effect on anti-PfAMA1 binding with that of a different recombinant protein. Therefore, we have completed identical experiments using the similarly intrinsically disordered PfMSP4 recombinant protein (40 kDa) and the highly structured 16 kDa immunoglobulin domain of human neural cell adhesion molecule (NCAM). We find no dose-dependent loss of anti-PfAMA1 binding to recombinant PfAMA1 with the addition of PfMSP4 or NCAM immunoglobulin domain recombinant protein. These controls are shown in Supplementary Figure 6, and the relevant text is provided below.

      ‘In contrast, increasing concentrations of the intrinsically disordered MSP4 from P. falciparum 3D7 (40 kDa) and the highly structured immunoglobulin domain of neural cell adhesion molecule (NCAM, CD56, 16 kDa) recombinant proteins did not impact on binding of anti-PfAMA1 antibodies to recombinant AMA1 (Supplementary Figure 6).’

      (8) Again, as noted in the public review, the target-specificity of the inhibition-masking effect is perhaps the most surprising aspect of the data - this could do with much more thorough discussion. Why only these proteins, both of which function late in invasion?

      Overall, we tested several growth-inhibitory and non-inhibitory antibodies shown to bind specifically to one or a combination of nine P. falciparum merozoite surface and secreted proteins. However, we do not consider this to be by any means an exhaustive list of potentially invasion-inhibitory antibodies. For the most part, we did not observe non-inhibitory antibodies becoming significantly more growth-inhibitory against the PfMSP2 KO lines, indicating that these antibodies were either not affected by loss of PfMSP2 or had no functional inhibitory effect in these assays.

      What we do demonstrate here is a consistent effect with different rabbit, mouse monoclonal and i-body growth-inhibitory antibodies targeting PfAMA1, indicating that it is not a spurious result from a single antibody or antibody type. We also find a second example: nanobodies targeting the PfPCRCR complex protein PfCSS are potentiated by loss of PfMSP2. This opens up the possibility that other growth-inhibitory antibodies to the antigens tested here, or growth-inhibitory antibodies targeting other antigens involved in merozoite invasion, may also become more potent with MSP2 KO. Although both PfAMA1 and PfCSS function late in invasion, it is too early to say whether this reflects a functional trend or is a consequence of the panel of antibodies tested. Further testing using the lines developed in this study could therefore yield additional examples of antibodies that become more inhibitory with MSP2 KO and provide additional information on the potential impact of MSP2 on their vaccine potential. To address this, we have added the following text to the discussion:

      “Here we show consistent potency improvement with PfMSP2 knock-out for growth inhibitory rabbit, mouse monoclonal and i-body antibodies targeting PfAMA1, as well as demonstrate improved activity for an Fc-tagged nanobody targeting PfCSS, indicating that these are not outlier results from a single antibody or antibody type. However, increased antibody potency was not shared across all antibodies tested, possibly because the specific function or localisation of a target protein, the region that an antibody binds to or the functional activity (or lack thereof) of an antibody may all play a role in determining whether loss of PfMSP2 can potentiate growth inhibitory activity. Further investigation using the parasite lines developed in this study and a wider panel of antibodies that target different stages of the merozoite invasion process could shed more light on this potentially novel mechanism of vaccine derived antibody efficacy.”

      (9) Typos/minor editorial points:

      L111 – conserved

      This text has been modified.

      L235-237 - check the wording in this sentence for clarity

      This text has been modified.

      Figure 3E - 'attachment' on axis

      This Figure has been modified.

      L350 - mentions eight 'proteins' having expression increase, instead 'transcripts' should be referred to when describing RNAseq data, as transcript levels may not correspond directly with protein levels. Also, be careful when referring to transcript or protein throughout this paragraph.

      This text has been modified.

      Figure 4A - instead of 'transcription during schizonts', better to say 'schizont transcript abundance'

      This text has been modified.

      L514 - 'detectable binding to PfAMA1'

      This text has been modified.

      L589 - Is it a mouse Fc region or a human Fc region that is added? The human Fc region is mentioned in the results.

      In the growth inhibition assays, the anti-AMA1 WD34 i-body with a human Fc region was used, whereas in the ELISA assays the anti-AMA1 WD34 i-body with a mouse Fc region was used (to enable detection of AMA1 binding using the same secondary antibody for both the WD34 i-body and the 4G2 mouse monoclonal antibody). The text has been checked and modified accordingly to state this clearly.

      Supplementary figure 3 - 'repeats'

      This text has been modified.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe the generation of a Drosophila model of RVCL-S by disrupting the fly TREX1 ortholog cg3165 and by expressing human TREX1 transgenes (WT and the RVCL-S-associated V235Gfs variant). They evaluate organismal phenotypes using OCT-based cardiac imaging, climbing assays, and lifespan analysis. The authors show that loss of cg3165 compromises heart performance and locomotion, and that expression of human TREX1 partially rescues these phenotypes. They further report modest differences between WT and mutant hTREX1 under overexpression conditions. The study aims to establish Drosophila as an in vivo model for RVCL-S biology and future therapeutic testing.

      Strengths:

      (1) The manuscript addresses an understudied monogenic vascular disease where animal models are scarce.

      (2) The use of OCT imaging to quantify fly cardiac performance is technically strong and may be useful for broader applications.

      (3) The authors generated both cg3165 null mutants and humanized transgenes at a defined genomic landing site.

      (4) The study provided initial in vivo evidence that human TREX1 truncation variants can induce functional impairments in flies.

      Weaknesses:

      (1) Limited mechanistic insight.

      RVCL-S pathogenesis is strongly linked to mislocalization of truncated TREX1, DNA damage accumulation, and endothelial/podocyte cellular senescence. The current manuscript does not examine any cellular, molecular, or mechanistic readouts - e.g. DNA damage markers, TREX1 subcellular localization in fly tissues, oxidative stress, apoptosis, or senescence-related pathways. As a result, the model remains largely phenotypic and descriptive.

      We thank the reviewers for these suggestions. We are planning to perform experiments addressing the RVCL-S linked cellular deviations. We will examine DNA damage markers on cellular level and perform TUNEL tissue staining to visualize apoptosis, etc.

      To strengthen the impact, the authors should provide at least one mechanistic assay demonstrating that the humanized TREX1 variants induce expected molecular consequences in vivo.

      Yes, we are planning to demonstrate the distinct effects from TREX1 and TREX1 V235G expression on molecular level.

      (2) The distinction between WT and RVCL-S TREX1 variants is modest.

      In the cg3165 rescue experiments, the authors do not observe differences between hTREX1 and the V235Gfs variant (e.g., Figure 3A-B). Phenotypic differences only emerge under ubiquitous overexpression, raising two issues:

      i) It is unclear whether these differences reflect disease-relevant biology or artifacts of strong Act5C-driven expression.

      Thanks for pointing out this issue. We will discuss the differences between two expression models in the revised manuscript.

      ii) The authors conclude that the model captures RVCL-S pathogenicity, yet the data do not robustly separate WT from mutant TREX1 under physiological expression levels.

      We will provide more details related to RVCL-S disease development and age-related manifestations.

      The authors should clarify these limitations and consider additional data or explanations to support the claim that the model distinguishes WT vs RVCL-S variants.

      We will address the reviewer concerns and re-write the related manuscript sections to provide more clarity.

      (3) Heart phenotypes are presented as vascular defects without sufficient justification.

      RVCL-S is a small-vessel vasculopathy, but the Drosophila heart is a contractile tube without an endothelial lining. The authors refer to "vascular integrity restoration," but the Drosophila heart lacks vasculature.

      We will expand the model justification section and will be more careful with our statements to avoid misunderstanding of the experimental conclusions.

      The manuscript would benefit from careful wording and from a discussion of how the fly heart phenotypes relate to RVCL-S microvascular pathology.

      We thank the reviewer for pointing to this issue. Justifying Drosophila usage for human disease modelling is always challenging. We will re-write the corresponding parts of the manuscript.

      (4) General absence of tissue-level or cellular imaging.

      No images of fly hearts, brains, eyes, or other tissues are shown. TREX1 nuclear mislocalization is a hallmark of RVCL-S, yet no localization studies are included in this manuscript. Adding one or two imaging experiments demonstrating TREX1 localization or tissue pathology would greatly enhance confidence in the model.

      As suggested by the reviewers, we will add tissue imaging experiments to illustrate the pathological effects of RVCL-linked TREX1 expression. We are also planning to use the CRIMIC line CR70804 to visualize fly TREX1 tissue distribution.

      Reviewer #2 (Public review):

      Summary:

      The authors used the Drosophila heart tube to model Retinal vasculopathy with the goal of building a model that could be used to identify druggable targets and for testing chemical compounds that might target the disease. They generated flies expressing human TREX1 as well as a line expressing the V235G mutation that causes a C-terminal truncation that has been linked to the disease. In humans, this mutation is dominant. Heart tube function was monitored using OCM; the most robust change upon overexpression of wild-type or mutant TREX1 was heart tube restriction, and this effect was similar for both forms of TREX1.

      Our results are consistent with the nature of the human disease: RVCL-S carriers and non-carriers are both healthy and asymptomatic at a young age, but the accumulated physiological stress becomes apparent in midlife, leading to premature death in the 40s and 50s. We will expand the discussion section, focusing on RVCL-S manifestations in aged animals.

      Lifespan and climbing assays did show differential effects between wt and mutant forms when they were strongly and ubiquitously expressed by an actin-Gal4 driver. Unfortunately, these types of assays are less useful as drug screening tools. Their conclusion that the primary effect of TREX is on neuronal function is inferential and not directly supported by the data.

      We will revise the discussion of this experiment and plan to include additional experiments to strengthen the conclusions.

      The authors do not show that CG3165 is normally expressed in the heart. Further fly heart tube function was similarly restricted in response to expression of either wild-type or mutant TREX1. The fact that expression of any form of human TREX1 had deleterious effects on heart function suggests that TREX1 serves different roles in flies compared to humans. Thus, in the case of this gene, it may not be a useful model to use to identify targets or use it as a drug screening tool.

      We will examine the expression of cg3165 and the human TREX1 transgenes in the whole organism to demonstrate their tissue expression profiles, as noted above. We will also expand the relevant manuscript sections to address the systemic manifestations of RVCL.

      The significant effects on lifespan and climbing that did show differential effects required ubiquitous overexpression using an actin-gal4 driver that does not allow the identification of tissue-specific effects.

      We plan to carry out additional experiments to determine the tissue expression profiles of cg3165 and human TREX1.

      Thus, their assertion that "the results suggested a strong positive correlation between Drosophila neuromotor regulation and transgenic hTREX1 presence and a negative impact from hTREX1 V235G" is not supported by these data.

      Thanks for pointing this out. We will revise our conclusions appropriately after we include the results from additional new experiments.

      Also worrisome was the inability to identify the mutant TREX1 protein by Western blot despite the enhanced expression levels suggested by qPCR analysis. Mutant TREX1 cannot exert a dominant effect on cell function if it isn't present.

      We will try to resolve this issue by technical means.

      There are also some technical problems. The lifespan assays lack important controls, and the climbing assays do not appear to have been performed correctly.

      We disagree with this statement. We will rewrite the method description for better clarity.

      It is unclear what the WT genetic background is in Figure 1-3, so it is unclear if the appropriate controls have been used. Finally, the lack of information on the specific statistical analyses used for each graph makes it difficult to judge the significance of the data.

      We will provide clearer descriptions of our controls and procedures.

      Overall, the current findings establish the Retinal vasculopathy disease model platform, but with only incremental new data and without any mechanistic insights.

      We will include additional experiments addressing the mechanism (see previous responses above).

      Reviewing Editor Comments:

      I (Hugo Bellen) also read your paper and noted that you do not document the expression pattern in the nervous system and other tissues, such as the heart. The stock https://flypush.research.bcm.edu/pscreen/crimic/info.php?CRname=CR70804 may help you do this and should allow you to compare the GAL4-induced expression of the stock you created with that of this stock. If compatible, you should consider reporting expression patterns.

      Thank you for the suggestion. We will obtain the line and will use it for expression visualization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      We were cautious in our gating strategy because the TCR𝛿+ CD3ε+ subset is rather small, so a low signal-to-background ratio can be an issue if the gates used are too broad. There is some inevitable low-level background staining with TCR𝛿 that sits just above the bulk of the negative population and is CD3ε-negative. Although this background represents a tiny fraction of total cells, we were wary of gate contamination into our TCR𝛿+ CD3ε<sup>+</sup> subset, and we wanted a gating strategy that could also be applied across other organs. We do not, however, believe this conservative strategy affects our measurements of progenitor numbers across strains or our conclusions, since the size of this progenitor population in the various IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains was never affected by the mutations. To reassure the reviewer, we show our conservative gate alongside a very broad TCR𝛿 gate, which shows that we are not missing a substantial population of CD25+ cells just below our gate. This comparison also illustrates how close the background from the CD27<sup>int</sup>-expressing αβ thymocytes (right column) comes to the TCR𝛿+ CD3+ gate, and hence the importance of tight lineage gating.

      Author response image 1.

      (2) The overall phenotype of the IKKDeltaTCd2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells.

      Given that gamma delta T cell development has been demonstrated to be influenced by alpha beta T cells (i.e., trans-conditioning), this information could have aided in the interpretation of the data.

      Apologies for not being clearer on this point. We have studied conventional αβ T cell development in these strains in considerable detail; these studies are published and are discussed in the introduction (paragraph 3, pages 3-4) and in the cited references (Schmidt-Supprian et al 2004, Silva et al 2014, Xing et al 2016, Webb et al 2019, Carty et al 2023). These studies detail how IKK expression is critical for thymic development of αβ T cells and their peripheral survival, and dissect the roles of NF-κB activation and cell death regulation by IKK. However, we now add new discussion (pages 11-12) that considers the potential impact of altered αβ T cell development in the strains used for this study.

      We agree that trans-conditioning is also an important consideration, since CD4 TH17 T cells can enhance type 17 𝛾𝛿 T cell development (10.1038/icb.2011.50). This is relevant to the limited conclusions we draw concerning type 17 𝛾𝛿 T cells. The REL- and IKK-deficient strains do lack effector populations, including type 17 αβ T cells, so it is possible that the absence of type 17 αβ T cells in these strains contributes to the modest impact of IKK deletion on the type 17 𝛾𝛿 subset. We now highlight and discuss this in the manuscript (pages 11-12).

      Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      We now provide both the absolute frequencies of the different 𝛾𝛿 subsets and their frequencies relative to one another, as Supplementary Figure 2. We still believe that absolute numbers are the gold standard: the differential impact of gene deletions on the αβ T cell compartment in different strains affects whether or not αβ T cells are present, so the overall representation of 𝛾𝛿 T cells can vary considerably between strains. Hence, absolute numbers are a more reliable measure of cell abundance.

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      We did, of course, process and calculate numbers of the different subsets in both lymph nodes and spleen. Where we saw loss, or rescue, of peripheral 𝛾𝛿 subsets, this was reflected in separate analyses of both organs, and we did not see any organ-specific effects in the mouse strains analysed. We therefore initially took the view that presenting aggregate data was the most efficient and least repetitive representation. However, we recognise the reviewer's concern and interest in seeing these data, and so we have now included representative plots from both organs for Figure 1D and show cell numbers for lymph nodes and spleen separately, as well as combined, for Figures 1, 2, 4 and 7; these plots reflect the differences observed in the combined data. We did not break down the data for all figures (e.g., Figures 3 and 5), as this was more cumbersome for complex multi-strain comparisons, and so we attempted to balance clarity and transparency against unnecessarily repetitive data presentation.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      We shared this concern, but we do not think it is an issue where subsets disappear and are almost completely absent, as in the IKK1/2 KO and Casp8 KO settings. Where we saw rescue with RIPK1<sup>D138N</sup> in Casp8ΔT<sup>CD2</sup> strains, we were keen to demonstrate that the restored populations exhibited their expected function, and so we confirmed this in Figure 5C by intracellular cytokine staining after a short 4 h restimulation in vitro. This also served to validate our gating strategy, since the cells we designated as Type 1 (CD27+CD122+CD44<sup>int</sup>) were the only source of IFN-gamma, while CD27–CD44<sup>hi</sup> CD122<sup>lo</sup> cells were the only source of IL-17. Adaptive/naive cells made neither cytokine. So, while we did not include nuclear stains, we were satisfied that the cytokine assays validated the gating strategy.

      (5) The analysis and conclusion of the data in Figure 3A are not convincing. Because the data are graphed on a log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is a ~99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine whether another analysis, such as ANOVA, would be more appropriate.

      Yes, we completely agree with this assessment and conclusion. While kinase dead RIPK1 does provide some rescue, this appears relatively modest and instead supports the view, validated in Figure 7, that the dominant function of IKK in 𝛾𝛿 T cells may be to activate NF-κB-dependent survival signals. Nevertheless, RIPK1<sup>D138N</sup> does provide a significant, if partial, rescue, which allows some peripheral cells to repopulate and demonstrates that IKK represses RIPK1-mediated cell death. It is not trivial to assess the relative importance of the IKK-RIPK1 and IKK-NF-κB functions. In IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice, we prevent RIPK1-induced death, but the cells still lack the NF-κB-dependent survival signal. Consistent with this, the ~1 log reduction in 𝛾𝛿 numbers between WT and IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice is similar to what we observe in the absence of REL subunits (Fig. 7), which is a smaller reduction than we observe in IKKΔT<sup>CD2</sup> mice. Ideally, we would have a scenario in which IKK regulation of RIPK1 was defective but NF-κB survival signalling was intact. This would reveal the full impact of losing IKK-dependent regulation of RIPK1 alone, which we suspect would result in substantial cell death that could not be blocked by NF-κB. Unfortunately, we do not have, or know of, suitable mouse mutants to test this. This is quite a nuanced discussion, and we now clarify the scope and extent of the conclusions we can draw (p. 7, 11).
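      The relationship between log-scale differences and percentage decreases discussed in this exchange can be made concrete with a short calculation. The sketch below is illustrative only: the cell counts are hypothetical round numbers chosen to match the reviewer's approximate ~99% and ~90% figures, not data from the manuscript, and the function names are our own.

```python
import math


def percent_decrease(control: float, mutant: float) -> float:
    """Percentage decrease of a mutant cell count relative to control."""
    return 100.0 * (1.0 - mutant / control)


def log10_reduction(control: float, mutant: float) -> float:
    """Reduction expressed in log10 units (1 log = tenfold fewer cells)."""
    return math.log10(control / mutant)


# Hypothetical counts only: a control of 1e5 cells versus two mutant strains.
control = 1e5
for label, mutant in [("Cre+ WT (hypothetical)", 1e3), ("Cre+ KD (hypothetical)", 1e4)]:
    print(f"{label}: {percent_decrease(control, mutant):.0f}% decrease, "
          f"{log10_reduction(control, mutant):.1f} log reduction")
```

      With these numbers, a ~99% decrease is a 2-log reduction and a ~90% decrease is a 1-log reduction. On a log axis the two bars differ by one decade, so the second strain retains ten times as many cells as the first while still lying ~90% below control. This is why a rescue can look substantial on a log scale yet remain partial in absolute terms.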

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      Thank you; we now include a discussion of these papers (p. 12).

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

      Yes, we agree that these look quite different and could be confusing. The lymph nodes from IKKΔT<sup>CD2</sup> mice lack αβ T cells and B cells, so their cellularity is much lower than normal. Consequently, the percentage representation of the remaining cells can be noisier, while total cellularity calculations are more consistent. This is not an issue in the other strains, which all have more cells in their lymph nodes. We now show plots from the spleens of the same mice, which appear better aligned with the additional splenic data shown in Figure 1.

      Reviewer #2 (Public review):

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      We address this important point in the discussion (pages 11-12). The effects of gene deletions on αβ and 𝛾𝛿 T cells operate independently of one another (as also discussed in response to Reviewer 1). For instance, the phenotype of αβ T cells is identical in IKKΔT<sup>CD2</sup> and IKKΔT<sup>CD4</sup> mice, whereas 𝛾𝛿 T cells are only targeted in IKKΔT<sup>CD2</sup> mice. Similarly, the phenotype of 𝛾𝛿 T cells is similar in IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> strains, although αβ T cells are absent from IKKΔT<sup>CD2</sup> mice but present in near-normal numbers in Casp8.IKKΔT<sup>CD2</sup> mice. Others have also noted that 𝛾𝛿 T cell development is normal in Rag-deficient mice (10.1126/science.1604321). In any case, an absence of αβ T cells would be expected to promote 𝛾𝛿 T cell survival, owing to the lack of competition for commonly used cytokines such as IL-7 and IL-15, though we see little evidence for this when comparing strains with and without αβ T cells, such as IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup>. We now also discuss the potential contribution of trans-conditioning to type 17 𝛾𝛿 T cell development (p. 12).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      In IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> strains, we infer a defect in survival, since they lack peripheral 𝛾𝛿 T cells despite normal thymic development. The absence of these cells made it hard to assess proliferation and migration, though 𝛾𝛿 T cells were absent from all lymphoid organs. The conclusion that defective survival is responsible for the absence of 𝛾𝛿 T cells in the different strains is also supported by the rescue of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains by kinase dead RIPK1<sup>D138N</sup>. Furthermore, the presence of small residual populations in the lymph nodes and spleen of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains demonstrates that migration patterns were normal. Were the cells unable to recirculate, they would be expected to fail to leave the thymus or to accumulate in the spleen. We saw no evidence of either scenario.

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      Unfortunately, we did not assess chain usage, choosing instead to rely on the phenotypic identity of specific subsets, which, as we show in Figure 5C, was extremely robust. IL-17 was only secreted by CD27– CD44<sup>hi</sup> 𝛾𝛿 T cells, while IFN-gamma was only secreted by CD27+ CD44<sup>hi</sup> 𝛾𝛿 T cells. We argue that the production of these key effector cytokines is the most direct test of a subset's functional identity and that the phenotypic designation is robust.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      Yes, we agree that the transcriptional mechanisms by which NF-κB signalling promotes cell survival remain an open question, one best addressed in future studies. We did not perform cytokine staining more widely because the cytokine assay relies on short-term restimulation of T cells with PMA and ionomycin. PMA activates PKC, which in turn activates NF-κB signalling to elicit the cytokine response measured in this assay, so the results of such assays would be hard to interpret. We agree it would be interesting to investigate the functional consequences of REL deficiency in future studies, although this may require a more nuanced setting in which 𝛾𝛿 T cells are not lost as a result of defective survival.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      We did attempt these experiments with collaborators by analysing type 17 𝛾𝛿 T cell development in fetal thymic organ culture (FTOC). However, the GM mice are not so easy to breed and generating the large numbers of embryos required to set up the FTOCs proved too challenging and we were unable to generate these data.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      With sufficient time and resources, it would have been ideal to confirm necroptotic cell death using alternative knockouts, such as RIPK3 or MLKL. However, formation of the necrosome depends on kinase-active RIPK1, since autophosphorylation of RIPK1 changes its conformation to allow recruitment of RIPK3 and MLKL and formation of the necrosome. Therefore, the rescue of CASPASE8-deficient T cells from cell death by kinase dead RIPK1 is very solid genetic evidence of necroptosis.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

      This was assessed in the p105/RelA knockout strain, which expresses only cREL. What we lacked was an assessment of what RelA/p50 dimers can support in the absence of cREL. We do, however, show the impact of RelA single deficiency and of RelA/p50 deficiency.

      In truth, we had many REL-deficient strains, and it was challenging to generate all the combinations we wanted. However, we try to compensate for this by discussing what cREL:cREL dimers and cREL:p50 dimers are capable of, based on our analysis of 𝛾𝛿 T cell development in p105/RELA DKO and RELA KO mice; these show that cREL:p50 can compensate in the absence of RELA, but cREL:cREL cannot.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper would benefit greatly from a graphical abstract that could summarize the key findings, making the key findings accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from peripheral maintenance, taking into account the various subsets and signaling pathways required. In addition, the authors should consider adding further literature comparing the requirements for NF-κB /necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells compared to other subsets, with respect to homeostatic cues and transcriptional states.

      Thank you - we have added such discussions. We are happy to add a graphical abstract if journal constraints permit this.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, “Adative").

      Thank you!

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The central pair apparatus of motile cilia consists of two singlet microtubules, termed C1 and C2, each of which is associated with a set of projections, referred to as the C1 and C2 projections. Each projection comprises multiple distinct structural domains, designated a, b, c, and so on. Biochemical studies combined with genetic analyses in Chlamydomonas identified three proteins as the major components of the C2a projection, and subsequent cryo-EM studies confirmed these findings.

      In this paper, the authors aim to study the homologues of these three proteins (CCDC108/CFAP65, CFAP70, and MYCBPAP/CFAP147) using knockout mouse models. Biochemical and cell biological analyses demonstrate that, as in Chlamydomonas, these proteins are components of the C2 projection and form a complex in which each protein depends on the presence of the others. In addition, the authors use affinity purification to identify two previously uncharacterized proteins and show that they are central pair apparatus proteins that associate with the aforementioned complex. Knockout mice lacking any of the three core proteins exhibit phenotypes consistent with primary ciliary dyskinesia (PCD).

      Overall, the manuscript is clearly written, and the data are convincing and support the authors' conclusions. However, given the previous findings in Chlamydomonas, this work provides limited conceptual advances to the field. Nonetheless, it represents a useful and well-documented resource for understanding the conserved organization of the central pair apparatus in motile cilia. It will be of interest to cell and developmental biologists, biochemists, and clinicians studying and treating human ciliopathies.

      We thank the reviewer for their positive comments on our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the protein composition and functional role of the C2a projection of the central apparatus (CA) in vertebrate motile cilia. Using three knockout mouse models (Ccdc108, Mycbpap, and Cfap70), the authors demonstrate that these genes - homologs of Chlamydomonas FAP65, FAP147, and FAP70 - are required for normal motile cilia function in ependymal and tracheal multiciliated cells. Specifically, the authors show that:

      (1) Knockout mice for each gene exhibit primary ciliary dyskinesia phenotypes (hydrocephalus and sinusitis), accompanied by abnormal ciliary motion and reduced ciliary beat frequency. 

      (2) CCDC108, MYCBPAP, and CFAP70 physically interact and localize to the axonemal central lumen, consistent with the C2a projection. 

      (3) Loss of any one of these proteins destabilizes the others and disrupts CA integrity in a tissue-specific manner. 

      (4) ARMC3 and MYCBP are C2a-associated proteins. 

      Strengths:

      (1) Clarity: the results are presented in a coherent sequence that facilitates understanding of both the rationale and conclusions. 

      (2) Genetic rigor: three independent knockout mouse lines that exhibit consistent motile cilia phenotypes provide in vivo support for the proposed role of these proteins. 

      (3) Integration of structural and functional analyses: combination of ultrastructural (TEM) and immunofluorescence data with CBF measurements provides convincing correlation between structural defects and impaired ciliary function. 

      (4) Mutual dependency model: reciprocal destabilization of CCDC108, MYCBPAP, and CFAP70 supports their interdependence in the C2a assembly. 

      (5) Expansion of the vertebrate C2a proteome: the identification of ARMC3 and MYCBP as C2a-associated proteins provides a foundation for future mechanistic studies. 

      We appreciate our reviewer's positive comments.

      Weaknesses:

      (1) Mechanistic depth: the data show a convincing correlation between C2a and ciliary function, but the cell type-specificity of CCDC108, MYCBPAP, and CFAP70 knockout effects is underdeveloped. This is an interesting observation that raises mechanistic/structural questions not addressed in the study, such as what is the role of C2a in CP nucleation, maintenance, or mechanical stabilization? Is C2a composition different in different cell types? 

      We agree with our reviewer and value their insightful comments. Indeed, CP-MT defects, including the loss of one or both CP-MTs, were only observed in a subset of mouse ependymal cells (mEPCs) at day 10 post-serum starvation, and were rare in tracheal multiciliated cells, although the C2a projections were severely damaged in these tracheal cells. Based on these observations, we hypothesize that the loss of CP-MTs is probably a secondary effect caused by mechanical stress during ciliary movement. To investigate the role of C2a in CP-MT nucleation, maintenance, or mechanical stabilization, we plan to examine the axoneme structures of mEPCs at day 5 post-serum starvation using TEM. By comparing axoneme defects in these cells at days 5 and 10, we hope to gain insights into this question. Based on our findings and previous findings in Chlamydomonas, we speculate that the core components (CCDC108/FAP65, MYCBPAP/FAP147, and CFAP70/FAP70) of the C2a projection are highly conserved across species, but the peripheral C2a-associated proteins may vary among different cell types. Therefore, we will perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (2) Cell model choice: co-immunoprecipitation was performed using mouse testis lysates. While this is a reasonable source of CA proteins from flagellated cells, the functional analyses in this study focus on ependymal and tracheal multiciliated cells. It would therefore be helpful for the authors to clarify the extent to which these interactions are expected to be conserved across ciliated cell types, and to discuss potential tissue-specific differences in CA assembly.

      We appreciate our reviewer's insightful comments. We will follow their suggestion and perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (3) Statistical analysis: the manuscript states "Statistical significance was defined as P < 0.5", which is likely a typo, but should be P < 0.05. In general, the statistical methods require more clarification. In several figures (e.g., 2B, 2D, 5J, 5K), multiple knockout genotypes are compared with WT, yet unpaired t-tests are reported. When more than two groups are analyzed, multiple pairwise t-tests inflate Type I error unless appropriately corrected; a one-way ANOVA with post hoc comparisons (e.g., Dunnett's test for WT-referenced comparisons) would be more appropriate. Furthermore, the analysis of ciliary movement modes (Figure 2D) involves categorical data, for which a t-test is not statistically appropriate. These comparisons could instead be evaluated using chi-square or Fisher's exact tests. Addressing these issues is important to ensure accurate statistical inference.

      We thank our reviewer for pointing out these errors. We will double-check our statistical results and perform new analyses following their suggestion.
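      To illustrate the reviewer's statistical points, the sketch below (Python standard library only) shows (i) how the family-wise error rate grows when k uncorrected comparisons are each run at α = 0.05, assuming the tests are independent, and (ii) a two-sided Fisher's exact test on a 2×2 table of categorical outcomes, for example cilia scored as abnormal versus normal beating in one knockout versus WT. The counts are hypothetical, not data from the manuscript, and collapsing several movement modes into two categories is a simplifying assumption; with more categories a chi-square test would be used instead. In practice, SciPy provides scipy.stats.fisher_exact and scipy.stats.chi2_contingency, and, from version 1.11, scipy.stats.dunnett for WT-referenced post hoc comparisons following a one-way ANOVA (scipy.stats.f_oneway).

```python
from math import comb


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests,
    each run at level alpha with no multiple-comparison correction."""
    return 1.0 - (1.0 - alpha) ** k


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of every table no more likely than the observed one.
    Integer weights keep the 'no more likely' comparison exact.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised probability of a table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Three knockout genotypes each compared with WT, no correction:
print(f"FWER for 3 tests: {familywise_error(0.05, 3):.3f}")
# Hypothetical counts: rows = KO, WT; columns = abnormal, normal beating.
print(f"Fisher p = {fisher_exact_two_sided([[8, 2], [1, 5]]):.4f}")
```

      With three uncorrected comparisons, the chance of at least one spurious "significant" result rises from 5% to roughly 14%, which is the inflation a Dunnett-type correction is designed to control. Fisher's test conditions on the observed totals, so it remains valid for the small counts typical of per-cilium categorical scoring.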

      (4) Methods section: does not sufficiently describe how image-based quantifications were performed. For example, the criteria used to define cilia number, basal body number, and rotational beating are not specified, nor is how CBF measurements were analyzed. The authors should also provide details regarding analysis software and imaging parameters used (and whether they were kept constant across genotypes). 

      We apologize for overlooking these method details. We will expand the relevant method section to include this information.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive evaluation of our work and for recognizing its potential interest to researchers working on cardiac development and regeneration. We plan to address the specific concerns noted by the reviewers as follows:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question in cardiac biology: whether distinct cardiomyocyte (CM) subpopulations play specialized roles during heart development and regeneration. Using single-cell RNA sequencing and newly generated genetic tools, the authors identify phlda2 as a specific marker of primordial cardiomyocytes in the adult zebrafish heart. They further show that these primordial CMs are essential for myocardial morphogenesis and coronary vascularization but are dispensable for myocardial regeneration or revascularization after injury. These findings indicate that heart regeneration does not simply recapitulate developmental processes.

      Strengths:

      A major strength of the study is the generation of a phlda2 BAC reporter, which provides a specific and reliable marker for primordial cardiomyocytes. The lack of genetic tools has previously limited functional analysis of this CM population. By using phlda2 regulatory elements to generate reporter and NTR-based ablation lines, the authors can visualize and selectively manipulate primordial CMs in vivo. This enables a direct functional interrogation rather than relying on lineage tracing or correlative evidence. Through genetic ablation, the authors convincingly demonstrate that primordial CMs are essential for myocardial morphogenesis and coronary vascular organization during development but are not necessary for heart regeneration.

      Weaknesses:

      (1) The manuscript would benefit from clarifying whether primordial cardiomyocyte ablation affects epicardial cell behaviors during heart development. Given the well-established role of the epicardium in supporting coronary vessel growth, it is possible that the vascular phenotypes observed after primordial CM ablation are caused, at least in part, by altered epicardial cells.

      We thank the reviewer for this thoughtful comment and agree that primordial cardiomyocyte ablation may indirectly affect coronary vessel growth through changes in epicardial cell behavior. Therefore, we will perform additional analyses to examine epicardial cell behaviors, including epicardial coverage and migration following primordial cardiomyocyte ablation using the established epicardial reporter line tcf21:nucEGFP during heart development.

      (2) Because primordial cardiomyocytes form a dense, single-cell-thick layer covering the ventricular surface, it would be informative to determine whether their loss alters the spatial distribution or inward migration of coronary endothelial cells or epicardial cells.

      We thank the reviewer for this important comment. We will analyze the spatial distribution and inward migration of coronary endothelial and epicardial cells after primordial cardiomyocyte ablation using high-resolution imaging and quantitative analysis.

      (3) The manuscript carefully examines the relationship between primordial CMs and gata4⁺ cardiomyocytes during regeneration. However, their relationship during heart development should be more fully addressed.

      We appreciate the suggestion and will carefully investigate the relationship between primordial cardiomyocytes and gata4<sup>+</sup> cardiomyocytes during heart development.

      (4) As loss of cardiomyocytes is known to induce gata4:GFP activation during regeneration, it would be important to determine whether ablation of primordial cardiomyocytes alone triggers gata4:GFP expression in neighboring cardiomyocytes. This analysis would further support the conclusion that primordial cardiomyocytes are not required for regenerative responses.

      We acknowledge the reviewer’s comments and will test whether primordial cardiomyocyte ablation induces gata4:GFP activation in neighboring cardiomyocytes in the adult heart.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Primordial Cardiomyocytes orchestrate myocardial morphogenesis and vascularization but are dispensable for regeneration", Sun et al. identify a novel marker of primordial cardiomyocytes and use it to visualize and ablate the population during development and regeneration. The role of the primordial layer has not been investigated because the tools to manipulate this population have not existed. The manuscript is straightforward, easy to understand, and addresses an important question that has not been explored.

      While the manuscript provides important insights into the role of primordial CMs, backed by a convincing methodology, the authors should clarify the requirement for this population in heart development and maturation. Specifically, is the primordial layer required for the fish to survive?

      We thank the reviewer for this important question. We will examine the survival of fish following primordial cardiomyocyte ablation during development.

      Do primordial CMs regenerate when ablated during development, and do the defects detected at 10 days post-treatment (in trabecular and compact CMs and coronary vessels) resolve at later stages?

      We thank the reviewer for this valuable comment. We will perform additional analyses to determine whether primordial cardiomyocytes regenerate after ablation during development and to assess the extent and dynamics of their recovery. We will also evaluate whether the defects in trabecular and compact myocardium and coronary vasculature persist or resolve in adult hearts following primordial cardiomyocyte ablation during development.

      Reviewer #3 (Public review):

      Summary:

      The authors performed single-cell RNA sequencing of adult zebrafish hearts and identified markers for distinct cardiomyocyte subpopulations. One marker, phlda2, marks primordial cardiomyocytes. They generated transgenic reporter lines to characterize phlda2 expression patterns and a phlda2-NTR ablation line to determine the functional requirement of primordial cardiomyocytes during heart regeneration. They found that phlda2+ primordial cardiomyocytes are essential for myocardial morphogenesis and coronary vessel development. Interestingly, when phlda2+ primordial cardiomyocytes are ablated during heart regeneration, gata4+ cortical cardiomyocytes, coronary vessel revascularization, and scar tissue formation are not affected.

      Strengths:

      The authors identified a new primordial cardiomyocyte marker, phlda2. They further demonstrated that primordial cardiomyocytes are important for heart morphogenesis but dispensable for heart regeneration. Their findings reveal a potential difference between heart development and regeneration programs.

      Weakness:

      Despite the interesting findings, the authors did not provide supplemental data for their scRNAseq to demonstrate the data quality and support their conclusions, and some results are not well described.

      We appreciate the reviewer’s comment. We will include supplemental data to demonstrate the quality of our single-cell RNA sequencing. Additionally, we will provide more detailed descriptions of the key results in the main text and figure legends to clearly support our conclusions regarding primordial cardiomyocytes and their roles in heart morphogenesis and regeneration.

    1. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths, potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows a sub-library to be sequenced to assess quality prior to final barcoding and sequencing, which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

      We thank reviewer #1 for his comments and critique. We will include a direct comparison of easySHARE-seq with the 10x Multiome platform by adding this comparison to Fig. 1 E&F, and we will point more directly to Table 1 as a comparison of overall assay possibilities. We will also more explicitly state and describe the possibilities and limitations of scaling up this assay. We also thank the reviewer for raising the possible issue of ambient RNA contamination. We aim to quantify ambient RNA contamination, explore its impact, and correct for it if needed. Unfortunately, external circumstances make it difficult to perform further wet-lab experiments to optimize ATAC-seq performance. We will thus update our discussion to include possible ways to improve ATAC-seq data quality.

      Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

      We thank reviewer #2 for his comments and great suggestions for further analyses. We will emphasize ATAC-seq data quality issues further in our discussions and more explicitly discuss the resulting implications and shortcomings. We agree with reviewer #2 that this dataset allows exploration of enhancer logic. We aim to incorporate the suggested analyses regarding RNA-ATAC correlations, expand our exploration of enhancer biology and include these results in our revisions. We will also improve clarity of our zonation analysis procedure.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the ecological interactions between wild plants and specialized herbivorous insects, structural innovation-based diversification of secondary metabolites often occurs. In this study, Agrawal et al. utilized two milkweed species (Asclepias curassavica and Asclepias incarnata) and the specialist Monarch butterfly (Danaus plexippus) as a model system to investigate the effects of two N,S-cardenolides, formed through structural diversification and innovation in A. curassavica, on the growth, feeding, and chemical sequestration of D. plexippus, compared to other conventional cardenolides. Additionally, the study examined how cardenolide diversification resulting from the formation of N,S-cardenolides influences the growth and sequestration of D. plexippus. On this basis, the research elucidates the ecophysiological impact of toxin diversity in wild plants on the detoxification and transport mechanisms of highly adapted herbivores.

      Strengths:

      The study is characterized by the use of milkweed plants and the specialist Monarch butterfly, which represent a well-established model in chemical ecology research. On one hand, these two organisms have undergone extensive co-evolutionary interactions; on the other hand, the butterfly has developed a remarkable capacity for toxin sequestration. The authors, building upon their substantial prior research in this field and earlier observations of structural evolutionary innovation in cardenolides in A. curassavica, proposed two novel ecological hypotheses. While experimentally validating these hypotheses, they introduced the intriguing concept of a "non-additive diversity effect" of trace plant secondary metabolites on herbivores when mixed, in contrast to the traditional synergistic perspective.

      Weaknesses:

      The manuscript has two main weaknesses. First, as a study reliant on the control of compound concentrations, the authors did not provide sufficient or persuasive justification for their selection of the natural proportions (and concentrations) of cardenolides. The ratios of these compounds likely vary significantly across different environmental conditions, developmental stages, pre- and post-herbivory, and different plant tissues. The ecological relevance of the "natural proportions" emphasized by the authors remains questionable. Furthermore, the same compound may even exert different effects on herbivorous insects at different concentrations. The authors should address this issue in detail within the Introduction, Methods, or Discussion sections.

      Second, the study was conducted using leaf discs in an in vitro setting, which may not accurately reflect the responses of Monarch butterflies on living plants. This limitation undermines the foundation for the novel ecological theory proposed by the authors. If the observed phenomena could be validated using specifically engineered plant lines (such as those created through gene editing, knockdown, or overexpression of key enzymes involved in the synthesis of specific N,S-cardenolides), the findings would be substantially more compelling.

      Reviewer #2 (Public review):

      This study examined the effects of several cardenolides, including N,S-ring containing variants, on sequestration and performance metrics in monarch larvae. The authors confirm that some cardenolides, which are toxic to non-adapted herbivores, are sequestered by monarchs and enhance performance. Interestingly, N,S-ring-containing cardenolides did not have the same effects and were poorly sequestered, with minimal recovery in frass, suggesting an alternate detoxification or metabolic strategy. These N,S-containing compounds are also known to be less potent defences against non-adapted herbivores. The authors further report that mixtures of cardenolides reduce herbivore performance and sequestration compared to single compounds, highlighting the important role of phytochemical diversity in shaping plant-herbivore interactions.

      Overall, this study is clearly written, well-conducted and has the potential to make a valuable contribution to the field. However, I have one major concern regarding the interpretations of the mixture results. From what I understand of the methods, all tested mixtures contain all five compounds. As such, it is not possible to determine whether reduced performance and sequestration result from the complete mixture or from the presence of a single compound, such as voruscharin for performance and uscharin for sequestration. For instance, if all compounds except voruscharin (or uscharin) were combined, would the same pattern emerge? I suspect not, since the effects of the individual N,S-containing compounds alone are generally similar to those of the full mixture (Figure S3). By taking the average of all single compounds, the individual effects of the N,S-containing ones are being inflated by the non-N,S-containing ones (in the main text, Figure 4). In the mix, of course, they are not being 'diluted', as they are always present. This interpretation is further supported by the fact that in the equimolar mix, the relative proportion of voruscharin decreases (from 50% in the 'real mix'), and the target measurements of performance and sequestration tend to increase in the equimolar mix compared to the real mix.

      Despite this issue, the discussion of mixtures in the context of plant defence against both adapted and non-adapted herbivores is fascinating and convincing. The rationale that mixtures may serve as a chemical tool-kit that targets different sets of herbivores is compelling. The non-N,S cardenolides are effective against non-adapted herbivores and the N,S-containing cardenolides are effective against adapted herbivores. However, the current experiments focus exclusively on an adapted species. It would be especially interesting to test whether such mixtures reduce overall herbivory when both adapted and non-adapted species are present.

      It remains possible that mixtures, even in the absence of voruscharin or uscharin, genuinely reduce sequestration or performance; however, this would need to be tested directly to address the abovementioned concern.

      Thanks for these insightful reviews and your summary assessment. We certainly agree that ours was a laboratory study with a single specialized insect, and both mixtures types had all five compounds (controlling for total toxin concentration). Thus, our conclusion that combined effects of naturally occurring toxins (within the cardenolide class) have non-additive effects for the specialized sequestering monarch are constrained by our experimental conditions. In our assay we used two mixture types, equimolar and “natural” proportions. We acknowledge that the natural proportions will vary with plant age, damage history, etc. of the host plant, Asclepias curassavica. Our proportions were based on growing the plants a few different times under variable conditions. Although we did not conduct these experiments on non-adapted insects, we discuss a related experiment that was conducted with wild-type and genetically engineered Drosophila (Lopez-Goldar et al. 2024, PNAS). In sum, we appreciate the reviewers’ comments.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (i) More convincingly justify the choice and ecological relevance of the "natural" cardenolide ratios, (ii) Clarify the interpretation of mixture effects, and (iii) more explicitly discuss the limitations of leaf-disc assays and the absence of non-adapted herbivores in light of the broader coevolutionary claims.

      Thank you for these suggestions. We have added several sentences of text to the Discussion section to make these points.

      Reviewer #1 (Recommendations for the authors):

      (1) Statistical analysis is missing from Figure 3 and Figure S3, making it difficult to assess the significance of the data.

      Much of the data in Fig. 3 is meant for descriptive presentation, with the main statistical analysis (the contrast between N,S and non-N,S cardenolides) given in the main text of the Results. We have added treatment differences between the sequestration efficiencies to the figure as well.

      (2) To help readers intuitively understand how certain results (such as ECD and sequestration efficiency) were calculated, the authors can provide the equations used for these computations.

      Thank you. These equations were given in the Methods, and we have now also added them to the Results at first mention.

      (3) For Figure 4, we suggest presenting the results of the equal mixture treatment and the realistic mixture treatment separately, rather than averaging the results from these two types of treatments.

      We understand and appreciate this comment – all of the treatment means are given in Fig. S3. For this particular figure we have opted to stick with the binary comparison (singles vs. mixed) to maximize replication for statistical tests (typically n = 25 vs. 10).

      Reviewer #2 (Recommendations for the authors):

      Given the interpretations and discussion generally, I feel the manuscript would benefit from either additional experiments (mixtures w/o N-S compounds), inclusion of non-adapted herbivore performance, or reframing of the explicit interpretations from your findings.

      We have added some caveats to the text but not added any additional experiments.

      Also, for all treatments/mixtures are concentrations above the IC50? Perhaps this could be calculated from the information presented, but it may be best to explicitly mention this.

      This is an interesting question. IC50s are estimated from in vitro assays (with the enzyme and toxins in microplate wells) and so are not translatable to foliar concentrations. As indicated in the text, we chose cardenolide levels based on foliar concentrations to match A. curassavica.

      Some minor points:

      (1) Although the intact N,S-ring-containing compounds are recovered in low amounts in frass (and not sequestered), is there evidence of N,S-ring components being otherwise traceable in the frass? For example, can excess S or N be detected in frass? This could provide insight into differential detoxification or reincorporation of these elements, potentially explaining variation between voruscharin and uscharin.

      Great question! We have not been able to detect breakdown products. In other experiments we have conducted mass spectrometric analysis of bodies and frass, but have not been able to find the features representing breakdown products. Nonetheless, as mentioned below, the main conversion products are evident and measurable, as in this study.

      (2) As a point of curiosity, is there evidence of interconversion between such compounds? For instance, if monarchs are fed only voruscharin, can other cardenolides be detected in their tissues?

      Yes, and we have tried to make this clearer in the text. Both uscharin and voruscharin are converted to calotropin and calactin.

    1. Author response:

      General Statements

      Our study provides important mechanistic insights into how the perinuclear actomyosin network PANEM facilitates the interaction of unfavorably positioned chromosomes, i.e. peripheral and polar chromosomes, with the mitotic spindle in early mitosis to ensure their correct segregation in subsequent anaphase. All reviewers agree that our study makes an important contribution to the field of mitosis and chromosome segregation. They make positive comments on our manuscript, for example, ‘The work highlights the PANEM as a key spatial and temporal element of chromosome congression’, ‘The work is an excellent addition to the field’, and ‘the concept of PANEM could be integrated into textbooks and models of chromosome congression’. All three reviewers also acknowledge the high quality of the data, rigorous and accurate analyses, and convincing quantification in our study. Reviewers 1 and 3 give several comments and suggestions for revision of our manuscript. Please find our point-by-point revision plan of the manuscript from page 3.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      We will follow this suggestion and simplify this figure. For example, we plan to remove the “Start” column because it is obvious and does not provide much new information.

      I recommend to reorganize figures as follows:

      Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move unnormalized SCV to supplementary materials. The same goes for 1F.

      We will follow this suggestion and reorganize Figure 1 accordingly.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we will study the effect of PANEM contraction in one or two additional cell lines that form PANEM during prophase. For example, we plan to inhibit the PANEM contraction and study the outcome, focusing on the generation of polar chromosomes, which is the major defect after the inhibition of PANEM contraction in U2OS cells.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      As suggested, we will investigate the outcome (e.g. generation of polar chromosomes) of reduced PANEM contraction in unsynchronized U2OS cells, and address whether the two subsets of cells, in which centrosome separation occurs before or after NEBD, show any difference in the outcome.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Minor Comments

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we will move the final paragraph of the Discussion to make a new final section in the Results. Moreover, as suggested, we will study the outcome of inhibiting PANEM contraction in cell lines other than U2OS, and add the results to the new final section in the Results.

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      Primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may interest also the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. As the reviewer recommends publication of our manuscript as it is, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major points

      (4) The work includes high-quality manual tracking of objects in early mitosis; if these data were made available to the field, they could help build AI models for tracking. The authors could consider depositing the tracking data to increase the impact of their work.

      As suggested, we will include the kinetochore tracking data as supplementary data in the revised manuscript.

      Minor points

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      The same question was raised in one of Reviewer #1’s major points. We will carry out new experiments to address this question directly in the revised manuscript. If we do not obtain interpretable results, we will discuss this issue further in the Discussion, as suggested.

      (3) The Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of centrosome-directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      This issue is closely related to point (2) above. As discussed above, we will first address this issue experimentally. If we do not obtain interpretable results, we will discuss it further in the Discussion.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement about the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statements about the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from different cell types.

      We performed experiments along these lines, using a dominant-negative LINC construct (LINC-DN), in our previous study (Booth et al eLife 2019). LINC-DN should remove or reduce PANEM more specifically than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); in this regard, the outcome was similar to that of azBB treatment in the current study. One would expect global actin inhibitors also to inhibit PANEM formation and to show effects similar to those of LINC-DN. By contrast, the references cited by the reviewer reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements. Such a difference may have arisen from different cell types (e.g. some cells form PANEM and others do not: Figure S7), a different extent of inhibition of PANEM formation, and/or the inhibition of cell rounding and cytokinesis (e.g. if cytokinesis is more sensitive to the inhibitors than PANEM formation is, effects on early chromosome movements due to PANEM inhibition may not be observed even though cytokinesis is affected). As suggested, we have discussed this topic in the Discussion (page 15, second paragraph).

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.

      As the reviewer implies, we cannot rule out that our failure to detect actin associated with the spindle or centrosomes reflects differences in methods or cell lines between the current study and the studies mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 13, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 13, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosins, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB observed in the current study are due to inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 13, second paragraph).

      C. Expansion of PANEM functional analysis

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      Counting chromosomes is difficult because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. signals outside the nucleus in prophase). We therefore quantified the chromosome volume in polar regions of azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      Since the publication of Kapoor et al. (Science 2006), it has been widely accepted in the field that chromosome congression precedes biorientation. This view has very recently been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by the reviewer. We have mentioned this new model and discussed the resulting new interpretation of our results in the Discussion (page 14; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we have discussed three possible mechanisms, based on previous literature, that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression (Page 16): 1) an enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently, promoting congression towards the spindle midplane, and 3) the balance between CENP-E, Dynein and chromokinesin activities may shift towards greater chromosome-arm ejection forces directed towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of the Results to the Introduction (Page 4) and cited Figure 1A and 1B in the Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers understand our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) were already cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret these as background signals in the cytoplasm that do not come from kinetochores, because 1) before NEBD, they were outside the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions, such as those towards a spindle pole (Phase 2) or towards the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references were already cited and discussed in this Results section of our original manuscript. However, in light of this suggestion, we have expanded our discussion of the polar chromosome movements reported by Koprivec et al. (page 11). The reviewer is also correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Discussion

      When discussing cortical actin, cite key reviews on its presence and function during mitosis:

      Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 15), and mentioned the role of cortical actin in spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring in cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts during early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression at this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      Primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. As the reviewer recommends publication of our manuscript as it is, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g. Phase 3-like). It is unclear whether all kinetochores go through phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would first like to clarify how we selected kinetochores for our analysis. We selected kinetochores that could be tracked individually. We did not choose kinetochores whose tracking was difficult because of kinetochore crowding (before the start of Phase 4 in control and azBB-treated cells, or before the extended Phase 3 was observed in azBB-treated cells). We also excluded from our analysis kinetochores that were close to spindle poles (within 4 µm) at NEBD, for two reasons. First, these kinetochores often did not show the clear and rapid movements towards a spindle pole that we used to define Phase 2. Second, although we used kinetochore co-localization with a microtubule signal as a reference for the start of Phase 2, this was difficult to assess for kinetochores close to spindle poles because of the high density of microtubules there. As requested, we have added this comment to the Methods section (page 23).

      With this selection, all selected kinetochores in control cells (without azBB treatment) showed poleward motion (Phase 2) followed by congression (Phase 4), although the extent of these motions varied among kinetochores. All selected kinetochores in azBB-treated cells also showed poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Phase 1 and Phase 3 were then defined as the intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we considered that Phase 3 continued until the end of tracking. We have added this comment to the Methods section (pages 23-24).

      (2) Would peripheral kinetochore close to poles behave differently compared to peripheral kinetochore close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?

      Because our analysis excluded kinetochores that were close to spindle poles at NEBD, for which Phase 2 was difficult to define (see our response to major point 1 above), the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENP-E-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, although both showed uncongressed polar chromosomes. This difference may be because azBB-treated cells had fewer uncongressed polar chromosomes. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), these modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Methods section (page 22).

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we now state the number of cells in which kinetochore motions were analyzed in the legends for Figures 3, 4, 5, S4 and S5.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.

      We checked the sensitivity of the cell lines in Figure S7B to blebbistatin (the parent compound of azBB) on DepMap. There was no obvious difference between PANEM+ and PANEM- cell lines, although blebbistatin sensitivity data were available for only 4 of the cell lines in Figure S7B (HCT116, MCF7, U2OS and HT29). In addition, because blebbistatin could kill cells by inhibiting cytokinesis, blebbistatin sensitivity may not necessarily reflect how essential PANEM contraction is for chromosome congression.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      Congression does not appear to be markedly delayed in such lines. For example, HeLa cells (which lack PANEM) take 10-20 min to complete chromosome congression after NEBD (Bancroft et al 2025: https://doi.org/10.1242/jcs.163659), which is comparable to the time (8-18 min) we observed for chromosome congression in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression; we have newly discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 16).

      (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In cases where the kinetochore came close to the midplane (< 2.5 µm) and could not be tracked further because of kinetochore crowding around the spindle mid-plane, the end of Phase 4 was assigned as the end of tracking. In the original manuscript, it was not clear that the end of Phase 4 was defined in the same way for non-polar and polar kinetochores, whereas the start of Phase 4 was defined differently for the two groups. We have now clarified these points in the Methods section (page 23).

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E, we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time point. The smallest p-value was 0.094, at 6.0 min. As suggested, we have explained this in the legend for Figure S2E.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

      Description of analyses that authors prefer not to carry out

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without this description in the main text, readers would find the subsequent analyses confusing. Figure 2 is also important because it shows examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules, which increases the speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new figure would consist mostly of graphs. Without example images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4 as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figures 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward, which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and whether PANEM contraction contributes more in cells of higher ploidy or with specific nuclear morphologies.

      This is an interesting suggestion, but such a study would be very time-consuming, and it goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but such a study would be very time-consuming, and it goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not in Phase 2 or Phase 4. Mild microtubule perturbation by itself could affect chromosome motions in all four phases. We therefore do not think it would be particularly informative to study what additional effects reduced PANEM contraction has when combined with mild microtubule perturbation.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Results (by subheading)

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      We respect this comment. However, if biorientation were established more rapidly for centrally located kinetochores, it would advance the initiation of congression, but would not necessarily change congression speed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the reviewers’ constructive comments and have followed their recommendations to improve our manuscript. These improvements include additional experiments, new analyses, and a rewriting of the text. We believe these changes significantly improved the paper and hope the editor and the reviewers agree. The following is a summary of the major changes made and our point-by-point response to reviewers’ comments.

      Summary of major changes:

      (1) Expanded labeling options: We generated a new nMAGIC vector containing miRFP680 as an infrared fluorescent protein (IFP) marker. We used gRNA-40D2(IFP) to demonstrate clones labeled by this marker in the wing imaginal disc (Figure 1M). This vector is available via Addgene for generating new gRNA-markers with our recommended or custom-designed gRNA target sequences.

      (2) Validated Gal80 potency: We provide new data in Figure 1E demonstrating complete suppression of pxn-Gal4>CD4-tdTom by tub-GAL80-DE-SV40. The exact transgenes used in the comparisons are clarified in the figure and figure legend.

      (3) Verified clone fitness: We compared the sizes of nMAGIC twin spots in wing discs and found no intrinsic growth or viability bias between marker/marker and WT/WT clones (Figure 1O).

      (4) Methodological schematics: We added supplementary figures to Figure 1 to illustrate the principle of MAGIC, the difference between pMAGIC and nMAGIC, and an example of a pMAGIC crossing scheme.

      (5) Inducible clone induction: We provide new data (Figure 3J-K’) showing the induction of sparse neuronal clones in the adult brain by heat-shock (hs)-Cas9.

      (6) Other revisions: We revised the text to incorporate all other recommendations from the reviewers and made other small changes to improve the manuscript's readability.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shen et al. have improved upon the mitotic clone analysis tool MAGIC that their lab previously developed. MAGIC uses CRISPR/Cas9-mediated double-strand breaks to induce mitotic recombination. The authors have replaced the sgRNA scaffold with a more effective scaffold to increase clone frequency. They also modified the positive and negative clonal markers to improve signal-to-noise and to mark the cytoplasm of the cells instead of the nuclei. These changes result in increased clone frequencies and marker brightness. The authors also generated MAGIC transgenic lines targeting all chromosome arms and tested their clone induction efficacy.

      Strengths:

      MAGIC is a mitotic clone generation tool that works without prior recombination onto special chromosomes (e.g., FRT). It can also generate mutant clones for genes for which existing FRT lines cannot be used (e.g., genes located between the FRT transgene and the centromere).

      This manuscript does a thorough job of describing the method and provides compelling data that support an improvement over the existing method.

      Weaknesses:

      It would be beneficial to have a greater variety of clonal markers for nMAGIC. Currently, the only marker is BFP, which may clash with other genetic tools (e.g., some FRET probes) depending on the application. It would be nice to have far-red clonal markers.

      We thank the reviewer for the positive comments about our study. We agree with the reviewer that adding a far-red option for nMAGIC increases the flexibility of this method. We replaced the BFP coding sequence in the nMAGIC cloning vector pAC-U63-QtgRNA2.1-tubBFP(HA) with that of miRFP680-T2A-HO1. We then used the resulting cloning vector to make a gRNA-40D2(IFP) transgene and tested it in the wing disc. Results showing clones in the wing disc are now in Figure 1M. The new cloning vector, along with the others reported in our study, is available from Addgene.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors present the latest improvement of their previously published methods, pMAGIC and nMAGIC, which can be used to engineer mosaic gene expression in wild-type animals and in a tissue-specific manner. They address the main limitation of MAGIC, the lack of gRNA-marker transgenes, which has hampered the broader adoption of MAGIC in the fly community. To do so, they create an entire toolkit of gRNA markers for every Drosophila chromosome and test them across a range of different tissues and in the context of making Drosophila species hybrid mosaic animals. The study provides a significant and broadly useful improvement compared to earlier versions, as it broadens the use-cases for transgenic manipulation with MAGIC to virtually any subfield of Drosophila cell biology.

      Strengths:

      Major improvements to MAGIC were made in terms of clone induction efficiency and usability across the Drosophila model system, including wild-type genotypes and the use in non-melanogaster species.

      Notably, mosaic mutants can now be created for genes residing on the 4th chromosome, which is exciting and possibly long-awaited by 4th chromosome gene enthusiasts.

      Selection of the standard set of gRNA markers was done thoughtfully, using non-repetitive conserved and unique sequences.

      The authors demonstrate that MAGIC can be used easily in the context of interspecific hybrids. I believe this is a great advancement for the Drosophila community, especially for evolutionary biologists, because it may allow easy access to mechanistic, tissue-specific insight into a range of hybrid incompatibilities, an important speciation process that is normally difficult to study at the level of molecular and cell biology.

      In the same way, because it is not limited to usage in any particular genetic background, genome-wide MAGIC can be potentially used in wild-type genotypes relatively easily. This is exciting, especially because natural genetic diversity is rarely investigated more mechanistically and at the scale/resolution of cells or specific tissues. Now, one can ask how a particular naturally occurring allele influences cell physiology compared to another (control) while keeping the global physiological context of the particular genetic background largely intact.

      Weaknesses:

      It is not entirely clear how functionally non-critical regions were evaluated, beyond their selection based on sequence conservation between species. It may be useful to directly test for differences in viability or other functionally relevant phenotypes among flies carrying different markers. Similarly, the frequency of off-target cutting could be investigated or documented in more detail, especially if one of the major use cases is naturally derived, diverse genetic backgrounds. At the moment, it is unclear how consistently clones are induced by each new gRNA-marker across different WT genetic backgrounds (for example, a set of DGRP genotypes), which would be highly useful information for future users.

      We thank the reviewer for the positive comments about our study. The reviewer raises an excellent point regarding the consistency of clone induction and potential background effects in diverse genetic backgrounds. As a standard step in building the MAGIC kit, we tested all gRNA-marker transgenes with the Cas9-LEThAL assay (Poe et al., Genetics, 2019), in which the gRNA-marker transgene was crossed to lig4 Act5C-Cas9 homozygotes. All crosses led to viable and apparently healthy female progeny, suggesting that ubiquitously mutating the chosen gRNA targeting sites does not cause obvious defects.

      For standard mutant analysis, we recommend that researchers use a well-characterized wild-type chromosome as a negative control. For studies using diverse wild-type backgrounds where a standard control chromosome is inapplicable (e.g., DGRP screens), we recommend an internal validation strategy: researchers should confirm their key phenotypic findings by inducing clones with a second, independent gRNA-marker located on the same chromosome arm (e.g., comparing clones induced by gRNA-40D2 vs. gRNA-40D4). This helps ensure that any observed phenotypes or variations in clone induction are linked to the selected genetic background rather than to an off-target artifact or a target-site-specific effect.

      We acknowledge that the above approach may not resolve concerns about off-targets. Performing deep sequencing to map empirical off-targets for all 34 gRNA pairs across multiple genetic backgrounds is experimentally prohibitive for a toolkit resource. However, our in silico selection pipeline strictly required target sequences to be unique within the D. melanogaster genome to minimize the probability of off-target cutting. In addition, our requirement that target sequences be conserved in closely related Drosophila species acts as a stringent filter against intraspecies variation: sequences conserved across species are subject to purifying selection, which substantially reduces the likelihood that SNPs within the DGRP lines will disrupt the PAM or seed sequences required for Cas9 cleavage.
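      The uniqueness and conservation filters described above can be illustrated with a toy sketch. This is not the authors' pipeline, whose implementation is not described here. The function names (`count_sites`, `select_targets`) are hypothetical, and the sketch assumes SpCas9 targets (a 20-nt protospacer followed by an NGG PAM), uppercase genome strings, and exact matching only. A real off-target screen would also score near-matches with mismatches and use an indexed genome rather than scanning strings.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase sequence; other symbols (e.g. N) are kept."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(genome: str, protospacer: str) -> int:
    """Count exact protospacer + NGG sites on both strands (overlaps allowed).

    Hypothetical helper: exact matches only, no mismatch tolerance.
    """
    # A zero-width lookahead lets re.findall report overlapping matches.
    pattern = re.compile("(?=" + re.escape(protospacer) + "[ACGT]GG)")
    return sum(len(pattern.findall(strand)) for strand in (genome, revcomp(genome)))


def select_targets(candidates, reference, relatives):
    """Keep candidates that occur exactly once in the reference genome and
    at least once (with an intact NGG PAM) in every related genome."""
    kept = []
    for protospacer in candidates:
        if count_sites(reference, protospacer) != 1:
            continue  # absent or repeated in the reference: reject
        if all(count_sites(genome, protospacer) >= 1 for genome in relatives):
            kept.append(protospacer)  # unique and conserved: keep
    return kept
```

      Exact matching keeps the example short: it captures the "unique in the reference, present in every relative" logic, but it would not catch off-target sites that differ by one or two bases.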

      Reviewer #3 (Public review):

      Summary:

      In the manuscript by Shen, Yeung, and colleagues, the authors generate an improved and expanded Mosaic analysis by gRNA-induced crossing-over (MAGIC) toolkit for making mosaic clones in Drosophila. This is a clever method in which mitotic clones are induced in dividing cells by using CRISPR/Cas9 to generate double-strand breaks at specific locations, which induce crossing over at those locations. It is conceptually similar to previous mosaic methods in flies that used FRT sites inserted near centromeres together with heat-shock-inducible FLPase. The advantage of the MAGIC system is that it can be used with chromosomes that lack FRT sites, such as those found in many deficiency collections or in EMS mutant lines. It may also be simpler to implement than FRT-based mosaic systems. There are two flavors of the MAGIC system: nMAGIC and pMAGIC. In nMAGIC, the main constituent is a transgene insertion that contains gRNAs targeting DNA near the centromere, along with a fluorescent marker. In pMAGIC, the main constituent is a transgene insertion that contains gRNAs targeting DNA near the centromere, along with ubiquitous expression of GAL80. As such, nMAGIC can be used to generate negatively labeled clones (clones that lack the marker), whereas pMAGIC (along with a GAL4 line and a UAS-marker) can be used much like MARCM to positively label a clone of cells. This manuscript introduces MAGIC transgenic reagents that allow all four chromosomes to be targeted. The authors demonstrate its use in a variety of tissues, including with mutants not compatible with current FLP/FRT methods, and also show that it works well in tissues that are challenging for FLP/FRT mosaic analyses (such as motor neurons). They further demonstrate that it can be used to generate mosaic clones in non-melanogaster hybrid tissues.

      Overall, this work represents a valuable improvement to the MAGIC method that should promote even more widespread adoption of this powerful genetic technique.

      Strengths:

      (1) Improves the design of the gRNA-marker by updating the gRNA backbone and the markers used. GAL80 now includes a DE region that reduces the perdurance of the protein and thus improves labeling of pMAGIC clones. The data presented to demonstrate these improvements are rigorous and of high quality.

      (2) Introduces a toolkit that now covers all chromosome arms in Drosophila. In addition, the efficiency of three different target sites is characterized for each chromosome arm (i.e., three different gRNA-marker combinations), revealing differences in efficiency. This could be useful for titrating how many clones an experimenter wants (e.g., lower-efficiency combinations might prove advantageous).

      (3) The manuscript is well written and easy to follow. The authors achieved their aims of creating and demonstrating MAGIC reagents suitable for mosaic analysis of any Drosophila chromosome arm.

      (4) The MAGIC method is a valuable addition to the Drosophila genetics toolkit, and the new reagents described in this manuscript should allow it to become more widely adopted.

      Weaknesses:

      (1) The MAGIC method might not be well known to most readers, and the manuscript could have benefited from schematics introducing the technique.

      We thank the reviewer for the positive evaluation of our study and for making this kind suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - Figure Supplement 1.

      (2) Traditional mosaic analyses using the FLP/FRT system have strongly utilized heat-shock FLPase for inducible temporal control over mitotic clones, as well as a way to titrate how many clones are induced (e.g., shorter heat shocks will induce fewer clones). This has proven highly valuable, especially for developmental studies. A heat-shock Cas9 is available, and it would have been beneficial to determine the efficiency of inducing MAGIC clones using this Cas9 source.

      We thank the reviewer for suggesting this experiment. We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method, with which they are probably more familiar. We used a heat-shock Cas9 developed by the Tzumin Lee group (Chen et al., Development, 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J-K’). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than heat shock during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The following are some consolidated review remarks after discussions amongst all three reviewers:

      The reviewers feel the evidence level could be raised from 'convincing' to 'compelling' if the following key (and partially shared) suggestions by the reviewers are followed adequately:

      (1) Expand labeling options for nMAGIC, which is currently just a BFP marker. This would increase the utility of the method. A far-red marker would be very helpful. Could the authors just do this for one chromosome arm and make the reagent available for others to generate other chromosome arms?

      We agree with the editor and reviewers that adding a far-red option for nMAGIC increases the flexibility of this method. We replaced the BFP coding sequence in the nMAGIC cloning vector pAC-U63-QtgRNA2.1-tubBFP(HA) with that of miRFP680-T2A-HO1. We then used the resulting cloning vector to make a gRNA-40D2(IFP) transgene and tested it in the wing disc. Results showing clones in the wing disc are now in Figure 1M. The new cloning vector, along with the others reported in our study, will be available from Addgene.

      (2) Verify that destabilized GAL80 is potent enough to suppress GAL4. Repeat Figure 1C-E with tub-GAL80-DE-SV40.

      We repeated the experiment using gRNA-42A4-tDES, which achieved complete suppression of pxn>CD4-tdTom (Figure 1E).

      (3) Concern about the health of the induced mitotic clones. This is an important consideration, but the reviewers were not sure what the necessary experiments would be. Perhaps gauging twin-spot clone sizes? Please address.

      We agree that clone fitness is an important consideration for MAGIC experiments. To test it, we generated WT clones in the wing imaginal disc using nMAGIC and quantified the sizes of the twin spots (BFP/BFP and WT/WT clones; Figure 1O). Our results show no statistically significant difference between these two types of clones. Thus, we found no evidence of an intrinsic growth disadvantage for either type of mitotic clone generated by MAGIC.
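      The response does not specify which statistical test was used. As one possible way to compare twin spots, the hypothetical function below treats each twin spot as a matched pair (a marker/marker clone and its WT/WT sister) and runs an exact sign-flip permutation test on the per-pair log2 size ratios. This is an illustrative sketch under that pairing assumption, not the analysis reported in Figure 1O.

```python
import itertools
import math


def paired_sign_flip_test(marker_sizes, wt_sizes):
    """Exact two-sided sign-flip test on per-pair log2(marker/WT) size ratios.

    Hypothetical helper. Index i is one twin spot: marker/marker clone i and
    its WT/WT sister. Under the null hypothesis of no growth bias, each ratio
    is equally likely to be positive or negative.
    Returns (mean_log2_ratio, p_value).
    """
    if len(marker_sizes) != len(wt_sizes) or not marker_sizes:
        raise ValueError("need equal-length, non-empty paired samples")
    if any(size <= 0 for size in itertools.chain(marker_sizes, wt_sizes)):
        raise ValueError("clone sizes must be positive")
    ratios = [math.log2(m / w) for m, w in zip(marker_sizes, wt_sizes)]
    n = len(ratios)
    observed = abs(sum(ratios)) / n
    extreme = 0
    # Enumerate all 2**n sign assignments (exact, but only feasible for small n).
    for signs in itertools.product((1, -1), repeat=n):
        flipped = abs(sum(s * r for s, r in zip(signs, ratios))) / n
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return sum(ratios) / n, extreme / 2 ** n
```

      Pairing sister clones controls for disc-to-disc and position-dependent growth differences. Exact enumeration visits all 2^n sign patterns, so it is practical only for modest numbers of twin spots (roughly n ≤ 20); for larger samples, random sign flips or a Wilcoxon signed-rank test would be the usual substitutes.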

      (4) Include a schematic of the MAGIC method as Figure 1 or add it to Figure 1. Many may not be familiar with the method, so to promote its adoption, the authors should clearly introduce the MAGIC method in this paper (and not rely on readers to go to previous publications). For this paper to become a MAGIC reference paper, it should be self-contained.

      We thank the reviewers for this suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - Figure Supplement 1.

      (5) Determine the utility of using a hs-Cas9 line for temporal induction of MAGIC clones. This is a traditional method for mitotic clone induction (with hsFLP/FRTs), and its use with the MAGIC system (especially pMAGIC) could also make it more attractive, especially to label small populations of neurons born at known times. To this point, the authors could generate pMAGIC clones using hs-Cas9 for commonly used adult target neurons, such as projection neurons, central complex neurons, or mushroom body neurons. The method to label small numbers of these adult neurons is well worked out with known GAL4 lines, and demonstrating that pMAGIC could have similar results would capture the attention of many not familiar with the pMAGIC method.

      We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method, with which they are probably more familiar. We used a heat-shock Cas9 developed by the Tzumin Lee group (Garcia-Marques, Espinosa-Medina et al. 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J-K’). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than heat shock during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      Reviewer #1 (Recommendations for the authors):

      This is a marked improvement over the existing methods that the authors' lab has previously generated. It will be a nice addition to the Drosophila genetic tool kit after minor revisions.

      We appreciate the reviewer’s recognition of the new tools we developed.

      Minor issues:

      (1) In the data in Figures 1G and H, it is not ideal to compare the effect of different modifications on two different transgenes. uH and uDEH are compared in gRNA-40D2, whereas uDEH, tDEH, and tDES are compared in gRNA-42A4. If the transgenics are already available, it would be better to compare the uH, uDEH, tDEH, and tDES on either gRNA-40D2 or gRNA-42A4.

      We appreciate the reviewer’s concern. These transgenes were developed during different phases of this project. We first adopted the uDEH design while improving gRNA-40D2, which solved both the leaky activity of pxn-Gal4 and the dim epidermal clones. However, when we tried to extend this design to 2R (such as 42A4), we found that the clones were still too dim (probably due to positional effects). We therefore used uDEH in gRNA-42A4 as the base for further improvements. We did not make a uH version for gRNA-42A4 because we already knew that it is inferior to uDEH. Because of this history, we do not have the full set for gRNA-42A4.

      Despite the lack of uH for gRNA-42A4, we believe our comparisons of different designs are still valid, given that uH and uDEH were compared with identical sequences elsewhere in the transgenic vector (including the gRNA target sequence) and in the identical insertion site.

      (2) It is not clear whether the authors tested that destabilized Gal80 is potent enough to suppress Gal4 (e.g., in suppressing pxn>CD4-tdTom in hemocytes). The results in Figure 1C-E should be repeated with tub-Gal80-DE-SV40.

      We apologize for omitting the transgene identities in these experiments. We have redone the experiment using gRNA-42A4-tDES and updated the figures to clearly indicate which transgenes were used.

      (3) The difference in sgRNA scaffolds can be better explained in the text. The explanation here is very bare bones and reads like jargon. (i.e., changing F+E gRNA scaffold with gRNA2.1 scaffold is not a sufficient explanation).

      We have added more explanation of the differences between the scaffolds, as suggested.

      (4) The stocks should be sent to Bloomington Stock Center to ensure widespread adoption of the method. This includes the Cas9 lines that are generated and used.

      It is our plan to freely share the reagents developed in this study with the community. Most of the fly lines are already available at Bloomington (https://bdsc.indiana.edu/stocks/misc/magic.html and https://bdsc.indiana.edu/stocks/genome_editing/crispr_cas9.html). We are in the process of depositing the remaining ones to BDSC.

      In conclusion, this is a nicely written manuscript that improves currently available tools and should be of interest to the readership of this journal.

      Reviewer #2 (Recommendations for the authors):

      Typos spotted:

      Line 163 issues -> tissues

      Line 613 significance -> significant

      We thank the reviewer for catching these typos. We have corrected them.

      Reviewer #3 (Recommendations for the authors):

      This is a welcome update to the MAGIC system, which is a brilliant method that has not been as widely adopted as it should be. The authors validate and introduce updates to this system to increase clonal efficiency and more robust labeling (for both pMAGIC and nMAGIC). The data presented are robust and convincing.

      We appreciate the reviewer’s positive comments about our study.

      Suggestions to improve the presentation and adoption of this work:

      (1) The MAGIC system might not be well known, and the manuscript would have benefited from an introductory schematic of how the system works. I realize this was already done in the PLoS Biology paper, but the authors should not assume readers will know that paper, or be willing to look it up. So a standalone schematic, as Figure 1, or something added to Figure 1, would greatly aid in understanding how this system works and what the new updates are doing.

      We thank the reviewer for this kind suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - figure supplement 1.

      (2) There were many instances where abbreviations were not clearly defined, especially in the Figures and Figure legends. The main text is well-written, and while the information is in there, it is beneficial when the Figures and Figure legends can stand alone. For example:

      (a) Figure 1. DE, not defined in the Figure or Figure legend.

      (b) Figure 1. 'p' and 'n' not defined in the Figure legend.

      (c) The different Cas9 or GAL4 lines used: a brief description of their expression patterns might be helpful in the legend, e.g., zk-Cas9, vas-Cas9, gcm-Cas9, R38F11-GAL4, RabX4Gal4.

      We apologize for omitting the details mentioned. They have been added to the figures and figure legends.

      (3) "Traditional" mosaic analyses took advantage of hsFLP for inducible clone induction and to control the number of mitotic clones induced. A hs-Cas9 line does exist (as correctly pointed out by the authors), and it would be a valuable addition if the authors tested the utility of this reagent with the MAGIC system. Many possible adopters may not like the idea of using an always-on Cas9 line, which could result in too many clones, especially if one wanted to label very few cells. Granted, one could use a 'worse' gRNA-marker line as mentioned in the manuscript, but this might still be harder to titrate than an inducible system that uses a heat-shock promoter. A hs promoter is especially useful for birthdating cells during development.

      We thank the reviewer for suggesting this experiment. We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method, with which they are probably more familiar. We used a heat-shock Cas9 developed by the Tzumin Lee group (Chen et al., Development, 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J-K’). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than heat shock during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      (4) Lines 61-63. "However, most of these mutant chromosomes cannot be analyzed by traditional mosaic techniques due to the lack of FRT sites or incompatibility with the FRT/Flp system." It might also be worth mentioning that recombining existing reagents (e.g., mutants) onto an FRT chromosome can be labor- and time-intensive. A brilliant advantage of MAGIC is that it can be used with any existing stock, such as those from classical EMS mutant screens, Df screens (as pointed out), etc. So the more the authors can emphasize a new way of thinking (e.g., you don't need to recombine your mutant of interest onto an FRT stock before you can get started), the better!

      We thank the reviewer for this kind suggestion. As suggested, we have expanded our introduction and discussion to emphasize the advantages of the MAGIC system over traditional mosaic techniques.

      (5) One incredible advantage of the MAGIC system is that it can direct where recombination occurs. So if one had two mutations on a chromosome arm, it could be possible to make the most distal homozygous mutant while the other remains heterozygous. This is not possible with current FRT-based methods. It's not necessary to demonstrate this, but perhaps the authors could mention it as a possible next step? This was somewhat implied by lines 66-67 "In comparison, MAGIC can potentially be used to study these genes because the crossover site in MAGIC can be flexibly defined by users".

      Again, we thank the reviewer for this nice suggestion. We have added this point to the discussion.

      (6) How stable are the MAGIC lines? If gRNA (with Cas9 expressed) induced a germline mutation of the target site, the MAGIC line would break down. How often is this observed? Some mention of this would be appreciated, especially to end users, if caution is necessary and gRNA-marker stocks should not be maintained in the same flies as an x-Cas9 line.

      The reviewer made a very important point. Keeping gRNA and Cas9 in the same strain risks mutating the target sequence in the germline if the Cas9 has any germline activity. Thus, we do not recommend keeping gRNA and Cas9 in the same flies over multiple generations. For MAGIC experiments, this concern is lessened: when gRNA + Cas9 flies are crossed to another strain carrying the chromosome of interest, clones can still be induced (possibly with lower efficiency) because the chromosome of interest is still cuttable by Cas9. Nevertheless, to address this concern, we have recently developed anti-CRISPR tools to suppress Cas9 activity in such strains. These tools will be reported in a separate study.

      In the revised manuscript, we have added this point to the Discussion to caution users.

      (7) Line 157, "identify efficient gRNAs for every chromosomal arm.". What is considered "efficient"? Is this quantifiable? E.g., ≥10 clones.

      Thanks for pointing this out! “Efficient” is a somewhat arbitrary criterion, as different experiments may require different efficiencies. Operationally, however, we consider a gRNA efficient if it generates ≥10 neuronal clones per larva. We have clarified this in the text.

      (8) Line 163, "highly packed _issues_ such as the brain"; spelling, should be "tissues"

      Thanks for catching this typo. It has been corrected.

      (9) The authors use ey-Cas9 for their demonstration of adult brain labeling. Additional adult brain examples would increase exposure of this method and attract wider attention, for example by targeting well-characterized structures such as projection neurons (GH146-GAL4), the central complex, mushroom bodies, etc. This would be especially compelling if hs-Cas9 could be used to mimic previous MARCM clones.

      We thank the reviewer for suggesting heat shock-induced clones in the adult brain. We have conducted the experiment as explained above, with results shown in Figure 3J-3K’. This experiment yielded a single neuronal clone that resembles lateral horn Leucokinin neurons.

      (10) Line 216, "Despite these advances, existing mutations on FRT-lacking 4th chromosomes still cannot be analyzed by the FRT/Flp system." For context, it might be worth pointing out that meiotic recombination is exceedingly rare on the 4th chromosome, which means it is practically impossible to recombine existing 4th chromosome mutations onto an FRT chromosome.

      We thank the reviewer for this kind suggestion. We have added a note about the difficulty of recombining FRT onto the 4th chromosome.

      (11) Figure 2 legend. What is the full genotype for D and E? eg, what is RabX4>MApHS?

      We apologize for omitting these details. RabX4-Gal4 is a pan-neuronal driver. UAS-MApHS is a membrane fluorescent marker (UAS-pHluorin-CD4-tdTom). The genotypes have been added to the figure legend.

      (12) It would be good to include the Bloomington Stock numbers for the MAGIC toolkit, especially in Table 1, and to include a link to the MAGIC page at Bloomington (https://bdsc.indiana.edu/stocks/misc/magic.html).

      Thank you for this suggestion! We have done as suggested.

      (13) Similarly, the key plasmids to create the improved gRNA-marker insertions should be deposited to Addgene (or similar repository) and their ID numbers included in the resources table.

      The plasmids have been deposited to Addgene and are currently being validated.

      (14) The authors might consider including (perhaps as a supplement to Figure 1 or Figure 2) a crossing scheme for one of their MAGIC experiments. This will make it even clearer how a MAGIC experiment could be set up using existing fly reagents.

      This is a good suggestion! We have added an example crossing scheme in Figure 1 – figure supplement 1C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Synaptotagmin 1 and Synaptotagmin 7 promote MR1-mediated presentation of Mycobacterium tuberculosis antigens", authored by Kim et al., showed that the calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 specifically promote (are critical for) MAIT cell activation in response to the Mtb-infected bronchial epithelial cell line BEAS-2B (Fig. 1) and the monocyte-like cell line THP-1 (Fig. 3). This work also showed co-localization of Syt1 and Syt7 with Rab7a and Lamp1, but not with Rab5a (Fig. 5). Loss of Syt1 and Syt7 resulted in a larger area of MR1 vesicles (Fig. 6f) and an increased number of MR1 vesicles in close proximity to auxotrophic Mtb-containing vacuoles during infection (Fig. 7a-b). Moreover, flow organellometry was used to separate phagosomes from other subcellular fractions and identify enrichment of auxotrophic Mtb-containing vacuoles in fractions 42-50, which were enriched with Lamp1+ vacuoles or phagosomes (Fig. 7e-f).

      Strengths:

      This work nicely associated Syt1 and Syt7 with late endocytic compartments and Mtb+ vacuoles. Gene editing of the Syt1 and Syt7 loci in bronchial epithelial and monocyte-like cells supported a role for Syt1 and Syt7 in maintaining normal levels of antigen presentation for MAIT cell activation during Mtb infection. Imaging analyses further showed that Syt1 and Syt7 mutants had greater overlap of MR1 with Mtb fluorescence and closer proximity of MR1 to Mtb-infected vacuoles, suggesting that Syt1 and Syt7 support antigen presentation for MAIT cell activation during Mtb infection.

      Weaknesses:

      Additional data are needed to support the conclusion, "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles", and some other evidence could be seen as contradicting this conclusion.

      We thank the reviewer for their positive and constructive comments. Because MR1 presents small molecule metabolites, specifically identifying MR1 molecules loaded with antigens derived from intracellular Mtb infection remains a significant technical challenge. Therefore, we agree that some of our approaches measure antigen-loaded MR1 indirectly. For example, IFN-γ release from a MAIT cell clone serves as a sensitive surrogate readout for the presence of antigen-loaded MR1 at the cell surface. Previous work has shown that IFN-γ release from MAIT cells correlates with the amount of loaded MR1 measured by flow cytometry with a TCR-based tetramer (Kulicke et al., 2024). In this context, Syt1 and Syt7 are the first endosomal trafficking proteins we have identified that play a specific role in MR1-mediated presentation of Mtb-derived metabolites. Syt1 and Syt7 do not contribute to the presentation of exogenously delivered MR1 ligands, such as Ac-6-FP loaded in the ER or M. smegmatis supernatant. In Syt1 and Syt7 knockout cells expressing MR1-GFP, MR1 vesicles are larger, but MR1 continues to co-localize with LAMP1, as in wildtype cells. Furthermore, Syt1 and Syt7 knockout cells exhibit an increased number of MR1 vesicles near the Mtb-containing vacuoles compared to wildtype cells. To increase the statistical power of our microscopy analyses, we have analyzed additional cells. Although the absolute magnitude of the observed effects is modest, T cell activation is highly sensitive to the number of loaded antigen-presenting molecules at the cell surface. In addition, a complementary approach using flow organellometry confirmed increased MR1 expression within Mtb<sup>+</sup>LAMP1<sup>+</sup> vesicles in Syt7 knockout cells. Thus, these findings suggest a mechanism whereby Syt1 and Syt7 facilitate the trafficking of loaded MR1 molecules from the Mtb-containing vacuoles to the plasma membrane. This specialized mechanism may be analogous to the previously described role of Syt7 in MHC class II trafficking (Becker et al., 2009). In our model, we observed increased accumulation and expression of MR1 within Mtb-containing vacuoles in Syt7 knockout cells.

      Reviewer #2 (Public review):

      Summary:

      The study demonstrates that calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 are involved in the efficient presentation of mycobacterial antigens by MR1 during M. tuberculosis infection. This is achieved by creating antigen-presenting cells in which the Syt1 and Syt7 genes are knocked out. These mutated cell lines show significantly reduced stimulation of MAIT cells, while their stimulation of HLA class I-restricted T cells remains unchanged. Syt1 and Syt7 co-localize in a late endo-lysosomal compartment where MR1 molecules are also located, near M. tuberculosis-containing vacuoles.

      Strengths:

      This work uncovers a new aspect of how mycobacterial antigens generated during infection are presented. The finding that Syt1 and Syt7 are relevant for final MR1 surface expression and presentation to MR1-restricted T cells is novel and adds valuable information to this process. The experiments include all necessary controls and convincingly validate the role of Syt1 and Syt7. Another key point is that these proteins are essential during infection, but they are not significant when an exogenous synthetic antigen is used in the experiments. This emphasizes the importance of studying infection as a physiological context for antigen presentation to MAIT cells. An additional relevant aspect is that the study reveals the existence of different MR1 antigen presentation pathways, which differ from the endoplasmic reticulum or endosomal pathways that are typical for MHC-presented peptides.

      Weaknesses:

      The reduced MAIT cell response observed with Syt1 and Syt7-deficient cell lines is statistically significant but not completely abolished. This may suggest that only some MR1-loaded molecules depend on these two Syt proteins. Further research is needed to determine whether, during persistent M. tuberculosis infection, enough MR1-loaded molecules are produced and transported to the plasma membrane to sufficiently stimulate MAIT cells. The study proposes that other Syt proteins might also play a role, as outlined by the authors. However, exploring potential redundant mechanisms that facilitate MR1 loading with antigens remains a challenging task.

      We appreciate the reviewer’s comments and feedback. Knockout of Syt1 and Syt7 does not completely abolish MR1-mediated presentation of Mtb-derived metabolites. We agree that the likely explanation is redundancy within the antigen presentation pathways. Whether this redundancy reflects other endosomal trafficking proteins or other intracellular compartments where MR1 loading can occur remains unknown. Moreover, Mtb-derived antigens can access the ER, where Syt1 and Syt7 are not involved, thereby enabling an ER-mediated pathway for MR1 antigen presentation. It is also important to note that relatively few (<10) loaded MHC class I molecules are sufficient to trigger T cell activation (Brower et al., 1994; Sykulev et al., 1995; Sykulev et al., 1996). A major challenge in exploring these mechanisms is the inability to directly track small molecule Mtb-derived antigens as they are loaded onto MR1 and presented at the cell surface. These hurdles are briefly outlined in the Discussion as future directions. Nonetheless, Syt1 and Syt7 are the first endosomal trafficking proteins identified to have a specific effect on MR1-mediated presentation of Mtb-derived antigens.

      Reviewer #3 (Public review):

      Summary:

      In the submitted manuscript, the authors investigate the role of Synaptotagmin 1 (Syt1) and Synaptotagmin 7 (Syt7) in MR1 presentation of Mtb antigens.

      Strengths:

      In the first series of experiments, the authors determined that knocking down Syt1 and Syt7 in antigen-presenting cells decreases IFN-γ production following cellular infection with Mtb. These experiments are well performed and controlled.

      Weaknesses:

      Next, they aim to mechanistically investigate how Syt1 and Syt7 affect Mtb antigen presentation. In particular, they focus on MR1, a non-classical MHC-I molecule known to present endogenous and exogenous metabolites, including Mtb metabolites. Results from this next series of experiments are less clear. First, they show that knocking down Syt1 and Syt7 changes neither Mtb phagocytosis nor MR1 translocation from the ER to the plasma membrane. Based on this, they suggest that Syt1 and Syt7 may affect MR1 trafficking in endosomal compartments. However, neither subcellular compartment analysis nor flow organellometry clearly establishes the role of Syt1 and Syt7 in Mtb trafficking. Altogether, the notion that Synaptotagmins facilitate MR1 interaction with Mtb-containing compartments and its vesicular transport was already known. As such, the manuscript should provide additional insight into where and how the interaction occurs. The reviewer is left with the notion that Syt1 and Syt7 may affect MR1 presentation by facilitating the trafficking of MR1 vesicles from endosomal compartments to either the cell surface or other endosomal compartments. The analysis is observational, and additional data or discussion could clarify what insight is gained beyond what is already known from the literature.

      We thank Reviewer 3 for their comments. Our hypothesis is that Syt1 and Syt7 mediate MR1 trafficking rather than Mtb trafficking. While Syt7 has previously been implicated in MHC class II trafficking and vesicular transport, this study is the first to explore in detail the roles of Syt1 and Syt7 in MR1-mediated presentation of Mtb-derived metabolites. Since current technologies do not allow direct tracking of Mtb-derived antigens loaded onto MR1, we relied on complementary approaches including IFN-γ release from MAIT cells, flow cytometry, fluorescence microscopy, and flow organellometry. Both flow organellometry and fluorescence microscopy show increased MR1 expression at Mtb-containing vacuoles in Syt7 knockout cells. Since total MR1 expression measured by flow cytometry and the overall number of MR1 vesicles remain unchanged, these data support a mechanism in which Syt7 facilitates the trafficking of antigen-loaded MR1 from Mtb-containing vacuoles to the cell surface, consistent with the observed reduction in MAIT cell IFN-γ release.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Concern 1: the data in the current manuscript are not sufficient to "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles, potentially to the cell surface for antigen presentation" (last part of the Abstract). To reach this conclusion, additional data are needed showing that: (a) Mtb-containing vacuoles associate with MR1 protein expression; (b) MR1+ vesicles traffic from one subcellular location to another; (c) Syt1 or Syt7 KO reduces MR1 vesicles at a downstream subcellular location, e.g., the cell surface. Important evidence for the proposed "facilitation of translocation" is missing, namely whether Syt1 or Syt7 KO reduces MR1 vesicle traffic from one location to another.

      We thank the reviewer for their detailed suggestions to improve our proposed model. We would like to clarify that Figure 7g demonstrates increased MR1 protein expression in Syt7 knockout cells, as assessed by flow organellometry. This approach allowed us to specifically distinguish AuxMtb<sup>+</sup>LAMP1<sup>+</sup> compartments (Mtb-containing vacuoles) and to quantify MR1 expression using geometric mean fluorescence intensity. Moreover, in both Syt1 and Syt7 knockout cells, MR1+ vesicles are retained within lysosomal compartments, characterized by vesicle enlargement and accumulation. Therefore, we did not observe trafficking of MR1+ vesicles to other subcellular locations or to the plasma membrane. A key limitation, however, is the lack of current technologies that allow direct measurement of MR1 surface expression specifically during intracellular Mtb infection via flow cytometry. Given this limitation, IFN-γ ELISpot is a sensitive surrogate and supports the conclusion that loss of Syt1 and Syt7 results in decreased MR1 presentation of Mtb-derived antigens at the plasma membrane.

      The results "a significant increase in the number of MR1 vesicles within 1 μm of AuxMtb for Syt1 (1.13 ± 0.46) and Syt7 KO (1.31 ± 0.46) cells compared to WT cells (Fig.7b)." and "the surface of MR1 vesicles in Syt1 and Syt7 KO cells showed a 3-fold increase in overlap area with Mtb surfaces (Fig.7d)." should be elaborated to clarify whether MR1+ vacuoles and Mtb+ vacuoles overlap or are merely adjacent. Figure 7b shows several groups of vacuoles at the same distance. A larger sample size is needed to randomize this distance measurement, for example by analyzing 50–100 Mtb+ vacuoles.

      We appreciate the reviewer’s critical comments and suggestions. To quantify distance and surface overlap, the microscopy images were acquired from a single optical plane rather than full z-stacks. As a result, it is not possible to definitively determine whether MR1+ vesicles and Mtb-containing vacuoles directly overlap or are adjacent. In response to the reviewer’s suggestion, we increased the sample size for both the distance and surface overlap analyses (n = 51–53 for each). With the larger sample size, we observed a significant increase in the number of MR1 vesicles located within 1 μm of AuxMtb in both Syt1 (1.23 ± 0.21) and Syt7 knockout (1.28 ± 0.22) cells. We also observed an approximately 4-fold increase in MR1-Mtb surface overlap area compared to wildtype cells.
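
      For readers unfamiliar with this kind of proximity metric, a minimal sketch of a "vesicles within 1 μm" count is shown below. It is illustrative only: the function name and coordinates are hypothetical, it approximates each MR1 vesicle and each Mtb object by a single centroid in one optical plane (2D), and the authors' actual image-analysis pipeline, described in their Methods, may measure surface-to-surface distances instead.

```python
import math

def count_vesicles_near_bacteria(vesicles, bacteria, threshold_um=1.0):
    """Count vesicles whose centroid lies within threshold_um of any bacterium centroid.

    Hypothetical helper. Both inputs are lists of (x, y) coordinates in
    micrometres from a single optical plane, so distances are 2D only.
    """
    count = 0
    for vx, vy in vesicles:
        # A vesicle is counted once, even if it is close to several bacteria.
        if any(math.hypot(vx - bx, vy - by) <= threshold_um for bx, by in bacteria):
            count += 1
    return count

# Hypothetical coordinates, not data from the study.
mr1_vesicles = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
mtb_centroids = [(0.0, 0.8)]
print(count_vesicles_near_bacteria(mr1_vesicles, mtb_centroids))  # 2
```

      Counting each vesicle at most once keeps the metric a per-cell vesicle count rather than a count of vesicle-bacterium pairs; because the images are single planes, such a count cannot by itself distinguish overlap from adjacency.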

      Results from "performed flow organellometry to separate phagosomes from other subcellular fractions and identified enrichment of Mtb-containing vacuoles in fractions 42-50 (Fig.7e-f)" do not distinguish WT from Syt1/Syt7 KO cells, nor do they further support the role of Syt1/Syt7 in endocytic trafficking. More specifically, the authors' claim of "enhanced MR1 expression in Mtb+LAMP1+ compartments via flow organellometry in Syt1 and Syt7 KO cells." may not be supported by Figure 7f, which does not show a difference in MR1 expression between Syt1 KO or Syt7 KO and WT.

      We appreciate the reviewer’s concerns and would like to clarify the interpretation of Figures 7f and 7g. Figure 7f demonstrates: (a) enrichment of Mtb-containing vacuoles within fractions 42-50, (b) co-enrichment of LAMP1+ vesicles within these Mtb-containing fractions, and (c) comparable subcellular fractionation profiles across wildtype, Syt1 knockout, and Syt7 knockout cells, indicating no major differences in fraction distribution. Differences in MR1 expression are shown in Figure 7g, which compares MR1 expression, as geometric mean fluorescence intensity, within the fraction exhibiting the highest percentage of AuxMtb<sup>+</sup>LAMP1<sup>+</sup> events across all fractions. We observed a significant increase in MR1 expression in Syt7 knockout cells compared to wildtype cells.

      Concern 2: in the Abstract, "Loss of Syt1 and Syt7 results in enlarged MR1 vesicles and an increased number of MR1 vesicles in close proximity to Mtb-containing vacuoles during infection." Although the number of MR1 vesicles within 1 μm of Mtb increases (Figure 7b) and MR1+ vacuoles are larger in KO than in WT cells (Figure 6f), the number of MR1 vesicles per cell does not differ between WT and Syt1 and Syt7 KO cells (Fig. 7c). These imaging analyses, including other figure panels, need a more explicit presentation of (most if not all) the random images used for calculation, annotation of the MR1 vacuoles used for calculation, and the raw statistical data used for mean and p-value calculations. These raw data can be presented in supplemental figure panels.

      We thank the reviewer for these suggestions. We have included more details on randomization, technical procedures, and statistical analyses in the Methods sections “Fluorescence Microscopy,” “Image Analysis,” and “Statistical Analysis.” Raw data and statistical data are presented in the supplemental data.

      Concern 3: some additional evidence does not support the conclusion "This study identifies a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles" (the last part of the Abstract). This unsupportive evidence includes: (a) MR1 expression on the cell surface does not differ among WT, Syt1 KO, and Syt7 KO BEAS-2B cells (Fig. 6d). (b) "Live-cell imaging showed no differences in MR1 cellular distribution in the presence or absence of Ac-6FP between WT, Syt1, and Syt7 KO BEAS-2B:TET-MR1GFP cells as MR1 translocated from the ER and vesicles to the cell surface as expected (Figure 6c)."

      We thank the reviewer for this comment and would like to clarify our use of Ac-6-FP. Figures 6c and 6d examine MR1 cellular distribution and surface expression in the presence or absence of Ac-6-FP. Ac-6-FP is a small MR1 ligand that is loaded in the ER and promotes MR1 surface stabilization and trafficking to the cell membrane. In contrast, Mtb primarily resides within membrane-bound phagosomes. MR1 presentation of soluble/exogenously delivered ligands and of intracellular Mtb-derived antigens has been shown to involve distinct pathways and endosomal trafficking proteins (Harriff et al., 2016; Karamooz et al., 2019; Karamooz et al., 2025). Findings from Figures 6c and 6d show that Syt1 and Syt7 do not contribute to the presentation of small soluble and ER-loaded ligands such as Ac-6-FP. Instead, they specifically contribute to MR1 presentation of Mtb-derived metabolites by translocating MR1 from Mtb-containing vacuoles in the context of intracellular Mtb infection.

      Other concerns:

      (1) Figure 1a uses Ct value to measure Syt1 and Syt7 expression levels, but a comparison with GAPDH Ct cycle numbers in different cell types will be helpful for understanding.

      We appreciate the reviewer’s suggestion of including GAPDH Ct cycle numbers. We have revised Figure 1a to show Ct values for Syt1, Syt7, and GAPDH in both BEAS-2B and THP-1 cells.

      (2) The Figure 1b indel analysis, performed with the ICE method, should be confirmed at the protein level to interpret the functional results.

      We thank the reviewer for raising this concern. We attempted to assess protein levels by western blot using multiple antibodies from both Abcam and Synaptic Systems. However, we were unable to identify a suitable antibody that reliably detected endogenous Syt1 or Syt7 protein levels.

      (3) Figure 1c. HLA-B45-restricted T cell clones also show a marginal reduction in IFN-γ spot responses, and the difference is larger in Figure 6b. Please discuss these conflicting data. Also, a reference is needed to support whether the exogenous CFP peptide antigen is presented via surface or endocytic antigen loading.

      We agree with the reviewer that there are some marginal reductions in IFN-γ responses for HLA-B45-restricted T cell clones. Because T cell clones are thawed from frozen stocks, maximal responses can differ between T cell clones and between expansions of the same T cell clone. However, the comparisons include a control arm and pool data from multiple experiments to reach statistical power and validity. In addition, Figure 6b shows Syt1 and Syt7 KO cells in the background of BEAS-2B MR1KO:tetMR1-GFP clone D4 cells, which overexpress MR1; this may contribute to variability and potentially account for the observed differences. With respect to exogenous CFP peptide loading, earlier studies on peptides and antigen-presenting cells demonstrated that peptides can be loaded onto fixed cells and subsequently presented to T cells (Shimonkevitz et al., 1983; Watts et al., 1985). Based on these findings, it is reasonable to assume that substantial peptide exchange occurs at the cell surface when exogenous peptides are added to antigen-presenting cells.

      (4) Figure 2e: ΔCt values of Syt1 and Syt7 in WT and KO cells can be shown together with Ct values of the GAPDH or B2m housekeeping genes to help readers determine the efficiency of Syt1 and Syt7 mutation at the gene expression level. Also, in Figure 4a, the baseline Ct values for GAPDH can be plotted together.

      As suggested by the reviewer, we have revised Figures 2e and 4a to include Ct values for the genes of interest as well as the housekeeping gene GAPDH.
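
      As background on why a housekeeping gene helps here: a raw Ct value is hard to compare across samples or cell types, so a common normalization subtracts the reference-gene Ct from the same sample (ΔCt) and, for KO versus WT comparisons, uses the Livak 2^-ΔΔCt method. The sketch below is illustrative only; it is not stated to be the authors' analysis (they report Ct values directly), the function names and Ct values are hypothetical, and the fold-change formula assumes roughly 100% amplification efficiency for both genes.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt: target-gene Ct minus reference-gene (e.g. GAPDH) Ct from the same sample."""
    return ct_target - ct_reference

def relative_expression(ct_target_ko, ct_ref_ko, ct_target_wt, ct_ref_wt):
    """2^-ΔΔCt fold change of KO relative to WT.

    Assumes ~100% PCR efficiency, i.e. one Ct cycle corresponds to a 2-fold
    difference in starting template.
    """
    ddct = delta_ct(ct_target_ko, ct_ref_ko) - delta_ct(ct_target_wt, ct_ref_wt)
    return 2 ** (-ddct)

# Hypothetical Ct values, not data from the study.
print(relative_expression(28.0, 18.0, 25.0, 18.0))  # 0.125
```

      In this example the target is detected three cycles later in the KO after GAPDH normalization, which corresponds to about one-eighth of the WT transcript level. Note that this measures transcripts, not protein.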

      (5) Figure 3c and Figure 1d: M. smegmatis infection could be shown for a more direct comparison with Mtb infection.

      We thank the reviewer for this thoughtful comment. Although M. smegmatis infection could serve as a comparable control, M. smegmatis secretes large amounts of MR1 ligands derived from riboflavin metabolism. This makes it difficult to distinguish between extracellular and intracellular antigens, and to directly compare with Mtb infection, which is specifically an intracellular infection model.

      (6) Figure 4e: It appears that Esyt2 knockdown strongly inhibits MAIT cell activation by BEAS-2B cells in response to both Mtb infection and M. smegmatis supernatant stimulation. Please add other relevant data, such as MR1 cell surface expression and colocalization, and discuss these results in relation to the Syt proteins.

      We appreciate the reviewer’s suggestion to include relevant data for Esyt2 knockdown. We performed flow cytometry analysis of Esyt2 knockdown cells and found surface MR1 expression under basal conditions. Treatment with Ac-6-FP increased MR1 surface stabilization, but surface MR1 levels were significantly lower than those observed in missense control cells. Therefore, Esyt2 is not specific to MR1 presentation of Mtb-derived metabolites and may instead play a broader role in overall MR1 antigen presentation, including of intracellular Mtb-derived antigens, exogenous antigens, and ER-loaded Ac-6-FP.

      (7) Figure 5 colocalization computational analyses can be more explicitly presented regarding randomization, technical procedures, and statistical analyses, as stated in Concern 2.

      As suggested, we have included more details in the Methods section and added supplemental data.

      (8) Figure 6a: Syt1 and Syt7 protein expression should also be assessed to confirm the mutation, similar to the confirmation for Figures 1 and 3.

      We thank the reviewer for raising this concern. As discussed previously, we have not identified a suitable antibody for human Syt1 and Syt7. We have tested multiple antibodies from Abcam and Synaptic Systems.

      (9) For statistical analyses, "non-linear regression analysis comparing best-fit values of top and EC50 were used to calculate p-values by extra sum-of-squares F test" (Figure 6b) and "non-linear regression analysis of pairwise comparison to WT on best-fit values of top and EC50 were used to calculate p-values by extra sum-of-squares F test." (Figure 3bc), readers may need more specific demonstration in supplemental figures on how statistical analyses have been performed.

      We appreciate the reviewer’s suggestion to include more detailed information regarding the statistical analyses. For clarification, data presented in Figures 6b and 3b-c were analyzed using the same statistical analysis in Prism 10. Specifically, nonlinear regression (curve fit) was performed using the [Agonist] vs. response model with three parameters. Best-fit values for the top and EC<sub>50</sub> parameters were compared using an extra sum-of-squares F test. No constraints were applied to the bottom and top parameters, and the EC<sub>50</sub> parameter was constrained to be greater than 0 for p-value calculation. We have revised the Statistical Analysis section of the Methods to describe this approach more clearly.
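
      To make the logic of this comparison concrete, a minimal sketch follows. It assumes the three-parameter [Agonist] vs. response model is the hyperbola with the Hill slope fixed at 1 (Prism's equation documentation should be checked for the exact parameterization), and it shows only the extra sum-of-squares F ratio for two nested fits: a simpler fit in which top and EC<sub>50</sub> are shared between WT and KO, and a fuller fit in which each group gets its own values. The function names and numbers are hypothetical; the curve fitting itself was done in Prism and is not reproduced here.

```python
def agonist_response_3p(x, bottom, top, ec50):
    """[Agonist] vs. response with Hill slope fixed at 1 (x in linear concentration units)."""
    return bottom + x * (top - bottom) / (ec50 + x)

def extra_sum_of_squares_f(ss_null, df_null, ss_alt, df_alt):
    """F ratio comparing two nested least-squares fits.

    ss_null, df_null: residual sum of squares and residual degrees of freedom
        of the simpler fit (e.g. top and EC50 shared between WT and KO).
    ss_alt, df_alt: the same quantities for the fuller fit (separate top and
        EC50 for each group); it has fewer residual degrees of freedom.
    """
    if df_null <= df_alt:
        raise ValueError("the simpler fit must have more residual degrees of freedom")
    # Relative drop in residual error per extra parameter, scaled by the
    # residual error per degree of freedom of the fuller fit.
    return ((ss_null - ss_alt) / (df_null - df_alt)) / (ss_alt / df_alt)

# Hypothetical values, not data from the study.
print(agonist_response_3p(5.0, 10.0, 110.0, 5.0))  # 60.0 (half-maximal at x = EC50)
f_ratio = extra_sum_of_squares_f(120.0, 20, 80.0, 18)
print(round(f_ratio, 3))  # 4.5
```

      The p-value is the upper tail of the F distribution with (df_null - df_alt, df_alt) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_ratio, 2, 18)` would give it for this example.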

      (10) In discussion, the background section for Syt1 and Syt7 is more appropriate to be in the introduction. This will allow readers to better understand the association of Syt proteins with MR1 and the necessity to study the impact of Syt on MR1 trafficking.

      We thank the reviewer for this suggestion. We believe that the basic background and relevance of Syt1 and Syt7 in MR1 trafficking are covered in the introduction; however, we have added details to help readers understand their impact.

      Reviewer #2 (Recommendations for the authors):

      This reviewer has no requests for implementation and congratulates the authors on this nice piece of work.

      We thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations for the authors):

      Complete trafficking experiments to pinpoint the trafficking relationship between Syt1 and Syt7 and MR1 in Mtb infection.

      We appreciate the reviewer’s insightful comment. As this study represents the first detailed investigation into the roles of Syt1 and Syt7 in MR1-mediated presentation of Mtb-derived metabolites, we agree that a fully resolved trafficking mechanism has not yet been established. A major limitation is the inability to directly track Mtb-derived antigens as they are loaded onto MR1 and trafficked to the cell surface. Therefore, we relied on complementary functional and microscopy-based approaches, including IFN-γ ELISpot assays, flow cytometry, fluorescence microscopy, and flow organellometry, to infer the trafficking relationships between Syt1, Syt7, and MR1 during intracellular Mtb infection. Our data support a model in which Syt1 and Syt7 facilitate the trafficking of MR1 from Mtb-containing vacuoles to the plasma membrane. This interpretation is supported by the increased accumulation of MR1 in Mtb-containing vacuoles and the reduction in MAIT cell IFN-γ release observed in Syt1 and Syt7 knockout cells.

      References

      (1) Becker, S. M., Delamarre, L., Mellman, I., & Andrews, N. W. (2009). Differential role of the Ca(2+) sensor synaptotagmin VII in macrophages and dendritic cells. Immunobiology, 214(7), 495–505.

      (2) Brower, R. C., England, R., Takeshita, T., Kozlowski, S., Margulies, D. H., Berzofsky, J. A., & Delisi, C. (1994). Minimal requirements for peptide-mediated activation of CD8+ CTL. Molecular immunology, 31(16), 1285–1293.

      (3) Harriff, M. J., Karamooz, E., Burr, A., Grant, W. F., Canfield, E. T., Sorensen, M. L., Moita, L. F., & Lewinsohn, D. M. (2016). Endosomal MR1 Trafficking Plays a Key Role in Presentation of Mycobacterium tuberculosis Ligands to MAIT Cells. PLoS pathogens, 12(3), e1005524.

      (4) Karamooz, E., Harriff, M. J., Narayanan, G. A., Worley, A., & Lewinsohn, D. M. (2019). MR1 recycling and blockade of endosomal trafficking reveal distinguishable antigen presentation pathways between Mycobacterium tuberculosis infection and exogenously delivered antigens. Scientific reports, 9(1), 4797.

      (5) Karamooz, E., Kim, S. J., Peterson, J. C., Tammen, A. E., Soma, S., Soll, A. C. R., Meermeier, E. W., Khuzwayo, S., & Lewinsohn, D. M. (2025). Two-pore channels in MR1-dependent presentation of Mycobacterium tuberculosis infection. PLoS pathogens, 21(8), e1013342.

      (6) Kulicke, C. A., Swarbrick, G. M., Ladd, N. A., Cansler, M., Null, M., Worley, A., Lemon, C., Ahmed, T., Bennett, J., Lust, T. N., Heisler, C. M., Huber, M. E., Krawic, J. R., Ankley, L. M., McBride, S. K., Tafesse, F. G., Olive, A. J., Hildebrand, W. H., Lewinsohn, D. A., Adams, E. J., … Harriff, M. J. (2024). Delivery of loaded MR1 monomer results in efficient ligand exchange to host MR1 and subsequent MR1T cell activation. Communications biology, 7(1), 228.

      (7) Shimonkevitz, R., Kappler, J., Marrack, P., & Grey, H. (1983). Antigen recognition by H-2-restricted T cells. I. Cell-free antigen processing. The Journal of Experimental Medicine, 158(2), 303–316.

      (8) Sykulev, Y., Cohen, R. J., & Eisen, H. N. (1995). The law of mass action governs antigen-stimulated cytolytic activity of CD8+ cytotoxic T lymphocytes. Proceedings of the National Academy of Sciences of the United States of America, 92(26), 11990–11992.

      (9) Sykulev, Y., Joo, M., Vturina, I., Tsomides, T. J., & Eisen, H. N. (1996). Evidence that a single peptide-MHC complex on a target cell can elicit a cytolytic T cell response. Immunity, 4(6), 565–571.

      (10) Watts, T. H., Gariépy, J., Schoolnik, G. K., & McConnell, H. M. (1985). T-cell activation by peptide antigen: effect of peptide sequence and method of antigen presentation. Proceedings of the National Academy of Sciences of the United States of America, 82(16), 5480–5484.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work investigates the role of protein N-glycosylation in regulating T-cell activation and function and suggests that B4GALT1 is a potential target for tumor immunotherapy. The strength of evidence is solid, and further mechanistic validation could be provided.

      We sincerely thank the editor and reviewers for their time and constructive feedback. Your recognition of our work is much appreciated. We clarify our mechanistic studies as stated below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance functions of CD8+ T cells both in vitro and in vivo, suggesting important roles of protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the finding of a novel function of B4GALT1 deficiency in CD8 T cells.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

      We apologize for not sufficiently highlighting our results in Fig. 5f-h. In those panels, FRET assays showed that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T-cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 with a CD8β-CD3ε fusion protein and tested its effects in both WT and B4GALT1-knockout CD8<sup>+</sup> T-cells. Our results demonstrate that this fusion protein bypasses the effect of B4GALT1 knockout in CD8<sup>+</sup> T-cells (Fig. 5g-h). Together with the finding that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support a model in which B4GALT1 modulates T-cell functions mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for this reminder. We verified the efficiency of several gRNAs by T7E1 assay and will add these data to the supplementary results of the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see our response to Reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      We thank the reviewer for this suggestion. Unfortunately, TCR and CD8 each carry multiple N-glycosylation sites (https://glycosmos.org/glycomeatlas). We are concerned that mutating all of these sites may affect not only the glycosylation of TCR and CD8 but also other essential functions of these proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for this great suggestion to explore the roles of B4GALT1 in autoimmune and infectious diseases. However, because the current manuscript focuses mainly on tumor immunology, we think these studies are best left for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance functions of CD8+ T cells both in vitro and in vivo, suggesting important roles of protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy. However, the authors need to directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, as well as the functional outcomes of this interaction, such as enhanced TCR signaling. In addition, blocking PD-1 has been shown to enhance antitumor effects, whereas the data presented in this study suggest that activation of PD-1 expression under B4GALT1 deficiency in T cells enhanced antitumor effects. How can this discrepancy be reconciled? Finally, several minor questions need to be addressed to strengthen the conclusions of this manuscript.

      (1) We used a FRET (Fluorescence Resonance Energy Transfer) assay to measure the interaction between TCR and CD8. FRET signals between TCR and CD8 increased significantly in B4GALT1-deficient T-cells compared with control cells (Fig. 5f). As functional outcomes of this interaction, we observed enhanced killing activity in B4GALT1-deficient CD8<sup>+</sup> T-cells (Fig. 3f and Fig. 5h).

      To test whether the increased TCR-CD8 interaction is the major cause of the TCR activation phenotypes in B4GALT1-knockout CD8<sup>+</sup> T-cells, we generated a construct in which the CD8β ectodomain (ECD) was fused to CD3ε to artificially tether TCR to CD8 (Fig. 5g). Overexpression of this CD8β-CD3ε fusion enhanced in vitro killing activity in control wild-type CD8<sup>+</sup> T-cells. By contrast, in B4GALT1-deficient CD8<sup>+</sup> T-cells, the enhancement of killing activity by the fusion construct was significantly diminished (Fig. 5h), suggesting that the fusion bypasses the regulation by B4GALT1.

      (2) PD-1 is both an early T-cell activation marker upon TCR activation and a T-exhausted marker under consecutive or repeated stimulations. In our screenings, PD-1 was used as an early activation marker for T-cells.

      We have clarified this in a new paragraph of the Discussion.

      (1) The present data rely on statistical graphs (e.g., bar and line charts) for all data, excluding the bioinformatics analysis. Including data such as flow cytometry plots, photomicrographs, or immunohistochemistry staining images would provide more direct support for the conclusions.

      We thank the reviewer for these valuable suggestions. In the revised version, we have added the original flow cytometry gating strategies for the Cas9 screening sort (Fig. S1a), TIL analysis (Fig. S5), and the FRET assay (Fig. S8) to provide more direct support for our conclusions.

      (2) To further validate the enhanced tumor infiltration phenotype resulting from B4GALT1 knockout, the following data would strengthen the manuscript:

      (a) Flow cytometric analysis of TILs or immunofluorescence data from tumor sections.

      We thank the reviewer for this valuable suggestion. We have added the original flow cytometry gating strategy for TILs in Fig. S5 of the revised version.

      (b) Assessment of in vivo T cell proliferation, for example, by tracking changes in the proportion of CD8+ T cells in the peripheral blood over time.

      We analyzed in vivo T-cell proliferation within tumors by CFSE (carboxyfluorescein succinimidyl ester) dilution analysis. As shown in Fig. S6b, 6 days after infusion, B4GALT1-knockout OT-I T-cells showed increased proliferation within tumors compared with wild-type control OT-I cells.
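      CFSE is partitioned roughly equally between daughter cells, so per-cell fluorescence approximately halves with each division. A minimal sketch of how a generation number could be assigned from CFSE intensity, assuming an undivided reference intensity is known (for example, from a non-proliferating control); the function names and intensities are hypothetical, and the response does not describe the authors' actual gating.

```python
import math
from collections import Counter


def cfse_generation(intensity: float, undivided_intensity: float) -> int:
    """Estimate how many divisions a cell has undergone from its CFSE signal.

    Each division roughly halves CFSE fluorescence, so
    divisions ~ log2(undivided_intensity / intensity).
    """
    if intensity <= 0 or undivided_intensity <= 0:
        raise ValueError("intensities must be positive")
    divisions = math.log2(undivided_intensity / intensity)
    # Snap to the nearest whole generation; cells brighter than the reference count as undivided.
    return max(0, round(divisions))


def generation_counts(intensities, undivided_intensity):
    """Count cells per generation (0 = undivided)."""
    return Counter(cfse_generation(x, undivided_intensity) for x in intensities)
```

      For example, with an undivided reference of 8000, a cell at 1000 falls in generation 3. Real CFSE histograms are usually fitted to Gaussian peaks rather than rounded cell by cell; the rounding here only keeps the sketch short.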

      (c) Evaluation of the proliferation and activation status of OT-1 CD8+ T cells specifically in the draining lymph nodes of the mouse model.

      We thank the reviewer for this valuable suggestion. We plan to perform this experiment in the future.

      (3) The authors provide evidence that B4GALT1 knockout enhances CD8+ T cell function in both mouse models and human TCR-T cells (in vitro). Definitive support for the translational potential of this strategy would come from showing that B4GALT1-knockout human TCR-T cells also mediate potent in vivo function (NSG tumor-bearing model may be a better choice).

      We thank the reviewer for this valuable suggestion and plan to perform these experiments in the future. However, we do not expect in vivo experiments in NSG mice to yield results very different from those of our in vitro experiments, and we therefore believe they would add little to the current manuscript.

      (4) It would be preferable to include data on T cell activation and effector function (e.g., flow cytometry for IL-2, TNF-α, and IFN-γ, or ELISPOT) following stimulation with an OVA-specific peptide or co-culturing of OVA-expressing tumor cells with B4GALT1-knockout OT-1 CD8 T cells, especially the changes in the TILs compared with the non-targeting control group.

      Following co-culturing of B16-OVA tumor cells with B4GALT1-knockout or wild-type OT-I CD8<sup>+</sup> T-cells, the RNA levels and secretion levels of TNFα and IFNγ were detected by RT-qPCR and ELISA, respectively (Fig. 3c). B4GALT1-deficient OT-I T-cells showed increased expression of T-cell activation and cytotoxic markers such as IFNγ and TNFα.

      (5) What is the correlation between the expression of B4GALT1, PD-1, and TCR activation markers at various time points during a long-term T cell co-culture with tumor cells?

      We thank the reviewer for this valuable suggestion. We do not currently have these data. While we agree that this question is interesting, we think it falls outside the scope of the current study.

      (6) In line 136: Regarding the genetic targeting of B4GALT1 in T cells, it is unclear whether single or multiple gRNAs were used and if potential off-target effects were assessed. To fully validate the model, it would be important to clarify these strategies, and it is essential to include data on the knockout efficiency at both the protein (e.g., Western blot) and mRNA levels.

      We apologize for the unclear description of our gene knockout strategy. In the current study, a single sgRNA was used for each gene knockout in all experiments. B4galt1 sg2 was used in Fig. 3a, and both B4galt1 sg1 and sg2 were used in Fig. S1d. We have clarified this in each figure legend in the revised version.

      The phenotypes of B4galt1 knockout T-cells could be rescued by overexpression of either a short or long isoform of mouse B4galt1 cDNA (Fig. 3b), indicating that potential off-target effects could be excluded.

      The knockout efficiencies of the sgRNAs were confirmed by T7E1 assay in the revised version (Fig. S2). Regrettably, the anti-mouse B4galt1 antibody did not work in Western blotting.
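      For reference, editing efficiency from a T7E1 assay is commonly estimated from gel band intensities as indel (%) = 100 × (1 − √(1 − f_cut)), where f_cut = (b + c)/(a + b + c), a is the uncut PCR band, and b and c are the cleavage products. A minimal sketch under that standard assumption follows; the function name and band values are illustrative, and the response does not state which quantification the authors used.

```python
import math


def t7e1_indel_percent(parent: float, *cleaved: float) -> float:
    """Estimate % indels from T7E1 band intensities.

    parent:  intensity of the uncut PCR band (a)
    cleaved: intensities of the cleavage product bands (b, c, ...)
    """
    total = parent + sum(cleaved)
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cut = sum(cleaved) / total
    # Assumes random re-annealing where only wild-type/wild-type duplexes escape
    # cleavage, so the uncut fraction equals (1 - p)^2 for editing frequency p.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))
```

      Because T7E1 misses some small indels and homoduplexes of identical edits, this estimate tends to understate true editing; sequencing-based methods are the usual cross-check.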

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Rationale for excluding clades G and H and clarification of clade definitions

      We appreciate this important request for clarification. In the revised manuscript, we now explicitly state (Methods, Tree generation) that the phylogenetic framework used in this study follows the clade definitions established by Techtmann et al. (Front. Microbiol. 2012, 3, 132), which classify [NiFe]-CODHs into clades based on high node support values (bootstrap > 75). We consider Techtmann et al.’s work the best reference, because their use of two different tree-inference approaches (ML and Bayesian) provides solid support for this clade classification. We did not perform Bayesian analyses ourselves; instead, we used the clades established in the literature to assign ours.

      Clades G and H were not deliberately excluded from downstream genomic-context and operon analyses. They were excluded by our pipeline because their sequences did not pass our initial quality filters, namely that the host is classified down to species level (https://github.com/boehmax/protein-per-organism) and that the protein exists in NCBI’s Identical Protein Groups (IPG) database (https://github.com/boehmax/protein-to-genome).

      Clades G and H are both represented by only a very small number of sequences, most of which derive from fragmented or poorly annotated genomes, preventing reliable assessment of operon organization and gene neighbourhood conservation. As a result, inclusion of these clades would not allow statistically meaningful or biologically interpretable comparisons with clades A–F.

      To improve transparency, we have added a brief explanation of these limitations in the Results (Results, Neighbor analysis).

      (2) Presentation and interpretation of co-occurrence data

      We agree that the presentation of the co-occurrence data required improvement. In the revised supplementary material, we now include a table in long format, which may be easier to interpret than the matrix representation in Fig. 3B.
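      To illustrate the difference between the two representations, a minimal sketch that flattens a square clade-by-clade matrix into long-format rows (one row per clade pair) and writes them as CSV. The column names and values are hypothetical; the matrix is not assumed to be symmetric, since a probability matrix such as that in Fig. 3B may be conditional.

```python
import csv
import io


def matrix_to_long(labels, matrix, skip_diagonal=False):
    """Flatten a square clade-by-clade matrix into (row_clade, col_clade, value) rows.

    Every ordered pair is kept, because the matrix is not assumed to be symmetric.
    """
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match the label list")
    rows = []
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            if skip_diagonal and i == j:
                continue
            rows.append((row_label, col_label, matrix[i][j]))
    return rows


def to_csv(rows, header=("clade_a", "clade_b", "value")):
    """Serialize long-format rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

      A long table of this kind can be sorted or filtered by any clade pair, which a matrix figure does not allow.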

      We have also revised the Results text to more precisely reflect the numerical trends. Specifically, we clarify that clade D shows co-occurrence with clades A, E and F, while clade C only displays co-occurrence with clade E. The statement that clades C and D “more often co-occur” has been removed and rephrased to avoid overgeneralization and to better align with the quantitative data shown in Figure 3B and the supplementary table (Results, Co-occurrence and Correlation).

      (3) Rationale for operon-level rather than organism-level analysis

      We thank the reviewer for highlighting this conceptual point. In the revised manuscript, we now explicitly state that our analysis was conducted at the operon level because individual genomes frequently encode multiple CODH operons that are phylogenetically and functionally distinct. Treating each operon as an independent functional unit allows us to capture this intra-genomic diversity and to associate specific gene neighbourhoods with individual CODH clades. For transparency, we also explicitly discuss in the Introduction the technical reasons for limiting this study to the operon level.

      Nevertheless, we acknowledge that this approach may overlook higher-level regulatory or physiological interactions among multiple CODHs encoded within the same genome. This limitation is now discussed explicitly, and we note that operon-level analysis should be regarded as a complementary, rather than exhaustive, framework for functional inference.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their positive assessment of the conceptual clarity and methodological utility of our approach, as well as for their thoughtful discussion of its limitations.

      Regarding incomplete genome assemblies, limited representation of class II HCPs, and potential omission of distal pathway components, we fully agree. We stress that our conclusions are probabilistic and hypothesis-generating rather than definitive functional assignments.

      In response to the concern about reproducibility of the visual filtering step, we have added a more explicit description (Methods, Data collection and refinement) of the criteria used to exclude non-CODH homologs (e.g., absence of conserved active-site motifs, unknown folds predicted with AlphaFold3, extremely long tree branches). This clarification improves transparency and facilitates replication of the analysis.

      Finally, we concur that extrapolating enzymatic activity or inactivity from a limited number of characterized representatives should be done cautiously. We have revised the wording throughout the manuscript to further temper such generalizations and to frame our interpretations explicitly as predictions that require experimental validation.

      Once again, we thank both reviewers for their constructive feedback, which has significantly improved the clarity, rigor, and transparency of the manuscript. We believe that the revisions address all concerns raised and strengthen the overall contribution of this work.

      Recommendation from authors:

      Reviewer #1 (Recommendations for the authors):

      All suggested editorial and stylistic corrections were implemented. These include refinements to the wording in the Abstract, grammatical corrections, streamlined phrasing, standardized figure callouts and supplementary file references, corrected abbreviations, and consistent formatting of references and author names. The only exception concerns the suggested change from MetCODH to MtCODH. We have retained MetCODH, as this abbreviation is well established in the literature for the Methanothermobacter thermophila CODH and is commonly used in prior studies (e.g., https://doi.org/10.1073/pnas.2410995121). MtCODH has historically referred to the CODH from Neomoorella thermoacetica (previously Moorella thermoacetica, hence the abbreviation Mt). We renamed that enzyme NtCODH but, to avoid confusion, kept MetCODH for Methanothermobacter thermophila.

      Reviewer #2 (Recommendations for the authors):

      We likewise addressed the majority of recommendations. We now report the versions of all software tools and databases used, standardized capitalization and naming of software and platforms (e.g., GitHub, eggNOG), clarified the BLAST implementation and database employed, and added direct repository links for custom scripts in both the Methods section and the bibliography. Overall grammatical consistency and formatting were improved throughout the manuscript. In addition, the criteria and procedure used for visual inspection to remove non-CODH sequences are now described more explicitly to enhance reproducibility, and several methodological sections were streamlined as suggested. Minor textual redundancies were removed, and phrasing was simplified where appropriate.

      Figure legends and formatting were revised to improve clarity and consistency. Adjustments to color usage and font consistency were made where feasible to enhance readability. The color scheme in Figure 1 was adjusted as suggested, and darker shades were chosen for clades G and H. This change was also implemented in the Supplementary File 9_Tree5. Figure 3A was retained, as it provides information on the frequency of multiple CODHs from the same clade within genomes, which cannot be inferred from the probability matrix shown in Figure 3B; together, these panels offer complementary insights. We adjusted the figure caption to make this clearer. We increased the visibility of data points in Figure 4B. To allow inclusion of the full dataset, we did not collapse the x-axis as suggested. Figure 4C was retained in its original format to emphasize the characteristic operon “fingerprints” of each CODH clade, which is a central focus of this work. A table is supplied in Supplementary File 2, which allows data exploration with the preferred focus of the reader.

      A small number of suggestions were therefore not implemented exactly as proposed, primarily where alternative revisions were judged to better preserve clarity or analytical intent. These decisions are minor and do not affect the conclusions or reproducibility of the study.

      Overall, we believe that these revisions have substantially improved the manuscript’s readability, transparency, and technical rigor, and we thank the reviewers again for their careful and constructive feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study compares four models - VALOR (dynamic visual-text alignment), CLIP (static visual-text alignment), AlexNet (vision-only), and WordNet (text-only) - in their ability to predict human brain responses using voxel-wise encoding modeling. The results show that VALOR not only achieves the highest accuracy in predicting neural responses but also generalizes more effectively to novel datasets. In addition, VALOR captures meaningful semantic dimensions across the cortical surface and demonstrates impressive predictive power for brain responses elicited by future events.

      Strengths:

      The study leverages a multimodal machine learning model to investigate how the human brain aligns visual and textual information. Overall, the manuscript is logically organized, clearly written, and easy to follow. The results well support the main conclusions of the paper.

      (1) My primary concern is that the performance difference between VALOR and CLIP is not sufficiently explained. Both models are trained using contrastive learning on visual and textual inputs, yet CLIP performs significantly worse. The authors suggest that this may be due to VALOR being trained on dynamic movie data while CLIP is trained on static images. However, this explanation remains speculative. More in-depth discussion is needed on the architectural and inductive biases of the two models, and how these may contribute to their differences in modeling brain responses.

      Thank you for this thoughtful comment. We agree that attributing VALOR’s advantage over CLIP solely to ‘dynamic (video) versus static (image) pretraining’ would be incomplete, and that the architectural and inductive biases of the two models are central to understanding the observed performance gap.

      Both VALOR and CLIP use contrastive learning to align visual and textual representations, but they differ in several key inductive biases that are particularly relevant for modeling brain responses during continuous movie viewing. First, VALOR is trained to align temporally extended video segments with text, introducing an explicit temporal integration window that aggregates information across consecutive frames. This encourages representations that maintain context, stabilize semantics across time, and encode event-level structure. Second, VALOR’s alignment operates at the level of multi-second narrative units, rather than isolated visual snapshots, biasing the model toward representations that are sensitive to unfolding events and cross-frame consistency.

      In contrast, CLIP processes frames independently and aligns single static images with text. As a result, it lacks an intrinsic mechanism for temporal binding, context accumulation, or event-level representation. While CLIP can capture rich visual–semantic associations at the image level, it is less well suited to represent higher-order temporal structure, which is known to strongly drive responses in association cortex during naturalistic narrative perception.

      We therefore interpret VALOR’s superior encoding performance as reflecting not only exposure to dynamic audiovisual data, but also inductive biases—temporal integration and event-level alignment—that more closely match how the brain integrates information over time during movie watching. We have revised the Discussion (p. 16) to articulate these architectural and representational differences explicitly, rather than attributing the effect solely to training data modality.

      (On page 16) “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity.”

      (2) The methods section lacks clarity regarding which layers of VALOR and CLIP were used to extract features for voxel-wise encoding modeling. A more detailed methodological description is necessary to ensure reproducibility and interpretability. Furthermore, discussion of the inductive biases inherent in these models, and of their implications for brain alignment, is crucial.

      Thank you for this comment. We agree that reproducibility and interpretability require precise specification of which model representations were used for voxel-wise encoding, as well as clearer discussion of the inductive biases inherent in these models and their implications for brain alignment.

      In the revised Methods, we now explicitly specify the feature sources for both models. For CLIP (ViT-B/32), we use the final pooled image embedding after projection into the shared image–text space, extracted frame-by-frame; one representative frame is sampled per TR, and its projected embedding serves as the regressor. For VALOR, we use the final joint video–text projection head, yielding a 512-dimensional embedding computed at the segment/TR level that integrates information across consecutive frames and aligns each multi-second video segment with its associated text. These procedures are now described step-by-step in the Methods (p. 21).
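      The TR-level alignment described above can be sketched as follows. The response says only that one representative frame is sampled per TR for CLIP and that VALOR is computed per segment/TR; taking the frame at the TR midpoint, and the TR length and frame rate in the usage note, are assumptions for illustration, and the function names are hypothetical.

```python
import math


def tr_frame_window(tr_index: int, tr_seconds: float, fps: float) -> range:
    """Indices of frames whose onset time falls inside TR number `tr_index` (0-based).

    Frame k starts at k / fps; the TR covers [tr_index * tr_seconds, (tr_index + 1) * tr_seconds).
    """
    start = math.ceil(tr_index * tr_seconds * fps)
    stop = math.ceil((tr_index + 1) * tr_seconds * fps)
    return range(start, stop)


def representative_frame(tr_index: int, tr_seconds: float, fps: float, n_frames: int) -> int:
    """Pick one frame per TR: the frame on screen at the TR's temporal midpoint (an assumption)."""
    mid_time = (tr_index + 0.5) * tr_seconds
    # Clamp so a final, partial TR never indexes past the end of the movie.
    return min(int(mid_time * fps), n_frames - 1)
```

      With a 2 s TR and 24 fps, the single CLIP frame for TR 0 would be frame 24, while a VALOR-style segment for TR 1 would span frames 48 through 95. The contrast between the two functions mirrors the difference in temporal integration discussed in the response.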

      In addition, we expanded the Discussion (p. 16) to explicitly articulate the models’ inductive biases and their relevance for brain alignment. In particular, we contrast CLIP’s image-level, framewise alignment—which lacks intrinsic temporal integration—with VALOR’s event-level, temporally extended video–text alignment, which biases representations toward context maintenance and narrative continuity. This distinction helps explain why the two models differ in their ability to predict neural responses during continuous movie viewing.

      (Methods, On page 21)

      “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model<sup>24</sup>. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features<sup>23,51,52</sup>. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.”

      (Discussion, On page 16)

      “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity. More broadly, this difference reflects distinct inductive biases in how the two models represent visual–linguistic information. CLIP is optimized for framewise image–text correspondence, encouraging representations that emphasize instantaneous visual semantics but remain agnostic to temporal structure. In contrast, VALOR is explicitly biased toward aggregating information over multiple consecutive frames and aligning representations at the level of temporally extended events. These inductive biases favor context maintenance, semantic stabilization, and narrative coherence over time, which are known to be critical for driving responses in association cortex during continuous movie perception.”

      (3) A broader question remains insufficiently addressed: what is the purpose of visual-text alignment in the human brain? One hypothesis is that it supports the formation of abstract semantic representations that rely on no specific input modality. While VALOR performs well in voxel-wise encoding, it is unclear whether this necessarily indicates the emergence of such abstract semantics. The authors are encouraged to discuss how the computational architecture of VALOR may reflect this alignment mechanism and what implications it has for understanding brain function.

      Thank you for this important conceptual question. We agree that improved voxel-wise encoding performance does not, by itself, imply the emergence of fully amodal or modality-independent semantic representations in the brain. In the revision, we therefore avoid framing our findings as evidence for abstract amodal semantics and instead clarify a more constrained interpretation.

      Specifically, we suggest that visual–text alignment may support the stabilization and coordination of scene-level meaning across modalities and over time, rather than the formation of modality-free semantic codes. From this perspective, VALOR’s advantage reflects inductive biases that promote (i) integration of visual information over multi-second windows and (ii) alignment of temporally extended visual events with linguistic descriptions, yielding representations that are more temporally stable, context-sensitive, and constrained by language.

      We therefore interpret VALOR’s superior encoding performance as identifying cortical regions whose responses are better captured by temporally stabilized, cross-modal representations, rather than as evidence that these regions encode fully abstract semantics independent of input modality. We have expanded the Discussion (p. 16) to articulate this interpretation and to clarify the implications of video–text alignment for understanding how the brain integrates perception and language during naturalistic cognition.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time. At the same time, these findings do not imply that visual–text alignment in the brain gives rise to fully amodal, modality-independent semantic representations. Instead, we suggest that alignment between visual and linguistic signals may serve to stabilize and coordinate scene-level meaning across modalities and over time. From this perspective, VALOR’s architecture—by integrating visual information over multi-second windows and aligning temporally extended video segments with language—provides a computational proxy for how the brain may use linguistic constraints to organize, disambiguate, and maintain coherent representations of unfolding events. The observed encoding gains therefore highlight regions engaged in temporally stabilized, cross-modal integration during naturalistic perception, rather than providing evidence for abstract semantic codes divorced from sensory input.”

      (4) The current methods section does not provide enough details about the network architectures, parameter settings, or whether pretrained models were used. If so, please provide links to the pretrained models to facilitate reproducible science.

      We appreciate this comment and agree that our original description of model sources and implementation details was not sufficiently explicit. These details are essential for both reproducibility and interpretability. We have now made these specifications explicit in the revised Methods.

      In particular, we now state for each model:

      VALOR. We use the publicly released pretrained VALOR-large checkpoint. For each movie segment, we extract the joint video–text projection head output (512-D) that encodes the aligned segment-level audiovisual semantics. We report the checkpoint source, the segment duration (in frames/seconds), and how these segment-level embeddings are temporally aligned to TRs for voxel-wise encoding.

      CLIP (ViT-B/32). We use the standard pretrained CLIP weights. For each video frame, we extract the final pooled image representation after projection into CLIP’s shared image–text embedding space (512-D). We also clarify that one representative frame is sampled and aligned to each TR, and that these projected embeddings are used as regressors in the encoding model.

      AlexNet. We use the ImageNet-pretrained AlexNet. We take activations from conv5, and then apply PCA to reduce them to 512 dimensions before mapping them to the fMRI time series.

      For each model, the revised Methods now specify: the pretrained source/checkpoint, the layer or head from which features were taken, output dimensionality, any preprocessing or dimensionality reduction, and the temporal alignment procedure used to generate TR-level regressors. These revisions appear in the updated Methods (page 21).
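      The TR-alignment step described above can be sketched as follows. This is a minimal illustration, not the authors' code. The response states that one frame is sampled per TR but not which frame, so taking the frame at the temporal midpoint of each TR is our assumption. The function names and the example frame rate are also hypothetical, and per-frame embeddings (CLIP or AlexNet) are assumed to be precomputed.

```python
import numpy as np

def frame_indices_per_tr(n_trs: int, tr_sec: float, fps: float) -> np.ndarray:
    """Index of the frame at the temporal midpoint of each TR window (assumed sampling rule)."""
    mid_times = (np.arange(n_trs) + 0.5) * tr_sec  # seconds from movie onset
    return np.floor(mid_times * fps).astype(int)

def tr_aligned_features(frame_features: np.ndarray, n_trs: int,
                        tr_sec: float, fps: float) -> np.ndarray:
    """Pick one precomputed per-frame embedding per TR -> (n_trs, n_dims) regressor matrix."""
    idx = frame_indices_per_tr(n_trs, tr_sec, fps)
    # Guard against TRs that extend past the last decoded frame.
    idx = np.clip(idx, 0, len(frame_features) - 1)
    return frame_features[idx]
```

      For example, with a 1 s TR and 24 frames per second (illustrative values), the first three TRs map to frames 12, 36 and 60. Segment-level models such as VALOR would instead yield one embedding per TR-length segment directly, so no frame selection is needed there.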

      (On page 21) “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model 24. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features 23,51,52. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit 53.

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
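      The AlexNet preprocessing (intra-image z-scoring followed by PCA) and the fold-wise PCA used for the WordNet variant can be sketched as below. This is a hedged illustration under stated assumptions: we take "intra-image z-score normalization" to mean z-scoring each image's flattened conv5 activations across its own units, and we implement PCA with a plain SVD rather than the DNNBrain toolkit. The class and function names are hypothetical.

```python
import numpy as np

def zscore_within_image(acts: np.ndarray) -> np.ndarray:
    """Flatten each image's activations and z-score across that image's own units."""
    flat = acts.reshape(acts.shape[0], -1)
    mu = flat.mean(axis=1, keepdims=True)
    sd = flat.std(axis=1, keepdims=True)
    return (flat - mu) / np.where(sd > 0, sd, 1.0)

class FoldPCA:
    """PCA fit on training rows only, then applied unchanged to held-out rows (no leakage)."""

    def __init__(self, n_components: int):
        self.k = n_components

    def fit(self, X: np.ndarray) -> "FoldPCA":
        self.mean_ = X.mean(axis=0)
        # Rows of vt are principal axes, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) @ self.components_.T
```

      Fitting the PCA inside each training fold means the held-out TRs never influence the retained axes, which is the leakage concern the WordNet paragraph mentions. Keeping 512 components requires at least 512 training TRs.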

      Reviewer #2 (Public review):

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, better predicts brain responses compared to unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the model's practical benefit by showing that semantic mappings were comparable to another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms of the brain, which held a meaningful relationship with individuals' fluid intelligence measures.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.

      (1) The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli compared to other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript if the significance of their encoding model findings were better explained.

      We thank the reviewer for this thoughtful comment and agree with the premise that a model preserving multimodal and temporal structure might a priori be expected to better predict brain responses to naturalistic stimuli. Our intent is not to claim that higher accuracy alone explains brain function, but rather that where and how VALOR improves prediction provides diagnostic insight into cortical processing. We have revised the Discussion to make this distinction explicit.

      Specifically, we clarify three ways in which VALOR’s gains are scientifically informative rather than merely unsurprising:

      (1) Anatomical specificity of improvement. VALOR’s advantage is not uniform across the cortex; gains are largest in regions implicated in multi-second, cross-modal integration. This spatial pattern constrains where the brain accumulates information over time and stabilizes visual representations using linguistic context.

      (2) Model as a computational probe. Beyond prediction accuracy, VALOR’s feature space recovers large-scale semantic organization without manual annotation and enables targeted tests of predictive processing. Features reflecting upcoming content selectively improve fits in specific regions, consistent with anticipatory coding during continuous narrative perception.

      (3) Link to individual differences. Individuals whose neural responses are better captured by anticipatory features show higher fluid intelligence, suggesting that VALOR indexes meaningful variability in forward-looking representations rather than merely tracking stimulus complexity.
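      To make "anatomical specificity of improvement" concrete, the per-voxel gain can be read as the difference in held-out prediction accuracy between two voxel-wise encoding models. The sketch below assumes closed-form ridge regression with a single fixed regularization strength and Pearson r as the accuracy metric. The response does not specify the regression method, so these choices and the function names are illustrative assumptions.

```python
import numpy as np

def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge weights W = (X^T X + alpha*I)^(-1) X^T Y, one alpha for all voxels."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def voxelwise_r(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson r between observed and predicted time series, one value per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / np.where(denom > 0, denom, np.inf)

def encoding_accuracy(X_train, Y_train, X_test, Y_test, alpha=1.0) -> np.ndarray:
    """Fit on training TRs and return per-voxel r on held-out TRs."""
    W = fit_ridge(X_train, Y_train, alpha)
    return voxelwise_r(Y_test, X_test @ W)

def per_voxel_gain(r_model: np.ndarray, r_baseline: np.ndarray) -> np.ndarray:
    """Positive values mark voxels where the richer feature space predicts better."""
    return r_model - r_baseline
```

      In practice, alpha would typically be chosen by cross-validation within the training set, and the gain map would be thresholded against a null before interpreting its anatomy.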

      Accordingly, we have revised the Discussion (p. 16) to frame VALOR as a tool for mapping cortical integration profiles, probing semantic and predictive structure, and linking representational dynamics to cognition, rather than asserting that higher encoding accuracy alone explains brain function.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time.”

      (2) In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the method together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features to the Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment is higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Method section (line 508) because this analysis was not easy to understand.

      Thank you for this helpful comment and for pointing out a potential source of confusion. We apologize that the original Methods description was not sufficiently clear. Importantly, VALOR features were never projected into the Huth et al. PC space, and no optimization or rotation toward the WordNet basis occurred at any stage.

      The analysis proceeded as follows:

      (1) VALOR PCs. We first fit voxel-wise encoding models using VALOR features on the Huth et al. dataset. We then applied PCA to the resulting cortical weight maps, yielding spatial components (‘VALOR PCs’) that summarize shared patterns of VALOR feature weights across the cortex.

      (2) WordNet PCs. We used the semantic principal components reported by Huth et al. (2012) directly as published, with no refitting, projection, or modification using VALOR.

      (3) Correspondence analysis. Only after obtaining these two independent sets of cortical maps did we threshold each to their top-loading vertices and compute Jaccard overlap between VALOR PCs and WordNet PCs.
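      The correspondence step can be sketched as a Jaccard index between the sets of top-loading vertices in two independently derived cortical maps. The threshold used in the manuscript is not stated here, so selecting the top 10% of vertices by absolute loading is an assumption (absolute values sidestep the arbitrary sign of principal components), as are the function names.

```python
import numpy as np

def top_vertex_set(pc_map: np.ndarray, frac: float) -> set:
    """Indices of the top `frac` of vertices ranked by absolute loading."""
    k = max(1, int(round(frac * pc_map.size)))
    return set(np.argsort(np.abs(pc_map))[-k:].tolist())

def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_matrix(maps_a: np.ndarray, maps_b: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Pairwise overlap between every map in maps_a (e.g. VALOR PCs) and maps_b (e.g. WordNet PCs)."""
    sets_a = [top_vertex_set(m, frac) for m in maps_a]
    sets_b = [top_vertex_set(m, frac) for m in maps_b]
    return np.array([[jaccard(sa, sb) for sb in sets_b] for sa in sets_a])
```

      Because both inputs are fixed before this step, the overlap is computed only after each decomposition is complete, which matches the ordering described in step (3).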

      Although a permutation that shuffles VALOR features prior to projection addresses a scenario that does not apply here, we agree that the Methods description should more clearly convey the independence of the two decompositions. We have therefore revised the Methods (p. 24) to describe the procedure step-by-step and explicitly state that no projection, refitting, or optimization toward the WordNet basis was performed.

      (On page 24) “We first fit voxel-wise encoding models using VALOR features for each of the five participants in the Huth et al. dataset. For each participant, this yielded a weight map linking each VALOR feature to each voxel. We then stacked these weight maps across participants to form a single voxel-by-feature weight matrix and applied principal component analysis (PCA). The top four principal components from this analysis (“VALOR PCs”) captured shared spatial patterns of VALOR feature weights across cortex. To interpret these components, we projected VALOR feature vectors from >20,000 video segments in the VALOR training set onto each VALOR PC, which revealed dominant semantic axes (e.g., mobility, sociality, civilization). For comparison, we used the semantic principal components reported by Huth et al. (2012) from their WordNet-based encoding model; these “WordNet PCs” were taken directly from the published work and were not refit or reweighted using VALOR.”
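      The group PCA described in this passage can be sketched as follows: stack the per-participant voxel-by-feature weight matrices, take principal axes in feature space, and read each participant's cortical map off as voxel scores. Column-centering before the SVD and the function names are our assumptions; the passage specifies only stacking, PCA, the top four components, and the projection of segment embeddings onto each PC.

```python
import numpy as np

def group_weight_pcs(weight_maps, n_pcs: int = 4):
    """
    weight_maps: list of (n_voxels_s, n_features) encoding-weight matrices, one per participant.
    Returns PC directions in feature space, shape (n_pcs, n_features), and a list holding
    one (n_voxels_s, n_pcs) cortical score map per participant.
    """
    W = np.vstack(weight_maps)
    W = W - W.mean(axis=0)  # column-centering before PCA (assumption)
    _, _, vt = np.linalg.svd(W, full_matrices=False)
    pcs = vt[:n_pcs]
    scores = W @ pcs.T  # each voxel's loading on each PC
    splits = np.cumsum([m.shape[0] for m in weight_maps])[:-1]
    return pcs, np.split(scores, splits)

def project_segments(segment_features: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Score each video-segment embedding on each PC to help interpret its semantic axis."""
    return segment_features @ pcs.T
```

      Ranking the projected segments on a given PC (for example, by their highest and lowest scores) is one way to attach labels such as mobility or sociality to that axis.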

      (3) In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis that "SPG and PCu have been strongly linked to fluid intelligence (line 304)". Did the authors conduct only two analyses (SPG–fluid intelligence and PCu–fluid intelligence) without relating other brain regions to other individual differences measures? Even if so, the authors should have reported the same r-value and p-value for PCu-fluid intelligence. If SPG-fluid intelligence indeed holds specificity in terms of statistical significance compared to all possible scenarios that were tested, is this rationally an expected result, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be the proxy of one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated predictive scores for all brain regions that held significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

      We thank the reviewer for this careful and constructive comment and agree that greater transparency about analytic intent, specificity, and rationale is needed. We have revised the manuscript accordingly.

      (1) Analytic scope and a priori restriction. The analysis in Fig. 4C was hypothesis-driven and restricted a priori to two regions — superior parietal gyrus (SPG) and precuneus (PCu) — based on convergent evidence linking frontoparietal and medial parietal systems to fluid reasoning, relational integration, and domain-general cognitive control. Importantly, we did not conduct a whole-brain search across regions or behaviors to identify the strongest correlation post hoc.

      (2) Specificity and reporting. In response to the reviewer’s request, we now report the full results for both hypothesized regions. Prediction horizon in SPG showed a statistically reliable association with fluid intelligence, whereas PCu showed a positive but weaker trend that did not survive correction. Reporting both results makes the regional specificity explicit rather than implicit.

      (3) Why SPG over PCu? Although both regions are implicated in fluid cognition, SPG has been more consistently linked to active maintenance and manipulation of relational structure and top-down attentional control, whereas PCu is more often associated with internally oriented and mnemonic processes. We therefore interpret the stronger SPG association as consistent with a role for sustained, externally driven predictive processing during continuous perception, rather than as evidence of exclusivity.

      (4) Why fluid intelligence? We do not equate fluid intelligence with “anticipation” per se. Rather, we used gF as an a priori proxy for domain-general capacities — maintaining and updating relational context over multi-second windows, integrating multiple constraints, and exerting flexible control — that are plausibly recruited when anticipating upcoming events during naturalistic narratives. The reported relationship is associative and hypothesis-consistent, not causal.

      (5) Why not aggregate across regions? We agree that aggregation could reveal more global relationships; however, our goal in this analysis was to test whether predictive timescales in theoretically motivated control regions relate to individual differences, rather than to maximize correlation by pooling heterogeneous regions. We now clarify this rationale in the Results.

      These clarifications and additional statistics have been incorporated in the revised Results section (p. 14).

      (On page 14) “Finally, we examined whether prediction horizons were linked to individual differences in cognition. We focused on fluid intelligence (gF) because gF is widely taken to index domain-general capacities such as maintaining and updating relational context over several seconds, integrating multiple constraints, and exerting flexible top-down control — functions that should support anticipating what will happen next in a continuous narrative. We targeted two parietal regions, the SPG and the PCu, which have both been repeatedly linked to gF and high-level cognitive control in the individual-differences literature 36,37. For each participant, we correlated fluid cognition scores with that participant’s average prediction horizon in each region. As shown in Fig. 4c, individuals with longer prediction horizons in SPG showed higher fluid cognition scores (SPG: r = 0.172, FDR-corrected p = 0.047). PCu showed a similar positive trend (PCu: r = 0.111, FDR-corrected p = 0.146) but did not reach significance. These associations suggest that the ability to sustain a longer predictive timescale during naturalistic perception co-varies with broader fluid cognitive capacity. No additional brain regions or behavioral measures were examined in this analysis.”
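      The statistics reported for the two a priori regions can be sketched as a per-region Pearson correlation followed by Benjamini-Hochberg FDR correction across regions. The passage does not say how uncorrected p-values were obtained, so the permutation test below is an assumed, self-contained stand-in (a parametric test such as scipy.stats.pearsonr would be a common alternative). All function names are hypothetical, and no data from the study are reproduced.

```python
import numpy as np

def pearson_r(x, y) -> float:
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def permutation_pvalue(x, y, n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided p-value for Pearson r, shuffling y across participants (assumed test)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r_obs = abs(pearson_r(x, y))
    hits = sum(abs(pearson_r(x, rng.permutation(y))) >= r_obs for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def bh_fdr(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    adj_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj
```

      In this setting, x would hold each participant's mean prediction horizon in one region (SPG or PCu) and y their fluid cognition score; the two regional p-values are then passed together to bh_fdr.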

      Reviewer #3 (Public review):

      In this work, the authors aim to improve neural encoding models for naturalistic video stimuli by integrating temporally aligned multimodal features derived from a deep learning model (VALOR) to predict fMRI responses during movie viewing.

      Strengths:

      The major strength of the study lies in its systematic comparison across unimodal and multimodal models using large-scale, high-resolution fMRI datasets. The VALOR model demonstrates improved predictive accuracy and cross-dataset generalization. The model also reveals inherent semantic dimensions of cortical organization and can be used to evaluate the integration timescale of predictive coding.

      This study demonstrates the utility of modern multimodal pretrained models for improving brain encoding in naturalistic contexts. While not conceptually novel, the application is technically sound, and the data and modeling pipeline may serve as a valuable benchmark for future studies.

      (1) Lines 95-96: The authors claim that "cortical areas share a common space," citing references [22-24]. However, these references primarily support the notion that different modalities or representations can be aligned in a common embedding space from a modeling perspective, rather than providing direct evidence that cortical areas themselves are aligned in a shared neural representational space.

      We thank the reviewer for this important clarification. We agree that the cited works do not provide direct evidence that cortical areas themselves are aligned in a single neural representational space. Rather, they demonstrate that representations derived from different modalities can be mapped into a shared embedding space from a modeling and computational perspective.

      We have therefore revised the text to avoid overstatement and to more precisely reflect what these studies support. In the revised manuscript (p. 4), we now frame the claim in terms of a shared representational framework or feature space used for modeling, rather than implying that cortical areas themselves intrinsically share a unified neural space. This clarification aligns the conceptual claim with the scope of the cited literature.

      (On page 4) “As a result, researchers are turning to multimodal deep learning, which learns from visual, linguistic, and auditory streams to model complex brain functions. This trend is supported by neuroscience evidence that cortical responses across regions can be jointly modeled within a common representational space.”

      (2) The authors discuss semantic annotation as if it is still a critical component of encoding models. However, recent advances in AI-based encoding methods rely on features derived from large-scale pretrained models (e.g., CLIP, GPT), which automatically capture semantic structure without requiring explicit annotation. While the manuscript does not systematically address this transition, it is important to clarify that the use of such pretrained models is now standard in the field and should not be positioned as an innovation of the present work. Additionally, the citation of Huth et al. (2012, Neuron) to justify the use of WordNet-based annotation omits the important methodological shift in Huth et al. (2016, Nature), which moved away from manual semantic labeling altogether. Since the 2012 dataset is used primarily to enable comparison in study 3, the emphasis should not be placed on reiterating the disadvantages of semantic annotation, which have already been addressed in prior work. Instead, the manuscript's strength lies in its direct comparison between data-driven feature representations and semantic annotation based on WordNet categories. The authors should place greater emphasis on analyzing and discussing the differences revealed by these two approaches, rather than focusing mainly on the general advantage of automated semantic mapping.

      Thank you for this thoughtful and constructive comment. We agree with the reviewer that the field has largely transitioned away from manual semantic annotation toward features derived from large-scale pretrained models (e.g., CLIP, GPT-style architectures), and that this shift is now standard rather than a novelty of the present work.

      We have revised the manuscript to clarify this positioning. Our goal is not to claim automated semantic extraction as an innovation, but rather to demonstrate how a multimodal, temporally informed video–text model can be used as a direct feature space for voxel-wise encoding of naturalistic movie fMRI data. VALOR is used as a representative example of this broader class of pretrained models, and our emphasis is on the general modeling approach rather than on promoting a specific architecture.

      We also agree that our original discussion underemphasized the important methodological shift introduced in Huth et al. (2016, Nature), which moved away from manual semantic labeling in the context of continuous spoken narratives. We now explicitly acknowledge this work and clarify that our use of WordNet-based annotations from Huth et al. (2012) serves a different purpose: it provides an interpretable, historically grounded benchmark for comparison in Study 3, rather than a claim that semantic annotation remains necessary or state-of-the-art.

      In response to the reviewer’s suggestion, we have revised the Results (p. 10) and Discussion (p. 18) to place greater emphasis on what is revealed by directly comparing data-driven multimodal features with category-based semantic annotation under matched conditions. Specifically, we focus on how these two approaches converge at the level of large-scale semantic organization while differing in their flexibility, temporal resolution, and dependence on human-defined categories. These revisions better reflect the current state of the field and sharpen the manuscript’s central contribution as a principled comparison between modeling approaches, rather than a general argument for automated semantic mapping.

      (On page 10) “Study 3: Comparing data-driven multimodal representations with category-based semantic annotation

      A central question in naturalistic encoding is how data-driven feature representations derived from pretrained models relate to more interpretable, category-based semantic annotations that have historically been used to study cortical semantic organization. Although recent work has shown that pretrained language and vision–language models can capture semantic structure without explicit annotation, category-based approaches such as WordNet remain valuable as interpretable reference frameworks. Here, we leverage the WordNet-based semantic components reported by Huth et al. (2012) 5 not as a state-of-the-art alternative, but as a historically grounded benchmark, allowing a controlled comparison between data-driven multimodal representations and manually defined semantic categories under matched naturalistic movie stimuli.”

      (On page 18) “Study 3 demonstrates the utility of video–text alignment models for probing higher-order semantic representations during naturalistic perception. Our comparison between VALOR-derived representations and WordNet-based semantic components highlights an important distinction between data-driven and category-based approaches to modeling meaning in the brain. While multimodal pretrained models offer flexible, high-dimensional representations that capture semantic structure without explicit annotation, category-based frameworks provide interpretability and theoretical anchoring 4,48. Using WordNet-based labeling from prior work as an interpretable reference point, we show that VALOR automatically extracts semantic dimensions—including mobility, sociality, and civilization—that closely mirror those identified using manual semantic categories (Fig. 3). The observed alignment between VALOR PCs and WordNet semantic components suggests that large-scale semantic organization emerges consistently across these approaches, even though they differ in how semantic structure is defined and learned. This convergence supports the use of pretrained multimodal models as practical encoding tools for naturalistic stimuli, while also underscoring the continued value of interpretable semantic benchmarks for understanding which aspects of meaning are represented across cortex. We do not argue that semantic annotation is required for modern encoding models; rather, WordNet-based features serve here as a historically grounded and interpretable reference for contextualizing data-driven multimodal representations.”

      (3) The authors use subject-specific encoding models trained on the HCP dataset to predict group-level mean responses in an independent in-house dataset. While this analysis is framed as testing model generalization, it is important to clarify that it is not assessing traditional out-of-distribution (OOD) generalization, where the same subject is tested on novel stimuli, but rather evaluating which encoding model's feature space contains more stimulus-specific and cross-subject-consistent information that can transfer across datasets.

      We thank the reviewer for this helpful clarification and agree that the type of generalization tested here should be described more precisely. Our analysis does not assess classical within-subject out-of-distribution (OOD) generalization, in which the same individual is tested on novel stimuli.

      Instead, for each HCP participant we train a subject-specific encoding model and transfer it to predict group-mean responses in an independent in-house dataset collected at a different site, with different participants, different movies, and different acquisition conditions. This design evaluates which encoding model’s feature space contains stimulus-locked representations that are consistent across individuals and robust to changes in dataset and experimental context, rather than within-subject stimulus novelty per se.
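      The transfer evaluation described here can be sketched as applying each source participant's encoding weights to the target movies' features and correlating the predictions with the target group-mean response. This sketch assumes that both datasets share a common surface or vertex space and that features for the new movies were extracted with the same pipeline. The function names are hypothetical.

```python
import numpy as np

def voxelwise_r(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson r per vertex (column) between observed and predicted time series."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / np.where(denom > 0, denom, np.inf)

def transfer_scores(subject_weights, X_target: np.ndarray,
                    Y_target_subjects: np.ndarray) -> np.ndarray:
    """
    subject_weights: list of (n_features, n_vertices) weights, one per source (HCP) participant.
    X_target: (T, n_features) features of the new movies.
    Y_target_subjects: (n_subjects, T, n_vertices) target responses in a shared space (assumed).
    Returns (n_source_subjects, n_vertices) r against the target group-mean response.
    """
    y_group = Y_target_subjects.mean(axis=0)  # averaging suppresses idiosyncratic variance
    return np.stack([voxelwise_r(y_group, X_target @ W) for W in subject_weights])
```

      Because the target is a group mean, a feature space scores well here only if its predictions track stimulus-locked signal shared across people, which is the property this analysis is meant to isolate.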

      We have revised the Results (p. 10) and Discussion section (p. 17) to explicitly describe this analysis as a test of cross-subject and cross-dataset transferability of stimulus representations, and to clarify the distinction from traditional OOD generalization.

      (On Page 10) “Although this analysis is not a classical within-subject out-of-distribution generalization test, it evaluates the extent to which different feature spaces capture stimulus-locked representations that are consistent across subjects and transferable across datasets, stimuli, and acquisition environments.”

      (On Page 17) “By contrast, VALOR exhibited stronger generalization in a cross-cohort, cross-stimulus, and cross-site transfer evaluation.”

      (4) Within this setup, the finding that VALOR outperforms CLIP, AlexNet, and WordNet is somewhat expected. VALOR encodes rich spatiotemporal information from videos, making it more aligned with movie-based neural responses. CLIP and AlexNet are static image-based models and thus lack temporal context, while WordNet only provides coarse categorical labels with no stimulus-specific detail. Therefore, the results primarily reflect the advantage of temporally-aware features in capturing shared neural dynamics, rather than revealing surprising model generalization. A direct comparison to pure video-based models, such as Video Swin Transformers or other more recent video models, would help strengthen the argument.

      We thank the reviewer for this baseline-focused comment and agree that, in naturalistic movie paradigms, a temporally structured audiovisual model would be expected to outperform static or unimodal feature spaces. Our intent in this comparison is therefore not to claim a surprising advantage, but to isolate which inductive biases matter for cross-dataset transfer of movie-evoked neural responses.

      The baseline models were chosen deliberately to span feature spaces that are widely used and interpretable in cognitive neuroscience: AlexNet (vision-only, frame-based), WordNet (human-defined semantic categories without learned visual features), and CLIP (static image–text alignment without temporal context). Comparing VALOR against these established baselines under matched preprocessing, TR alignment, and dimensionality control allows us to attribute performance differences specifically to temporal integration and audiovisual alignment, rather than to generic model capacity.

      We agree that a direct comparison with purely visual spatiotemporal encoders (e.g., Video Swin or TimeSformer-style models) would further dissociate the contribution of temporal visual processing from cross-modal video–text alignment. We now explicitly note this as an important direction for future work and frame VALOR as one representative of a broader class of multimodal video models, rather than as a uniquely optimal solution (Discussion, p. 16).

      (On page 16) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (5) Moreover, while WordNet-based encoding models perform reasonably well within-subject in the HCP dataset, their generalization to group-level responses in the Short Fun Movies (SFM) dataset is markedly poorer. This could indicate that these models capture a considerable amount of subject-specific variance, which fails to translate to consistent group-level activity. This observation highlights the importance of distinguishing between encoding models that capture stimulus-driven representations and those that overfit to individual heterogeneities.

      Thank you for this thoughtful observation. We agree with the reviewer’s interpretation. In our analyses, WordNet-based models perform reasonably well when fit and evaluated within individual HCP participants, but their performance degrades substantially when transferred to predict group-averaged responses in the independent SFM dataset. This dissociation suggests that, while WordNet annotations capture meaningful variance at the individual level, a larger fraction of that variance may be subject-specific or idiosyncratic, and therefore does not translate into consistent, stimulus-locked responses at the group level.

      One motivation for our cross-dataset, cross-subject evaluation is precisely to distinguish encoding models that primarily capture shared stimulus-driven structure from those whose apparent performance depends more strongly on individual heterogeneity. In this context, the reduced transferability of WordNet-based models highlights a potential limitation of category-based semantic features for capturing population-consistent neural dynamics during naturalistic viewing.

      We note that this effect likely reflects multiple factors rather than a single failure mode, including differences in annotation schemes, labeling granularity, and semantic coverage across datasets. By contrast, video–text models provide time-aligned linguistic features directly from the stimulus itself, reducing reliance on dataset-specific human annotation and exhibiting stronger transfer across cohorts. We have clarified this interpretation in the revised Discussion (p. 17).

      (Page 17) “Together, these findings underscore the importance of distinguishing encoding models that primarily capture shared, stimulus-driven neural structure from those whose performance relies more heavily on subject-specific heterogeneity, particularly when evaluating generalization across participants and datasets.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Methods section, please clarify which specific layer of VALOR the 512-dimensional feature vector was extracted from.

      Thank you for this suggestion. We have revised the Methods to state explicitly that the 512-dimensional feature vector is extracted from VALOR’s joint video–text projection head, i.e., the final projection layer of the contrastive alignment module that maps video and text representations into a shared embedding space. We also clarify that these 512-D embeddings are computed at the segment/TR level and then time-aligned to the BOLD signal (Methods, p. 21).

      (On page 21) “We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.”

      (2) It would be helpful to include more detailed descriptions of the network architectures and parameters for all models used.

      Thank you for the suggestion. We have revised the Methods to include model-specific subsections for all feature spaces used (VALOR, CLIP, AlexNet, and WordNet). For each model, we now explicitly report (i) the backbone architecture and training objective, (ii) the exact feature source (layer or projection head) and output dimensionality, and (iii) how features were temporally aligned to the BOLD signal. All models were used with their publicly released pretrained parameters, without additional fine-tuning. These additions are intended to improve transparency and reproducibility (Methods, p. 21).

      (On page 21) “Movie Feature Extraction

      (1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23,51,52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of the multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit [53].

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models. Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
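      A minimal sketch of the intra-image z-score step described for the AlexNet features above, under the reading that each frame's activation vector is standardized across its own units. The array layout and the `intra_image_zscore` name are illustrative assumptions, not the DNNBrain API.

```python
import numpy as np


def intra_image_zscore(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each frame's activations across its own units.

    features: (n_frames, n_units) array, one row per TR-sampled frame
    (hypothetical layout, e.g. flattened conv5 activations).
    """
    mean = features.mean(axis=1, keepdims=True)  # per-frame mean over units
    std = features.std(axis=1, keepdims=True)    # per-frame SD over units
    return (features - mean) / (std + eps)


rng = np.random.default_rng(0)
frames = rng.normal(loc=1.0, size=(4, 1000))
frames[1] *= 10.0  # one frame with a much larger overall amplitude
z = intra_image_zscore(frames)
print(z.std(axis=1).round(3))  # every frame now has unit spread
```

      Because every frame ends up with zero mean and unit spread, frames that differ only in overall activation amplitude map to the same normalized vector, which is the amplitude effect this step is meant to remove.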

      (3) In Figure 3, consider following Huth et al.'s approach by using 3-4 distinct colors to visualize semantic representations across the cortical surface more clearly.

      Thank you for this excellent suggestion. We have generated an alternative visualization using a discrete 3–4 color scheme following Huth et al. to display the semantic components on the cortical surface. This version makes the spatial correspondence between components and the boundaries between cortical territories easier to see. We now include this visualization in the Supplement (Fig. S3).

      (4) In Figure 2, the brain renderings are too small. Please consider creating a separate, enlarged figure with clearer delineation of relevant ROIs.

      We appreciate this suggestion and agree that clear delineation of ROIs is important. We evaluated larger brain renderings; however, within the multi-panel layout of Fig. 2, enlarging them compressed accompanying plots/legends and introduced visual crowding, which reduced overall readability. To preserve a balanced layout and consistent typography across panels, we have kept the current rendering size in the main text and added Fig. S4 with enlarged brain renderings showing clearer ROI boundaries for the same ROIs.

      Reviewer #2 (Recommendations for the authors):

      (1) From the introduction, I feel like naïve readers would have a hard time understanding what semantic models (e.g., WordNet) are, which the authors write are based on "labor-intensive and subjective manual annotation of semantic content". It would be straightforward to explain the process: how scientists have written descriptions or denoted categories of what's happening within a TR and transformed these into embedding vectors based on language models. This description would explain what the authors mean by "labor-intensive, time-consuming, and subjective". Related to this point, the authors seem to be using the words "semantic model/feature" and "linguistic model/feature" interchangeably, which may exacerbate the confusion.

      Thank you for this helpful suggestion. We agree that naïve readers would benefit from a clearer explanation of how “semantic” models such as WordNet are constructed and from a more precise distinction between semantic and linguistic features.

      In response, we expanded the Introduction (p. 3) to explicitly describe the process by which semantic features are generated via dense human annotation (i.e., raters label objects, actions, and events within each TR and map these labels onto a predefined ontology to form feature vectors), clarifying why this approach is labor-intensive, time-consuming, and subject to rater variability.

      To avoid disrupting the conceptual flow of the Introduction, we placed the explicit terminology clarification in the Methods section (p. 22), where feature extraction is described. There, we now define semantic features as human-annotated, category-based representations of scene content, and linguistic features as continuous language embeddings derived automatically from pretrained language or vision–language models. These revisions are intended to improve clarity and consistency for both expert and non-expert readers.

      (On page 3) “Critically, semantic models often rely on dense human annotation. In early naturalistic encoding studies, trained raters watched the stimulus and labeled what was happening within each TR or short time window—for example, identifying objects, actions, or events present in the scene. These labels were then mapped onto a predefined semantic ontology (such as WordNet), yielding high-dimensional categorical feature vectors that served as regressors in encoding models. While this approach provides interpretable semantic features, it is labor-intensive, time-consuming, and inherently subjective, as annotations depend on rater judgment, labeling guidelines, and dataset-specific conventions, limiting scalability and reproducibility.”

      (On page 22) “Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models.”

      (2) Figure 1A does not look like an accurate schematic of the encoding method. For example, shouldn't the "Train" give rise to weight matrices, and Movies come from moments at Test? I would appreciate it if this schematic figure would explain what the encoding model is to naïve readers.

      (3) Figure 1B emphasizes that VALOR is utilizing multimodal features, but does not emphasize that the model is trained on dynamic video. The current figure looks like the model extracted visual and linguistic features from a screenshot of the video, much like the CLIP model.

      Thank you for this helpful comment. We agree that the original Fig. 1A did not sufficiently clarify what is learned during training versus what is applied during testing, and that this distinction is particularly important for naïve readers unfamiliar with encoding models. We also agree that the original Fig. 1B did not sufficiently emphasize that VALOR is trained on dynamic video segments, and that the schematic could be misinterpreted as aligning a single video frame with text, similar to CLIP-style image–text models.

      We have revised Fig. 1A (p. 6) to make the encoding procedure explicit and pedagogical. Specifically, we now clearly depict that, during the training phase (HCP dataset), voxel-wise encoding models learn feature-to-voxel weight matrices from stimulus features and BOLD responses. These learned weights are explicitly labeled as voxel-wise weight matrices and visually associated with the training stage. In the testing/generalization phase (SFM dataset), we now indicate that these learned weights are held fixed and applied to features extracted from novel movies to generate predicted BOLD responses. Additional labels were added to distinguish “Training (learn weights)” from “Testing/Transfer (apply fixed weights)” and to clarify that the encoding model implements a linear mapping from stimulus features to voxel responses. We have also rewritten the Fig. 1 legend (p. 6) to explicitly explain the encoding workflow in words, including (i) the learning of voxel-specific weights during training, (ii) their reuse during cross-dataset transfer, and (iii) how generalization performance is evaluated. These changes are intended to ensure that Fig. 1A accurately reflects the encoding methodology and is understandable to readers without prior experience with encoding models.
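      The train/transfer logic described for the revised Fig. 1A can be sketched as follows. This is a minimal illustration on simulated data, assuming ridge regression as the linear feature-to-voxel mapping; the regularization value, array sizes, and function names are placeholders rather than the manuscript's settings.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights W = (X'X + alpha*I)^-1 X'Y, one column per voxel."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_r(Y_true, Y_pred):
    """Pearson r between observed and predicted time series, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt**2).sum(axis=0) * (yp**2).sum(axis=0))


rng = np.random.default_rng(0)
n_features, n_voxels = 512, 50
W_true = rng.normal(size=(n_features, n_voxels))

# Training phase (stand-in for HCP): learn feature-to-voxel weights.
X_train = rng.normal(size=(800, n_features))
Y_train = X_train @ W_true + rng.normal(scale=5.0, size=(800, n_voxels))
W = fit_ridge(X_train, Y_train, alpha=10.0)

# Transfer phase (stand-in for SFM): weights stay fixed; only new features enter.
X_new = rng.normal(size=(300, n_features))
Y_new = X_new @ W_true + rng.normal(scale=5.0, size=(300, n_voxels))
r = voxelwise_r(Y_new, X_new @ W)
```

      The key design point is that `W` is estimated once and never refit on the transfer data; generalization is scored only by correlating fixed-weight predictions with the held-out responses.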

      We have revised Fig. 1B (p. 6) to explicitly highlight the temporal nature of the video input used by VALOR. In the updated schematic, the visual stream is depicted as a sequence of consecutive frames spanning multiple seconds, grouped into a video segment, rather than as a single static image. Additional labels indicate that VALOR encodes temporally extended video clips and aligns them with corresponding textual descriptions in a shared embedding space via contrastive learning. We have also updated the figure legend (p. 6) to clarify that VALOR operates on multi-frame video segments and explicitly models temporal structure, distinguishing it from static image–text models such as CLIP. These changes are intended to make clear that VALOR’s advantage derives not only from multimodality, but also from learning representations over time.

      (4) Regarding Figure 2, why were paired t-tests conducted in one-sided comparisons? Shouldn't this be two-sided, given that there is no reason to assume one is higher or lower than another?

      Thank you for raising this point. We agree that, in the absence of a preregistered directional hypothesis, paired comparisons should be evaluated using two-sided statistical tests.

      In response, we have re-run all paired comparisons reported in Figure 2 (p. 9) using two-sided paired t-tests, recomputed the corresponding p-values and false discovery rate (FDR) corrections, and updated the significance markers in the figure and captions accordingly. Importantly, this change does not alter the qualitative pattern of results or the main conclusions reported in the manuscript.
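      A minimal sketch of a two-sided paired t-test per comparison followed by FDR correction. It assumes one value per participant per condition and that the FDR procedure is Benjamini–Hochberg; the simulated data and helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up rule: each adjusted value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def paired_two_sided(a, b):
    """Two-sided paired t-test per column (e.g., one column per ROI or model pair)."""
    res = stats.ttest_rel(a, b, axis=0, alternative="two-sided")
    return res.statistic, res.pvalue


rng = np.random.default_rng(1)
model_a = rng.normal(0.20, 0.05, size=(30, 4))  # 30 participants x 4 comparisons
model_b = model_a + rng.normal(0.01, 0.03, size=(30, 4))
t, p = paired_two_sided(model_a, model_b)
q = benjamini_hochberg(p)
```

      Swapping the argument order flips the sign of `t` but leaves the two-sided p-value unchanged, which is why no direction needs to be assumed in advance.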

      (5) Regarding Study 4, I am curious whether the results are specific to forward-looking representations (predictive coding) or whether the results broadly reveal regions that are sensitive to contexts. For example, if the authors were to incorporate nearby past scenes in the analysis rather than the nearby future scenes, would different brain regions light up?

      Thank you for this thoughtful question. We agree that it is important to distinguish forward-looking (predictive) representations from more general sensitivity to temporal context. In Study 4, we deliberately operationalized prediction using future-aligned features, such that only information from upcoming scenes was incorporated into the encoding model. Accordingly, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic context sensitivity.

      To make this interpretive scope explicit, we have added a clarifying sentence at the beginning of the Study 4 paragraph in the Discussion (p. 18), noting that our analysis incorporates only future-aligned features and that directly contrasting past- and future-aligned features will be an important direction for future work. This clarification is intended to clearly bound our claims while addressing the reviewer’s conceptual distinction.

      (On page 18) “In Study 4, we used a video-text alignment model to investigate predictive coding mechanisms. Because our analysis incorporates only future-aligned features, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic sensitivity to temporal context; directly contrasting past- and future-aligned features will be an important direction for future work.”
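      To make the future/past distinction concrete, a minimal sketch of how time-shifted regressors could be constructed, assuming a fixed offset of `k` TRs and simple truncation at the run boundaries (both assumptions; the manuscript's exact offsets are not restated here). The mirror-image `past_aligned` helper corresponds to the contrast raised by the reviewer, which was not run in the study.

```python
import numpy as np


def future_aligned(features, bold, k):
    """Pair BOLD at TR t with stimulus features from TR t + k (k >= 1).

    The last k BOLD samples have no future features and are dropped.
    """
    if k < 1:
        raise ValueError("k must be a positive number of TRs")
    return features[k:], bold[:-k]


def past_aligned(features, bold, k):
    """Hypothetical control: pair BOLD at TR t with features from TR t - k."""
    if k < 1:
        raise ValueError("k must be a positive number of TRs")
    return features[:-k], bold[k:]


features = np.arange(20).reshape(10, 2)  # 10 TRs x 2 feature dimensions
bold = np.arange(10) * 100               # 10 TRs of a single voxel
X_future, y_future = future_aligned(features, bold, k=3)
X_past, y_past = past_aligned(features, bold, k=3)
```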

      (6) In the paragraph starting in line 447, were WordNet feature time series also reduced to 512 dimensions like the rest of the model features?

      Thank you for the question. In the main analyses, WordNet feature time series were not reduced to 512 dimensions and were instead used at their full dimensionality (859 features).

      For comparability with the other feature spaces, we additionally conducted a control analysis in which WordNet features were reduced to 512 dimensions using PCA. The PCA was fit within each training fold to avoid information leakage, and the resulting 512-D features were evaluated using the same encoding pipeline. This PCA-reduced version performed slightly worse than the full 859-D WordNet representation. Accordingly, we report results from the full 859-D WordNet features in the main text. We have clarified this point in the Methods section (p. 22).

      (On page 22) “We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text.”
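      A minimal sketch of the leakage-safe PCA control described above: principal axes are estimated from the training rows of each fold only and then applied unchanged to the held-out rows. The 5-fold contiguous split, the simulated sparse binary matrix standing in for the 859-D WordNet features, and the helper names are illustrative assumptions.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Estimate the mean and top principal axes from training rows only."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal axes, sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]


def apply_pca(X, mean, components):
    """Project any rows (training or held-out) onto the training-fold axes."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
n_trs, n_categories = 1000, 859
X = (rng.random((n_trs, n_categories)) < 0.05).astype(float)  # sparse binary labels

for test_idx in np.array_split(np.arange(n_trs), 5):  # contiguous folds
    train_idx = np.setdiff1d(np.arange(n_trs), test_idx)
    mean, comps = fit_pca(X[train_idx], n_components=512)
    Z_train = apply_pca(X[train_idx], mean, comps)
    Z_test = apply_pca(X[test_idx], mean, comps)  # never seen by the PCA fit
    # ... fit the encoding model on Z_train and evaluate on Z_test ...
```

      Because the held-out rows never influence `mean` or `comps`, test-fold performance is not inflated by variance structure borrowed from the evaluation data. Note that each training fold must contain more than 512 rows for 512 components to exist.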

      (7) I don't think authors have written what VALOR stands for.

      Thank you for the reminder. We now define the VALOR acronym at its first mention in the Abstract and Introduction and use the abbreviation thereafter.

      (On page 2) “Using a state-of-the-art deep learning model (VALOR; Vision-Audio-Language Omni-peRception)”

      (On page 5) “To answer this, we apply a video-text alignment encoding framework, using VALOR (Vision-Audio-Language Omni-peRception)—a high-performing, open-source model that aligns visual and linguistic features over time—to predict brain responses during movie watching.”

      (8) When calculating equation (3), please make sure that the correlation values are Fisher's r-to-z transformed.

      Thank you for this reminder. We confirm that all correlation coefficients used in Equation (3) are now Fisher r-to-z transformed prior to any averaging, contrasts, or statistical testing, and this procedure is now explicitly stated in the Methods. We have also updated Fig. 4a (p. 15) to reflect this transformation. Importantly, applying the r-to-z transformation does not change the qualitative pattern of results or their statistical significance.
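      Equation (3) itself is not reproduced here, but the transformation it now relies on can be sketched generically: correlations are mapped to Fisher z (z = arctanh r), averaged or contrasted in z space, and optionally mapped back with tanh. The clipping bound and function name are assumptions for illustration.

```python
import numpy as np


def fisher_mean_r(r, axis=None, bound=0.999999):
    """Average correlation coefficients in Fisher z space, then return r."""
    r = np.clip(np.asarray(r, dtype=float), -bound, bound)  # keep arctanh finite
    z = np.arctanh(r)  # r-to-z: 0.5 * ln((1 + r) / (1 - r))
    return np.tanh(z.mean(axis=axis))


naive = np.mean([0.9, 0.1])         # arithmetic mean of r
fisher = fisher_mean_r([0.9, 0.1])  # larger, because z stretches high r
```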

      (9) I wasn't able to check the OSF data/codes because it required permission.

      Thank you for flagging this, and we apologize for the inconvenience. We have removed the permission restriction and set the OSF repository to public read-only access, which should resolve the issue.

      Reviewer #3 (Recommendations for the authors):

      (1) The current approach extracts features from a single "best" layer of each model, which may be suboptimal for predicting neural responses. Prior work has shown that combining features across multiple layers through optimized fusion strategies (e.g., St-Yves et al., 2023) or using model ensembles (e.g., Li et al., 2024) can substantially improve encoding performance. The authors may consider these more comprehensive approaches either as additional baselines or as alternative directions to enhance model accuracy.

      Thank you for this constructive suggestion. We agree that combining features across multiple layers or using optimized fusion and ensemble strategies, as demonstrated in recent work (e.g., St-Yves et al., 2023; Li et al., 2024), can substantially improve absolute encoding performance.

      In the present study, however, we intentionally evaluated each model using its single best-performing layer within a matched encoding pipeline. This design choice was made to maintain model-agnostic comparability and interpretability, and to ensure that performance differences could be attributed primarily to the type of representation (e.g., temporally informed video–text features versus static or unimodal features), rather than to differences in model complexity, parameter count, or fusion strategy. Importantly, this constraint was applied uniformly across all models and therefore does not favor VALOR over the baselines.

      We now explicitly note in the Discussion (p. 19) that multilayer fusion and ensemble approaches represent a natural and promising extension of our framework and are likely to further improve absolute prediction accuracy. Our goal in the current work was to establish the practical utility and generalizability of temporally aligned video–text features for naturalistic movie fMRI under a controlled and comparable evaluation setting.

      (On page 19) “Third, for comparability across models we evaluated each model using its single best-performing layer within a matched encoding pipeline rather than using multilayer fusion or ensembling, which allowed us to attribute performance differences to representational format but likely underestimates the absolute performance ceiling.”

      (2) Given the naturalistic video-based task, the manuscript would benefit from including state-of-the-art video-only models (e.g., Video Swin Transformer, VideoMAE, and other more recent architectures) as explicit baselines. These models are designed to capture spatiotemporal structure without relying on language input and would provide a more targeted comparison to assess the specific contribution of temporal visual processing.

      Thank you for this thoughtful suggestion. We agree that state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE) are highly relevant baselines for naturalistic movie paradigms and would provide a more targeted comparison for isolating the contribution of temporal visual processing independent of language input.

      In the present study, our primary goal was not to exhaustively benchmark all possible video architectures, but to evaluate whether temporally informed video–text features can serve as a practical and general-purpose encoding framework that improves upon the models most commonly used in cognitive neuroscience for naturalistic fMRI (e.g., AlexNet for vision, WordNet for semantic annotation, and CLIP for static multimodal alignment). Using these established baselines allowed us to place our results in direct continuity with prior neuroimaging work and to attribute performance differences to representational format under a controlled encoding pipeline.

      We agree that incorporating modern video-only spatiotemporal encoders is an important next step, particularly for disentangling the relative contributions of temporal visual structure and cross-modal video–text alignment. We now explicitly note this point in the Discussion (p. 19) as a limitation and future direction, and view such comparisons as a natural extension of the current framework within the same TR-aligned encoding setup.

      (On page 19) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (3) An additional consideration is the scale of the AI models used for feature extraction. Previous studies (e.g., Matsuyama et al., 2023) have indicated that model size, particularly the number of parameters, can influence neural prediction performance independently of architecture. A discussion or analysis of how model size contributes to the observed encoding gains would help clarify whether improvements are due to the representational quality of the model or simply its scale.

      Thank you for this important point. We agree that model scale—particularly parameter count—can influence neural prediction performance independently of architecture, as noted in prior work (e.g., Matsuyama et al., 2023).

      In the present study, our primary goal was to evaluate whether temporally informed video–text representations provide practical advantages over unimodal and static multimodal baselines that are widely used in cognitive neuroscience for naturalistic movie fMRI, under a matched encoding pipeline. We did not perform a systematic scale-controlled analysis in this revision because doing so would require training or evaluating multiple size-matched variants across video-only and video–text architectures, which is beyond the scope of the current work.

      We therefore agree that part of the observed performance gains may reflect model capacity in addition to representational format, and we caution against attributing all improvements solely to cross-modal alignment or temporal structure. We now explicitly acknowledge this limitation in the Discussion and note that comparing size-matched video-only and video–text models within the same pipeline is an important next step for disentangling model scale from representational content.

      (On page 19) “Finally, part of VALOR’s advantage may reflect model capacity: larger pretrained models often yield higher encoding accuracy, so repeating these analyses with size-matched image-only and image–text models will be critical for disentangling model scale from representational content.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the current study, Huang et al. examined ACC responses during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. Specifically, they identified two subgroups of ACC neurons that responded to different aspects of the actions. This work represents an admirable effort to investigate the role of ACC in task-performing mice. However, in my opinion, alternative explanations of the data were not sufficiently explored, and some key findings were not well supported.

      Strengths:

      The development of the new discrimination-avoid task is applauded. Single-unit electrophysiology in task-performing animals represents an admirable effort, and the datasets are valuable. The identification of different groups of encoding neurons in ACC is potentially important.

      Weaknesses:

      One major conclusion is that ACC primarily encodes the so-called post-action variables (specifically shuttle crossing). However, only a single example session was included in Figure 2, while in Supplementary Figure 2 a considerable fraction of ACC neurons appears either to respond to the onset of movement or to ramp up their activity prior to movement onset. How did the authors reach the conclusion that ACC preferentially responds to shuttle crossing?

      We now include more example sessions and the main results from individual animals (Fig. 3; Figs. S2–S3; Fig. 8). Overall, the results are consistent across recording sessions and animals.

      While shuttle crossings were the primary reference for most analyses, using shuttle initiation as a reference led to similar conclusions (Fig. 4). Namely, we found that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4), while only a subset exhibits ramping pre-shuttle activity (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      In Figure 4, it was concluded that ACC neurons respond to action independent of outcome. Since these neurons are active on both correct and incorrect shuttle trials but not on stay trials, they seem to primarily respond to overt movement. If so, the rationale for linking ACC activity to adaptive behavior/associative learning is not very clear to me. Further analyses are needed to test whether their firing rates correlate with locomotion speed or acceleration/deceleration. On a similar note, to what extent are the action state neurons actually responding to locomotion-related signals? And can ACC activity actually differentiate correct vs. incorrect stays?

      In this study, we highlight two distinct groups of ACC neurons: action-state and action-content neurons. Both groups of neurons tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors, suggesting that their activity is not directly driven by locomotion. Furthermore, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms A→B or B→A shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns. Finally, we analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of neurons (<15%) show speed-correlated activity (Fig. 5), suggesting that most ACC neurons do not encode movement-related information. Taken together, these findings support the distinction between ACC activity and locomotion encoding.

      As for the small subset of speed-related neurons, it remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from the nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex (Fig. S2). Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.
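      The exact speed analysis is not restated in this response, so the following is only a hypothetical sketch of one common approach: correlate each neuron's binned firing rate with running speed, compare against a circular-shift null, and report the fraction of significant neurons. Bin count, shuffle count, threshold, and the simulated data are all assumptions.

```python
import numpy as np


def _pearson_columns(rates, speed):
    """Pearson r between each column of `rates` and the `speed` vector."""
    a = rates - rates.mean(axis=0)
    b = speed - speed.mean()
    return (a * b[:, None]).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum())


def speed_tuned(rates, speed, n_shifts=1000, alpha=0.05, seed=0):
    """Flag neurons whose |r| with speed exceeds a circular-shift null."""
    rng = np.random.default_rng(seed)
    r_obs = _pearson_columns(rates, speed)
    null = np.empty((n_shifts, rates.shape[1]))
    for i in range(n_shifts):
        shift = rng.integers(1, len(speed))  # shift by 1..n_bins-1 bins
        null[i] = _pearson_columns(rates, np.roll(speed, shift))
    p = (1 + (np.abs(null) >= np.abs(r_obs)).sum(axis=0)) / (n_shifts + 1)
    return r_obs, p < alpha


rng = np.random.default_rng(2)
speed = np.abs(rng.normal(size=2000))  # 2000 time bins
rates = rng.poisson(3.0, size=(2000, 10)).astype(float)
rates[:, :2] += 4.0 * speed[:, None]  # two speed-driven neurons
r_obs, is_tuned = speed_tuned(rates, speed)
fraction_tuned = is_tuned.mean()
```

      Circularly shifting the speed trace, rather than shuffling individual bins, keeps its temporal autocorrelation intact, so the null reflects chance alignment of two smooth signals rather than of independent samples.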

      Lastly, given that the ACC neurons display no or limited activity during stay trials, their activity generally does not differentiate correct vs. incorrect stays (Fig. S7). However, ACC activity does show moderate differentiation between room-A and room-B stays (Fig. S7).

      Given that a considerable amount of ACC neurons encode 'action content', it is not surprising that by including all neurons the model is able to make accurate predictions in Figure 6. How would the model performance change by removing the content neurons?

      We thank the reviewer for this thoughtful analysis idea. Excluding action-content neurons drastically reduces decoding accuracy (Fig. 8), suggesting that they are the main drivers for differentiating rooms A→B vs. B→A shuttles.
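      A minimal, hypothetical sketch of this kind of ablation: decode shuttle direction (A→B vs. B→A) from trial-wise firing rates with and without a designated subset of neurons. The nearest-centroid decoder, leave-one-trial-out cross-validation, and simulated data are assumptions; the decoder used in the study may differ.

```python
import numpy as np


def loo_nearest_centroid(X, y):
    """Leave-one-trial-out accuracy of a nearest-centroid decoder.

    X: (n_trials, n_neurons) post-shuttle firing rates; y: 0 = A->B, 1 = B->A.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / n


rng = np.random.default_rng(3)
y = np.repeat([0, 1], 30)             # 60 trials, 30 per direction
X = rng.normal(size=(60, 30))         # 30 simulated neurons
content = np.arange(10)               # neurons 0-9: direction-selective
X[np.ix_(y == 1, content)] += 2.0
others = np.setdiff1d(np.arange(30), content)  # carry no direction information

acc_all = loo_nearest_centroid(X, y)
acc_without_content = loo_nearest_centroid(X[:, others], y)
```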

      Moving on to Figure 7. Since Figure 4 showed that ACC neurons respond to movement regardless of outcome, it is somewhat puzzling how ACC activity can be linked to future performance.

      As discussed earlier (point #2), ACC activity does not simply reflect locomotion itself. We interpret the post-shuttle ACC activity as encoding both the preceding shuttle state (shuttle or stay) and shuttle content (rooms A→B or B→A). Regardless of the outcome (safety or shock), such encoding is essential for cue–action–outcome associative learning, because both positive and negative feedback can drive learning. The level of post-shuttle ACC activity may reflect task engagement, with greater engagement facilitating learning and improving future performance.

      Two mice contributed about 50% of all the recorded cells. How robust are the results when analyzing mouse by mouse?

      We have added further analyses highlighting the results for each mouse. Although the total number of recorded neurons varied across mice, the major findings were consistent. In every mouse, we observed sustained post-shuttle ACC activity (Fig. S2), and population-level ACC activity reliably decoded shuttle contents (rooms A→B vs. B→A; Fig. 8).

      Lastly, the development of the new discrimination-avoidance task is applauded. However, a major missing piece here is to show the importance of ACC in this task and what aspects of this behavior require ACC.

      We appreciate this feedback. We are currently conducting additional experiments to determine whether inhibiting ACC activity during distinct time windows disrupts task learning. We hope to publish a follow-up paper on these findings in the near future.

      Reviewer #2 (Public review):

      Summary:

      The current dataset utilized a 2x2 factorial shuttle-escape task in combination with extracellular single-unit recording in the anterior cingulate cortex (ACC) of mice to determine ACC action coding. The contribution of neocortical signaling to action-outcome learning, as assessed by behavioral tasks outside the prototypical reward versus non-reward or punished versus non-punished designs, is an important and relevant research topic, given that ACC plays a clear role in several human neurological and psychiatric conditions. The authors present useful findings regarding the role of ACC in action monitoring and learning. The core methods themselves (electrophysiology and behavior) are adequate; however, the analyses are incomplete, since ruling out alternative explanations for neural activity, such as movement itself, requires substantial control analyses, and details on statistical methods are not clear.

      Strengths:

      (1) The factorial design nicely controls for sensory coding and value coding, since the same stimulus can signal different actions and values.

      (2) The figures are mostly well-presented, labeled, and easy to read.

      (3) Additional analyses, such as the 2.5/7.5s windows and place-field analysis, are nice to see and indicate that the authors were careful in their neural analyses.

      (4) The n-trial + 1 analysis where ACC activity was higher on trials that preceded correct responses is a nice addition, since it shows that ACC activity predicts future behavior, well before it happens.

      (5) The authors identified ACC neurons that fire to shuttle crossings in one direction or to crossings in both directions. This is very clear in the spike rasters and population-scaled color images. While other factors such as place fields, sensory input, and their integration can account for this activity, the authors discuss this and provide additional supplemental analyses.

      Weaknesses:

      (1) The behavioral data could use slightly more characterization, such as separating stay versus shuttle trials.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1). Additionally, we provide new data from extended training sessions (Fig.S2).

      (2) Some of the neural analyses could use the necessary and sufficient comparisons to strengthen the authors' claims.

      We have now used the necessary and sufficient comparisons where applicable. In the SVM decoding analysis, we show that population ACC activity is sufficient to decode AB or BA shuttles. We also show that excluding action-content neurons, but not other ACC neurons, drastically reduces decoding accuracy, suggesting that these neurons are necessary for the decoding (Fig.8).

      (3) Many of the neural analyses seem to utilize long time windows, not leveraging the very real strength of recording spike times. Specifics on the exact neural activity binning/averaging, tests, classifier validation, and methods for quantification are difficult to find.

      We chose to perform our neural analyses on a longer time scale, given the sustained activity we see in the data. To further justify that decision, we now provide additional results highlighting the sustained activity of ACC neurons in our task (Fig.2; Fig.S2). Additionally, we now provide more specifics of the neural analyses in the Methods section.

      (4) The neural analyses seem to suggest that ACC neurons encode one variable or the other, but are there any that multiplex? Given the overwhelming evidence of multiplexing in the ACC, a bit more discussion of its presence or absence is warranted.

      This is an interesting point of discussion, and we thank the reviewer for pointing this out. Overall, our results suggest that individual ACC neurons preferentially engage in only one of the proposed functions, rather than multiplexing across them. For example, action-state and action-content ACC neurons primarily engage in action monitoring, but not in decision-making, planning, or outcome tracking. Nevertheless, we cannot rule out the possibility that other ACC neurons, through their distinct connectivity or location in different ACC subregions, engage in other proposed functions. Thus, when considering the ACC as a whole, its function may still be multiplexed.

      Another possible reason we do not see clear multiplexing of neurons may be the dynamic nature of our task. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation. Since values are not fixed and change based on context, value-related responses may not be reflected in the ACC in our task.

      We have now incorporated the above discussions into our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors record from the ACC during a task in which animals must switch contexts to avoid shock as instructed by a cue. As expected, they find neurons that encode context, with some encoding of actions prior to the context, and encoding of neurons post-action. The primary novelty of the task seems to be dynamically encoding action-outcome in a discrimination-avoidance domain, while this is traditionally done using operant methods. While I'm not sure that this task is all that novel, I can't recall this being applied to the frontal cortex before, and this extends the well-known action/context/post-context encoding of ACC to the discrimination-avoidance domain.

      While the analysis is well done, there are several points that I believe should be elaborated upon. First, I had questions about several details (see point 3 below). Second, I wonder why the authors downplayed the clear action coding of ACC ensembles. Third, I wonder if the purported 'novelty' of the task (which I'm not sure of) and pseudo-debate on ACC's role undermines the real novelty - action/context/outcome encoding of ACC in discrimination-avoidance and early learning.

      Strengths:

      Recording frontal cortical ensembles during this task is particularly novel, and the analyses are sophisticated. The task has the potential to generate elegant comparisons of action and outcome.

      Weaknesses:

      I had some questions that might help me understand this work better.

      (1) I wonder if the field would agree that there is a true 'debate' and 'controversy' about the ACC and conflict monitoring, or if this is a pseudodebate (Line 34). They cite 2 very old papers to support this point. I might reframe this in terms of the frontal cortex studying action-outcome associations in discrimination-avoidance, as the bulk of evidence in rodents comes from overtrained operant behavior, and in humans comes from high-level tasks, and humans are unlikely to get aversive stimuli such as shocks.

      We appreciate this feedback. We have revised the Introduction and Discussion.

      (2) Does the purported novelty of the task undermine the argument? While I don't have an exhaustive knowledge of this behavior, the novelty lies in applying it to the ACC. There are many paradigms where a shock triggers some action that could be antecedents to this task.

      We argue that our newly designed discrimination–avoidance task is unique for several reasons. First, it requires animals to discriminate both sensory cues and environmental contexts. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation, which reflects a conceptual advance over previous techniques. Furthermore, by removing valence from the cues, this design helps disentangle the ACC’s potential role in value encoding from other cognitive functions.

      Second, this task involves robust, ethologically relevant actions (i.e., shuttles), unlike many established paradigms that rely on less naturalistic behaviors such as saccades or lever presses. We view this as a key distinction from prior approaches, as even previous paradigms that utilize shuttling responses or other naturalistic responses fail to incorporate dynamic integration of cues and contexts.

      Finally, the clear temporal separation between actions and outcomes further helps disentangle the ACC’s roles in action monitoring vs. outcome tracking.

      (3) The lack of details was confusing to me:

      (a) How many total mice? Are the same mice in all analyses? Are the same neurons? Which training day? Is it 4 mice in Figure 3? Five mice in line 382? An accounting of mice should be in the methods. All data points and figures should have the number of neurons and mice clearly indicated, along with a table. Without these details, it is challenging to interpret the findings.

      We are sorry for the confusion. We now provide additional details and clear N numbers for each analysis to improve clarity.

      (b) How many neurons are from which stage of training? In some figures, I see 325, in some ~350, and in S5/S2B, 370. The number of neurons should be clearly indicated in each figure, and perhaps a table.

      All data were obtained from well-trained mice. For some analyses, the N is smaller because certain task sessions contained very few incorrect trials (≤3), which prevented us from examining ACC activity during those trials. We have modified the figure legends so that neuron counts are clear.

      (c) Were the tetrodes driven deeper each day? The depth should be used as a regressor in all analyses?

      Yes, the tetrodes were driven slightly deeper across task sessions (~80 µm per step; 2–4 depths per mouse). Given limited depth changes, preliminary analyses indicate no clear differences in ACC activity across these recording depths. However, we cannot rule out potential dorsal–ventral subregion differences if recordings were to span larger depth ranges.

      (d) Was it really ACC (Figure 2A)? Some shanks are in M2? All electrodes from all mice need to be plotted as a main figure with the drive length indicated.

      We have now included a supplementary figure showing all recording sites (Fig.S2). It is likely that a small subset of neurons was recorded at the ACC/M2 border area. Unfortunately, we are unable to separate them out due to the blind recording design of our tetrode arrays.

      (e) It's not clear which sessions and how many go into which analysis

      We have now specified the number of task sessions for each analysis (see Methods).

      (f) How many correct and incorrect trials (<7?) are there per session?

      We have now specified the number of correct and incorrect trials per session (see Methods).

      (g) Why 'up to 10 shocks' on line 358? What amplitudes were tried? What does scrambled mean?

      We decided to use up to 10 mild shocks per trial because mice do not necessarily shuttle to the safe room after one or even a few shocks during the early stages of training. This design allows mice to efficiently learn the concept of the task (i.e., one room is safe while the other delivers shocks). The shock parameters (0.5 mA, 0.1 s per shock) are specified in the Methods section. A “scrambled shock” refers to an electric shock delivered through multiple floor bars in a randomized pattern, effectively preventing the animal from avoiding the stimulus.

      (4) Why do the authors downplay pre-action encoding? It is clearly evident in the PETHs, and the classifiers are above chance. It's not surprising that post-shuttle classification is so high because the behavior has occurred. This is most evident in Figure S2B, which likely should be a main figure.

      We did not intend to downplay pre-action encoding. Our analysis shows that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig.4). Although a subset of ACC neurons exhibits ramping pre-shuttle activity, they represent a much smaller fraction (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      (5) The statistics seem inappropriate. A linear mixed effects model accounting for between-mouse variance seems most appropriate. Statistical power or effect size is needed to interpret these results. This is important in analyses like Figure 7C or 6B.

      We appreciate this feedback. We now use appropriate statistics and report effect size.

      (6) Better behavioral details might help readers understand the task. These can be pulled from Figures S2 and S5. This is particularly important in a 'novel' task.

      We now provide more details to help better understand the task and have added new figures (Fig.1; Figs. S1&S2).

      (7) Can the authors put post-action encoding on the same classification accuracy axes as Figure 6B? It'd be useful to compare.

      We appreciate the comment, but we are unsure what clarification is being requested.

      (8) What limitations are there? I can think of several - number of animals, lack of causal manipulations, ACC in rodents and humans.

      We now include a discussion of the limitations of our study. One caveat of our study is that the discrimination–avoidance task requires weeks of training in mice. By the time they master the task, ACC activity may reflect modified neural circuits. Investigating ACC activity during the early phase of learning, such as by introducing a new pair of cues or contexts, could provide further insights into the ACC’s role in learning and cognitive processes. Additionally, a limitation of the current study is the lack of evidence for a causal role of post-action ACC activity in complex associative learning. Future investigations using closed-loop strategies to selectively disrupt ACC activity during the post-action phase could help address this question.

      Minor:

      (1) Each PCA analysis needs a scree plot to understand the variance explained.

      We have added a scree plot for each PCA analysis.

      (2) Figure 4C - y and x-axes have the same label?

      We have corrected the y-axis label.

      (3) What bin size do the authors use for machine learning (Not clear from line 416)?

      The bin sizes used were 2.5, 5, 7.5, or 10 s; these are now described in the Methods section.

      (4) Why not just use PCA instead of 'dimension reduction' (of which there are many?)

      We have adjusted the phrasing where appropriate.

      (5) Would a video enhance understanding of the behavior?

      We appreciate this feedback. We now include a few videos to accompany our paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Is Figure 1C sufficiently powered?

      We have now included data from additional mice and updated the figure accordingly.

      (2) Task performance was not plateaued after 10 sessions in Figure 1B. How variable is task performance in the datasets with ephys recordings (session to session, mouse to mouse).

      We have now included additional data from extended training (15 sessions; Fig.S2). Moderate variations across both sessions and mice are observed. Specifically, the total number of correct/incorrect shuttles used for ephys analysis are 19/5, 19/4, 21/5, 20/4 (mouse #1; 4 sessions); 20/7, 23/7, 20/7 (mouse #2; 3 sessions); 19/4, 16/2 (mouse #3; 2 sessions); 26/4, 23/4, 17/6, 25/5 (mouse #4; 4 sessions); 20/5, and 17/4 (mouse #5; 2 sessions), respectively.

      (3) Please quantify the results in Figure 3, for both within individual mice and across mice.

      We have calculated maximum trajectory length within the 3-D space (Fig. 3C).

      (4) What is the effect size in Figure 7C?

      We now report the effect size.

      (5) Please provide more details for spike sorting.

      We have now included more details in the Methods section.

      (6) More detailed cell type or correlation analysis in Figures 4 and 5 may be helpful. For example, if putative regular and fast-spiking neurons were simultaneously recorded, did the FS directly inhibit the RS to give rise to the apparent encoding properties?

      We recorded a small number of putative interneurons (n = 13) from only three mice, which precludes drawing meaningful conclusions, particularly given their heterogeneous responses during discrimination–avoidance tasks. Accordingly, we include only an example interneuron demonstrating discrimination between AB vs. BA shuttles (Fig. S5). Nevertheless, it is evident that there are reciprocal monosynaptic connections between putative interneurons and certain pyramidal neurons, as indicated by short-latency (~2 ms) excitatory or inhibitory interactions (Fig. S5). That said, follow-up studies with greater Ns are needed to parse out these details.

      Reviewer #2 (Recommendations for the authors):

      (1) While I appreciate displaying the success rate for the sake of simplifying behavioral data in Figure 1B, it would be nice to also see these data broken out as correct vs incorrect for stay vs shuttle trials, since it is difficult to determine whether the performance increases are primarily driven by mice improving at stay vs shuttle responses.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1; Fig.S2).

      (2) In Figure 2 the comparison between shuttle and stay is not particularly convincing, since the comparison is also essentially movement vs no movement and place1-->place2 vs place1-->place1. A more appropriate comparison might be action state neurons vs action content neurons during A-->B, B-->A, or both crossings. If it is true that these populations contain this information, then action state neurons should traverse a large component space in both directions, action content neurons only one direction, and so on.

      We agree that the comparison is not ideal due to differences in locomotion. However, it provides valuable information suggesting that the ACC plays a limited role during stay trials, even though these trials involve mental and cognitive processes comparable to shuttle trials. While we appreciate the reviewer’s suggestion, the proposed analysis is not particularly reliable given the relatively small number of simultaneously recorded action-state or action-content neurons.

      (3) I would say the above point applies to Figure 3 as well. I would also note that this reviewer greatly appreciates the rigor of showing ensemble activity in each subject.

      We appreciate this comment. See our response above.

      (4) In Figure 5, do these neurons show the same A-->B vs B-->A firing patterns during correct vs incorrect shuttles? The text describing the data in Figure 4 suggests this should be the case, but even from a quick glance it sort of seems like the population dynamics during correct vs incorrect shuttles are not the same. My concern is that averaging neural activity over 5s windows washes out all these dynamics.

      Preliminary analysis suggests that these firing patterns apply to both correct and incorrect shuttles. However, the main reason we did not compare correct and incorrect trials is the limited amount of data. In many sessions, there are only a few (≤5) incorrect shuttles, which include both AB and BA shuttles (Fig.1C; Fig.S2), so there is insufficient statistical power for a meaningful comparison.

      (5) Some information on classifier validation is required - was this leave-out validation and if so how many trials were left-out vs tested? K-fold, and if so, how many folds? Was the trial order shuffled for each simulation? Classifiers will pick up within-session temporal information. In addition to this classifier accuracy during the different time points should be compared by a non-parametric test, and compared to the 95th percentile of the label-shuffled distribution.

      Yes, we use standard 10-fold cross-validation. We appreciate the suggestion on trial-order shuffling, and implementing this procedure does not change our original conclusion. Additionally, we have applied a non-parametric test.

      (6) How exactly were neurons classified as content vs state? Was it the average activity during the 5s following the shuttle? If this is stated I could not really find it easily so I might suggest clarifying.

      We now use a new method for classification of the two neuron types (Fig.7). We have included detailed methods in the revised manuscript.

      (7) Movement drives cortical neuron activity more than anything else I have ever seen. Really, more than anything else, it would be nice to demonstrate that it is not movement alone or movement multiplexed with place/sensory information/direction driving these responses.

      We have analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of ACC neurons (<15%) show speed-correlated activity (Fig.5). It remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex. Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Furthermore, we identify two distinct groups of ACC neurons: action-state and action-content neurons, both of which tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors. This prolonged activation in the absence of movement suggests that their activity is not directly driven by locomotion. Moreover, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms AB or BA shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns.

      (8) In addition to the above, the place-field analysis in Supplemental Figure 5 only shows 4 neurons. Was the whole population analyzed? Is it possible to decode place from the population during the ITI? The data in this figure sort of look exactly like place fields; many cortical neurons and also some hippocampal neurons have more than 1 place field.

      We have now provided additional place-field analysis. A comparison with hippocampal CA1 neurons (recorded during the same task) suggests that ACC neurons encode limited spatial information.

      (9) "a simple Pavlovian association strategy is unlikely to be sufficient for learning the task" ... is Pavlovian occasion setting not a simple association? Tones and contexts both readily act as Pavlovian occasion setters. Similarly positive/negative patterning might also explain how the task is learned.

      We appreciate this comment and have revised the sentence accordingly. It is possible that animals use multiple strategies to learn and perform the task effectively. In the early stages, animals may rely more heavily on sensory–spatial integration, whereas in later stages, sensory- or location-related Pavlovian associative strategies may contribute to performance, particularly when animals begin to show place preferences during inter-trial intervals.

      (10) I might suggest softening this language and others like it. For example, 2x2 factorial designs are not really novel.

      We have revised the language used to describe the task.

      (11) Some of the color-scale bars and figures do not have labels. For example, Supplementary Figure 3, Supplementary Figure 5. Please add labels.

      We have added the missing labels to all color bars.

      Reviewer #3 (Recommendations for the authors):

      (1) Some relevant papers that should be cited:

      https://doi.org/10.1523/JNEUROSCI.4450-08.2008

      10.1016/j.neuron.2018.11.016

      https://doi.org/10.1016/j.jphysparis.2014.12.001

      We appreciate these suggestions.

      (2) Where can we download the data and code?

      We will upload the essential data and MATLAB code to GitHub to accompany the publication of the final version of this paper.

    1. Author response:

      Thank you for the reviews of our article “PKMζ-PKCι/λ double-knockout demonstrates atypical PKC is crucial for the persistence of hippocampus LTP and spatial memory.” We will address all of the reviewers’ issues point-by-point in a revised version.

    1. Author response:

      We thank the reviewers for their insightful comments on our work.

      We agree with reviewer #1 that further experiments would be needed to figure out how the observations made on lab strains apply to yeast in various ecological conditions, particularly in the wild. Here we provide a proof of principle that multicellularity selection can arise as a side effect. It obviously does not prove that it took place during yeast evolution, but we would like to emphasize that resource fluctuations are very common in ecological conditions, making it highly likely that the environmental conditions necessary for the selection of the side effects described have arisen.

      We agree with reviewer #2 that our work on yeast strains is “somewhat artificial”, as is often the case with model organisms under laboratory conditions. Importantly though, we showed that the effect found with the cln3 knock-out mutation can be phenocopied by overexpression of WHI5 (encoding the yeast equivalent of Rb). We propose that variations in the levels of cell cycle regulators during evolution may have played a role in multicellularity selection as a side effect. We agree that this is merely a hypothesis to explain the selection of multicellularity (just like predator escape) and that there is no direct evidence that this occurred in the history of the lineage. Nevertheless, our work provides the first evidence that such a selection of multicellularity as a side effect could be possible, and gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.

      We are currently working on the text and figure revisions suggested by the reviewers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, the authors use a doxycycline-inducible DLD1 cell line expressing a Clover-tagged RNA-binding-defective TDP-43 2KQ mutant that forms nuclear "anisosomes" (TDP-43 shell with HSP70 core) to carry out a small-molecule screen using the LOPAC 1280 library to identify compounds that reduce anisosome number or shift their morphology and dynamics. They also conducted a genome-wide siRNA screen to identify genetic modifiers of anisosome formation and dynamics. From these screens, the authors identify pathways in RNA splicing, translation, proteostasis (proteasome and HSP90), and nuclear transport, including XPO1. They then focus on XPO1 as their primary hit. Pharmacological inhibition of XPO1 using KPT-276, Verdinexor, and Leptomycin B reduces anisosome number while enlarging remaining condensates, which retain liquid-like behavior by FRAP and fusion assays. XPO1 overexpression causes fewer, enlarged TDP-43 puncta, including cytoplasmic puncta, with little or no FRAP recovery, interpreted as gel or solid-like aggregates. Anisosome induction reduces detectable nucleoplasmic XPO1 staining. Finally, the authors examine a homozygous TDP-43 K181E iPSC-derived forebrain organoid model, showing increased cytosolic pTDP-43 in K181E/K181E organoids compared to wild-type controls. Chronic low-dose KPT-276 reduces cytoplasmic pTDP-43 without changing total TDP-43 levels. Bulk RNA-seq shows only a modest fraction of dysregulated genes in K181E/K181E organoids are rescued by KPT-276. They conclude that nuclear export, via XPO1, is a key regulator of TDP-43 liquid-to-solid phase transitions and that cytoplasmic aggregation per se may contribute only modestly to TDP-43 proteinopathy, with RNA-processing defects being dominant.

      We thank the reviewer for carefully summarizing our study.

      The study presents well-executed chemical and genome-wide siRNA screens in a DLD1 TDP-43 2KQ anisosome model and follows up on nuclear transport, particularly XPO1, as a modulator of TDP-43 phase behavior and cytoplasmic aggregation. The screens are impressive in scale, and the microscopy and fluorescence recovery after photobleaching (FRAP) work is technically strong. However, the central mechanistic and disease-relevance claims are not yet sufficiently supported. There are major concerns about the heavy reliance on non-physiological, RNA-binding-defective, and acetylation-mimetic TDP-43 (2KQ) and a homozygous TDP-43 K181E organoid model. An underdeveloped and partly contradictory mechanistic link exists between XPO1 and TDP-43 phase transitions in the context of prior work showing TDP-43 is not a canonical XPO1 cargo. The paper also appears to overinterpret organoid data to conclude that cytoplasmic TDP-43 aggregation plays only a minor role in pathology, based largely on pTDP-43 antibody staining with limited sensitivity and relatively modest rescue readouts. A deeper mechanistic analysis and additional, more physiological validation are needed for this to reach the level of rigor and impact implied by the title and abstract. The work feels screen-rich but conceptually underdeveloped, with key claims outpacing the data. A major revision with substantial new data and tempering of conclusions is warranted. I outline several problematic areas below:

      (1) The central mechanistic discoveries are derived almost entirely from a DLD1 colon cancer cell line overexpressing an RNA-binding-defective, acetylation-mimetic TDP-43 2KQ mutant and homozygous TDP-43 K181E iPSC-derived organoids. Both systems are far from physiological. The 2KQ mutation is a synthetic double lysine-to-glutamine mutant originally designed to mimic acetylation and disrupt RNA binding. In this study, essentially all cell-based mechanistic data on phase behavior, screens, and XPO1 effects rely on 2KQ. Yet there is no quantification of how much endogenous TDP-43 is acetylated in degenerating human neurons, nor whether a 2KQ-like acetylation state is ever achieved in vivo. It is not established that the phase behavior of 2KQ recapitulates the physiological or pathological phase behavior of wild-type TDP-43 or genuine disease-linked mutants, which may retain partial RNA binding and different post-translational modification patterns. As a result, it is difficult to know whether the modifiers identified here regulate a highly artificial 2KQ condensate or physiologically relevant TDP-43 condensates. To address this concern, the paper would benefit from quantifying endogenous TDP-43 acetylation at the relevant lysines in control and ALS/FTD patient tissue or more disease-proximal models such as heterozygous TARDBP mutant iPSC neurons, which would justify the focus on an acetyl-mimetic mutant. Key phenomena, including XPO1 dependence of phase behavior, effects of proteasome and HSP90 inhibition, and effects of splicing and translation inhibitors, should be tested for wild-type TDP-43 expressed at near-physiological levels and for one or more bona fide ALS/FTD-linked TARDBP mutants that are not acetyl mimetics. At a minimum, the authors should show that endogenous TDP-43 in neuronally differentiated cells exhibits qualitatively similar responses to XPO1 modulation, rather than exclusively relying on DLD1 2KQ overexpression.

      Acetylation of endogenous TDP-43 has been reported by several studies. Although it occurs at low levels under normal conditions, TDP-43 acetylation is upregulated under stress conditions (e.g. oxidative stress and proteotoxic stress) (PMID: 25556531; PMID: 28724966). Importantly, Cohen et al. reported the identification of acetylated TDP-43 in ALS patient spinal cord (PMID: 25556531), while Yu et al. showed that endogenous wild-type TDP-43 undergoes demixing when neurons were treated with either a deacetylase inhibitor or proteasome inhibitors (PMID: 33335017). These studies also show that acetylated TDP-43 is defective in RNA binding and more prone to aggregation. Furthermore, ectopic expression of acetylation-mimetic TDP-43 in cells and mice induces cellular defects similar to those observed in disease models (PMID: 28724966). Thus, our findings, based on previously established TDP-43 acetylation mimetics, should provide valuable information regarding the regulation of TDP-43 phase behavior. We agree with the reviewers that the model used in this study has its limitations, and we will revise the manuscript to tone down some conclusions and to include more background information justifying the use of TDP-43 acetylation mimetics.

      (2) The organoid model is based on a homozygous K181E knock-in line. However, in patients, TARDBP mutations are overwhelmingly heterozygous. Homozygosity is thus a severe, arguably non-physiological sensitized background that may exaggerate nuclear RNA mis-splicing and phase defects and alter the relative contribution of cytoplasmic aggregation versus nuclear loss-of-function. In addition, it is not fully clear from this manuscript whether the structures in K181E organoids are bona fide anisosomes as defined in Yu et al. 2021, characterized by HSP70-enriched central liquid cores with TDP-43 shells and similar FRAP and fusion behavior to anisosomes in the DLD1 model. At present, the organoid section is framed as validation of "anisosome-bearing organoids," but the figures in this manuscript mainly show pTDP-43 puncta and total TDP-43 immunostaining, without detailed structural or biophysical characterization. The authors should explicitly compare heterozygous K181E/+ organoids or another heterozygous TARDBP mutant line with homozygous K181E/K181E organoids to assess whether XPO1 inhibition has similar effects in a genotype that more closely resembles patient genetics. They should provide direct evidence that the K181E condensates in organoids are anisosomes through HSP70 core immunostaining, three-dimensional reconstruction, and FRAP measurements, and clarify whether KPT-276 is acting on anisosome-like structures or more generic cytoplasmic aggregates or puncta. Without this, the leap from a DLD1 2KQ cancer cell model to human ALS/FTD-relevant neurons is not convincingly supported.

      The reviewer is correct that the use of homozygous K181E organoids generates a homogeneous background that is more sensitive for detecting phospho-TDP-43. The goal of the experiment was to test whether XPO1 inhibition mitigates the aggregation of a TDP-43 disease mutant, and for this purpose we believe that our experimental setup is suitable. We agree that we should not extrapolate the result to overemphasize its disease connections, and we will revise the paper to tone down this part.

      Regarding the immunostained signals in K181E organoids, we did not report them as anisosomes. As widely documented in the literature, p-TDP-43 is a marker of pathological TDP-43 aggregation: it is enriched in pathological aggregates in human ALS and FTLD patients, colocalizes with other aggregation signatures such as ubiquitin and other aggregation-prone proteins (PMID: 36008843), and is used as a diagnostic marker for neurodegeneration (PMID: 31661037). Figure 7A shows that inhibiting nuclear export mitigates the accumulation of p-TDP-43 in mutant tissues. We will revise the subheading and the corresponding text to avoid confusion.

      (3) The title and framing assert that "nuclear export governs TDP-43 phase transitions." However, prior studies such as Pinarbasi et al. 2018 and Duan et al. 2022 indicate that TDP-43 is not a canonical XPO1 cargo and that its export is largely passive, with active nuclear import being the dominant determinant of nuclear localization. The authors cite these studies but still position XPO1 as a central, quasi-direct regulator. The data presented are largely correlative or based on pharmacologic manipulation and overexpression in a mutant background, with no direct evidence that XPO1 engages TDP-43 in a specific, regulated manner. Even if XPO1 does not engage WT TDP-43, it could still engage the 2KQ variant, which needs to be tested.

      We did not conclude or imply that the regulation of TDP-43 by XPO1 is direct. In fact, we explicitly stated on page 8 that the regulation is likely indirect and mediated by other factors. The sentence reads: “Since XPO1 does not bind TDP-43 directly (Pinarbasi et al., 2018), additional factors likely facilitate XPO1-mediated TDP-43 nuclear egression under this condition.” We will revise this part to make the point clearer, and we will also revise the title and change the framing accordingly.

      (4) The XPO1 perturbations yield somewhat confusing phenotypes. XPO1 inhibition using Leptomycin B, KPT-276, and Verdinexor reduces anisosome number and enlarges remaining anisosomes, which remain liquid-like by FRAP recovery and fusion assays and stay nuclear. XPO1 overexpression causes fewer, enlarged puncta, but these are FRAP-impaired (gel-like) and redistribute to the cytoplasm. Thus, both decreased and increased XPO1 activity reduce anisosome number and enlarge puncta, but with opposite phase behaviors and subcellular localizations. The model presented in Figure 5L is relatively qualitative and does not resolve these issues. Moreover, XPO1 inhibition globally impairs nuclear export of many cargos and profoundly alters the nuclear environment, transcription, RNA processing, and chromatin. It is therefore difficult to conclude that the observed effects are specific to TDP-43 phase regulation as opposed to secondary consequences of broad nuclear export blockade.

      The reviewer correctly summarizes our data and interpretation: XPO1 loss of function and gain of function generate opposite phenotypes with respect to TDP-43 phase behavior. We agree that additional studies are needed to elucidate the underlying mechanism (e.g. whether it is direct or indirect), but we feel that these belong in a separate study. We plan to re-test the effect of nuclear export inhibition on the subcellular distribution of WT TDP-43 and the acetylation mimetics. We will also add more discussion about the potential indirect effects of XPO1 inhibition on TDP-43 phase behavior.

      (5) The authors show that anisosome induction depletes nucleoplasmic XPO1 signal and that mCherry-XPO1 can be seen in some TDP-43 puncta. However, antibody penetration into anisosomes is limited, so XPO1 depletion from nucleoplasm could reflect sequestration in the anisosome shell or core, but this is not demonstrated. There is no demonstration of physical interaction, even indirect interaction, between XPO1 and TDP-43 or a defined adaptor, nor identification of a specific mutant of XPO1 that selectively disrupts this putative interaction while preserving other functions. The known TDP-43 NES has been shown to be weak and not a functional XPO1-dependent NES in multiple studies. If XPO1 is acting through an adaptor that recognizes 2KQ or K181E specifically, that by itself would bring into question the generality of the mechanism for wild-type TDP-43.

      We agree that our observation does not demonstrate an interaction between XPO1 and TDP-43. As mentioned above, we did discuss that the regulation of TDP-43 by XPO1 is likely indirect. We will revise our paper further to separate any speculative statements from the data and narrow our mechanistic claim.

      (6) To support a mechanistic claim that nuclear export governs TDP-43 phase transitions, more targeted evidence is needed. The authors should test whether siRNA knockdown or CRISPR interference of XPO1 in the DLD1 2KQ model reproduces the effects seen with Leptomycin B and KPT-276, including FRAP and fusion phenotypes, and verify on-target effects by rescue with an siRNA-resistant XPO1 construct. They should demonstrate that canonical XPO1 cargos behave as expected under the inhibitor conditions used, as a positive control, and that the concentrations used are not grossly toxic. They should attempt to identify or at least constrain candidate adaptors that might enable XPO1-dependent export of TDP-43 through proteomic analysis of XPO1 co-purifying with 2KQ condensates or loss-of-function studies of candidate adaptors from the siRNA screen. Finally, they should test whether a TDP-43 mutant that cannot bind the proposed adaptor still responds to XPO1 manipulation.

      The anisosome enlargement phenotype upon XPO1 depletion was seen in our siRNA screen, where it was identified by automated image analysis using six distinct siRNAs. This, together with the chemical inhibition experiments, convinced us that the phenotype is specifically caused by XPO1 inactivation.

      When characterizing the effect of XPO1 inhibition on anisosome dynamics, we preferred chemical inhibitors because their effect is acute and therefore less likely to be caused by secondary effects.

      Regarding the inhibitor concentration, a literature survey suggested that 50-200 nM Leptomycin B is commonly used. We chose 200 nM to ensure rapid and complete inhibition of XPO1-mediated nuclear export (see Figure 3 in PMID: 9628873). This dose is also well tolerated by our cells, at least during the chosen time window.

      We did not propose any specific adaptor that mediates an XPO1 interaction with TDP-43; identifying such an adaptor is beyond the scope of this study. We will revise our paper to avoid this confusion.

      (7) Even with these data, what is currently shown is that global modulation of nuclear export capacity can alter the phase behavior and localization of a highly overexpressed RNA-binding-defective TDP-43 mutant and of K181E in organoids. This is important, but it is weaker than asserting that XPO1 directly governs TDP-43 phase transitions in physiological contexts. The title, abstract, and Discussion should be tempered to reflect that nuclear export is one of several pathways, alongside RNA splicing, translation, and proteostasis, that influence TDP-43 phase states in this model, and that the specific mechanism and cargo relationship between XPO1 and TDP-43 remain unresolved and may be indirect.

      We will revise the title, abstract, and discussion to temper the conclusion.

      (8) The authors conclude that cytoplasmic TDP-43 aggregation plays only a modest role in TDP-43 proteinopathies because in homozygous K181E organoids, chronic KPT-276 treatment almost abolishes cytoplasmic pTDP-43 puncta, yet bulk RNA-seq shows only a relatively small fraction of dysregulated genes are rescued. There are several issues with this inference. Relying primarily on pTDP-43 antibody staining to define cytoplasmic TDP-43 aggregation is limiting. pTDP-43 antibodies label only phosphorylated species and may miss non-phosphorylated, oligomeric, or amorphous TDP-43 species that could still be toxic. Different pTDP-43 antibodies vary in epitope accessibility depending on aggregate conformation and subcellular location. More sensitive approaches, such as high-affinity TDP-43 RNA aptamer probes developed by Gregory and colleagues, biochemical fractionation for SDS-insoluble and urea-soluble TDP-43, and filter-trap assays, would provide a more quantitative assessment of cytoplasmic aggregation and its reduction by KPT-276. Without these, it is not safe to assume that cytoplasmic aggregation has been eliminated, as opposed to one antigenic subclass.

      We agree with the reviewer that p-TDP-43 may not represent all aggregate species. However, p-TDP-43 antibodies detect the pathologically validated species most tightly associated with TDP-43 proteinopathies. In human ALS and FTLD-TDP tissues, cytoplasmic inclusions are strongly immunoreactive for phosphorylated TDP-43 (typically at S409/410, the sites detected here), and p-TDP-43 immunohistochemistry is a routine diagnostic criterion in neuropathology. For these reasons, we believe that the significant reduction of p-TDP-43 upon XPO1 inhibition is an important finding, as it suggests that TDP-43 proteinopathy can be improved by inhibiting nuclear export. We plan to revise the text to better explain the significance of p-TDP-43 staining.

      (9) The treatment window, spanning from day 87 to 122 with 20 nanomolar KPT-276, may be too late or too mild to reverse entrenched nuclear RNA-processing defects, even if cytoplasmic inclusions are cleared. Once widespread cryptic exon inclusion and alternative polyadenylation misregulation are established, many downstream changes may become self-sustaining or only partially reversible. Moreover, XPO1 inhibition will massively rewire nucleocytoplasmic transport of many transcription factors, splicing factors, and RNA-binding proteins. Thus, the lack of full transcriptomic rescue cannot be cleanly interpreted as evidence that cytoplasmic aggregates are only modest contributors. It may instead reflect that nuclear dysfunction is primary and XPO1 inhibition does not correct, and may even exacerbate, certain nuclear defects.

      We agree with the reviewer that the lack of rescue may be caused by technical issues. Because these data are not essential for our main conclusion, we will remove the RNA-seq data and the related text.

      (10) To support a causal statement about the modest contribution of cytoplasmic aggregates, one would want more direct measures of neuronal health and function, such as cell death, neurite complexity, synaptic markers, and electrophysiology before and after KPT-276, not only transcriptomics. A way to selectively reduce cytoplasmic aggregation without globally inhibiting nuclear export would allow comparison of outcomes.

      We will remove the discussion regarding the role of cytoplasmic aggregates in disease.

      (11) Given these caveats, the concluding statements that cytoplasmic TDP-43 aggregation is only a modest contributor should be substantially softened. A more defensible interpretation is that in this homozygous K181E organoid model, chronic global XPO1 inhibition reduces pTDP-43-positive cytoplasmic puncta but only partially normalizes the steady-state transcriptome, suggesting that persistent nuclear RNA-processing defects and other pathways continue to drive pathology.

      We agree with the reviewer and will revise this part accordingly.

      (12) The screens are a major strength but need more rigorous validation for key hits, especially nuclear transport factors. For the siRNA screen, hits are filtered by anisosome number per nucleus, but there is no direct demonstration in the main text that XPO1 or CSE1L knockdown is efficient at the messenger RNA or protein level. For the highlighted genes, Western blot or quantitative polymerase chain reaction validation and phenotypic rescue would strengthen confidence. For small-molecule hits, it is not systematically shown that anisosome modulation is independent of changes in total TDP-43 2KQ expression or gross toxicity. Translation inhibitors are tested for this, but for many other hits, including proteasome, HSP90, and kinase inhibitors, expression and general nuclear structure should be monitored. Given the reliance on anisosome count as a readout, secondary screens that specifically distinguish changes in TDP-43 expression levels, changes in nuclear morphology or cell cycle, and specific changes in anisosome phase behavior, including FRAP and fusion for top hits, would greatly increase interpretability.

      For the siRNA screen, each positive hit was confirmed in two rounds of screening with six independent siRNAs in total. Although we did not validate knockdown efficiency because of the large number of hits, we routinely include a positive siRNA control (siRNAdeath) that targets an essential gene; transfection efficiency was controlled by measuring cell viability after knocking down this essential gene. In addition, the identification of XPO1 as a positive regulator of TDP-43 phase behavior was independently validated by our chemical genetic screens, so we are confident that XPO1 is a key modulator of TDP-43 phase behavior. In the chemical treatment experiments, the anisosome fusion phenotype could be detected as early as 5 h post treatment. Given this short treatment, we do not expect a significant change in protein level or toxicity.

      (13) The classification of condensates as liquid versus gel-like or solid is based almost entirely on FRAP recovery or lack thereof. While FRAP is appropriate, interpretations could be made more robust by including half-region-of-interest bleach controls and assessing mobile fractions and recovery kinetics more quantitatively across conditions. Complementing FRAP with other phase-behavior assays such as sensitivity to 1,6-hexanediol, shape relaxation after deformation, and coarsening behavior over longer timescales would strengthen the analysis. At present, some assignments, such as that XPO1 overexpression drives a gel-like transition, are reasonable but somewhat qualitative.

      In this study, we described two types of condensates formed by TDP-43 2KQ: the previously characterized nuclear anisosome, and cytosolic puncta in XPO1-overexpressing cells. The two can be clearly distinguished by several features, including subcellular localization, shape, and mobility. We feel that our FRAP data clearly segregate these puncta into two distinct types of assemblies, as the difference in fluorescence recovery rate is large. The proposed half-region-of-interest bleach is technically challenging for small anisosomes under normal conditions. When anisosomes were enlarged by Leptomycin B treatment, we performed both whole-anisosome and partial bleaching (Figure 5D, I), and both assays demonstrate that TDP-43 in these enlarged anisosomes is highly mobile.
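
      The mobile-fraction and recovery-kinetics comparison discussed in this point can be illustrated with a standard FRAP normalization. The following is a minimal Python sketch, not the analysis used in the manuscript: it assumes a background-subtracted intensity trace for the bleached region sampled at known times, takes the first post-bleach frame as the recovery baseline, estimates the mobile fraction as the mean of the last few frames, and obtains the recovery half-time by linear interpolation instead of exponential curve fitting. The function names and both example traces are hypothetical.

```python
from typing import List, Sequence


def normalize_frap(trace: Sequence[float], bleach_index: int) -> List[float]:
    """Rescale a background-subtracted FRAP trace so that the mean pre-bleach
    intensity maps to 1 and the first post-bleach frame maps to 0."""
    pre = trace[:bleach_index]
    if len(pre) == 0:
        raise ValueError("need at least one pre-bleach frame")
    f_pre = sum(pre) / len(pre)
    f_post = trace[bleach_index]
    span = f_pre - f_post
    if span <= 0:
        raise ValueError("no intensity drop at the bleach frame")
    return [(f - f_post) / span for f in trace]


def mobile_fraction(norm: Sequence[float], plateau_frames: int = 10) -> float:
    """Estimate the mobile fraction as the mean of the last `plateau_frames` frames."""
    tail = norm[-plateau_frames:]
    return sum(tail) / len(tail)


def half_time(norm: Sequence[float], times: Sequence[float],
              bleach_index: int, plateau_frames: int = 10) -> float:
    """Time after bleaching at which recovery first reaches half the plateau,
    by linear interpolation between neighbouring frames."""
    target = mobile_fraction(norm, plateau_frames) / 2
    if target <= 0:
        return float("nan")  # no measurable recovery
    for i in range(bleach_index + 1, len(norm)):
        if norm[i] >= target:
            # norm[i - 1] < target here, so y1 > y0 and the division is safe.
            y0, y1 = norm[i - 1], norm[i]
            t0, t1 = times[i - 1], times[i]
            t_cross = t0 + (target - y0) / (y1 - y0) * (t1 - t0)
            return t_cross - times[bleach_index]
    return float("nan")


if __name__ == "__main__":
    times = [float(t) for t in range(10)]  # hypothetical: one frame per second
    liquid = [100, 100, 100, 20, 60, 80, 90, 95, 95, 95]  # invented trace
    gel = [100, 100, 100, 20, 22, 24, 25, 25, 25, 25]     # invented trace
    for name, trace in (("liquid-like", liquid), ("gel-like", gel)):
        norm = normalize_frap(trace, bleach_index=3)
        print(name, mobile_fraction(norm, 3), half_time(norm, times, 3, 3))
```

      With these invented traces, the liquid-like example plateaus near full recovery while the gel-like example plateaus below 0.1; on noisy real data, fitting an exponential recovery model would usually give more stable half-time estimates than interpolation.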

      (14) For the Leptomycin B and KPT-276 experiments in cells and organoids, it would be important to confirm that canonical XPO1 cargo proteins accumulate in the nucleus and that the concentrations used are within a range that is not overtly toxic over the experimental timeframe. Assessing nuclear morphology, chromatin condensation, and general transcriptional activity through global RNA synthesis or key reporter genes would ensure that observed effects are not secondary to severe global nuclear export collapse.

      In the Leptomycin B treatment experiments, we carefully chose a previously validated dose (see Figure 3 in PMID: 9628873). Based on our DAPI staining, nuclear morphology appears normal (Figure 5A). Additionally, in the cell line-based experiments, the effect of Leptomycin B on anisosomes was detected 6-8 hours post treatment, a time point at which changes in global protein synthesis should be relatively minor. In the organoid experiment, the drug dose was determined in a pilot experiment in which organoid morphology was evaluated after prolonged treatment with different doses of the inhibitors.

      (15) In the organoid section, it is not clear how many independent iPSC clones and organoid batches were used per condition, nor whether batch effects were assessed in the bulk RNA-seq analysis. This should be fully specified and ideally controlled with isogenic wild-type and K181E clones. For transcriptional rescue, it is important to know whether the changes in wild-type organoids treated with KPT-276 are negligible. A direct wild-type comparison with or without KPT-276 is important to disentangle general drug effects from K181E-specific rescue. More detailed quantification of total TDP-43 and pTDP-43 in both nuclear and cytoplasmic fractions, including biochemical fractionation if possible, would strengthen the assertion that KPT-276 specifically reduces cytosolic pTDP-43 aggregates while sparing nuclear TDP-43.

      The organoid experiment was performed with two batches per condition to reduce the effect of batch variation. The wild-type and K181E mutant cells are derived from the same genetic background. We will revise the text to clarify these issues. Because of the cost of this experiment, we did not include drug-treated wild-type organoids as a control. Given the criticisms of the RNA-seq data by Reviewers 1 and 2, we will remove these non-essential data from our revision.

      (16) Beyond the core issues above, several additions could greatly enhance the impact. The manuscript currently emphasizes XPO1, but the genetic and chemical data clearly implicate RNA splicing, translation, and proteostasis as equally strong or stronger regulators of TDP-43 phase states. A more integrated model that explains how these pathways intersect, for example, how splicing factor availability, ribosome loading, and proteasome capacity co-govern anisosome nucleation, growth, and hardening, would be valuable.

      We agree with the reviewer that these are important directions for future studies. We will include some discussion of a possible model that integrates these factors.

      (17) A key unresolved question is whether XPO1 is acting directly on TDP-43, or instead primarily regulates anisosomes by exporting other factors that more proximally control TDP-43 phase behavior. Given that TDP-43 is not a canonical XPO1 cargo and prior work indicates that its nuclear export is largely passive, it seems at least as plausible that XPO1 inhibition alters the nuclear concentration or localization of splicing factors, RNA-binding proteins, chaperones, or other modifiers identified in the screens, and that changes in these proteins secondarily reshape anisosome dynamics. In other words, XPO1 may be exporting a more direct regulator of anisosome formation and hardening, rather than exporting TDP-43 itself in a specific, regulated way. The current data do not distinguish between these possibilities. Systematic identification of XPO1-dependent cargos that colocalize with or biochemically associate with anisosomes, combined with targeted perturbation of their nuclear export, would be needed to determine whether the relevant XPO1 substrate in this system is actually TDP-43 or an upstream modulator of its phase behavior.

      The reviewer raises an important point. We did include some discussion along these lines in our paper and can add more to further clarify this issue. Again, as stated in the original draft, we did not conclude that there is an interaction between TDP-43 and XPO1.

      (18) Testing whether identified modifiers converge on nuclear TDP-43 concentration would be informative. Since phase separation is concentration-dependent, measuring nuclear versus cytoplasmic TDP-43 levels across key perturbations, including splicing inhibition, translation inhibition, proteasome inhibition, HSP90 inhibition, and XPO1 modulation, would help determine whether modifiers mainly work by changing nuclear TDP-43 concentration or by altering interaction networks and the material properties of condensates.

      We will measure the nuclear TDP-43 concentration in our imaging experiments and add the data to a revised version.

      (19) Examining other ALS-relevant RNA-binding proteins would be valuable. Given the role of XPO1 and other hits, it would be informative to briefly test whether similar principles apply to FUS, hnRNPA1, or other ALS-relevant RNA-binding proteins in the same cellular context, to argue for generality versus TDP-43-specific idiosyncrasies of the 2KQ system.

      We agree that this is an important issue but we feel the proposed experiments are beyond the scope of the study.

      (20) The Introduction sometimes implies that anisosomes are common and well-established intermediates en route to pathology. It would be helpful to more clearly state that, to date, anisosomes are primarily observed in overexpression and mutant systems and have not yet been unequivocally demonstrated in human patient tissue. The link between PDGFRβ, PAK4, GSK-3β, and YAP and TDP-43 phase dynamics is intriguing but only briefly mentioned. The authors should either expand on this or tone down the emphasis in the Results section.

      We will revise the introduction accordingly.

      (21) In the organoid methods, the authors should consider clarifying whether doxycycline is continuously used, which might alter TDP-43 expression and nuclear transport in a non-negligible way.

      The organoid model does not involve protein overexpression or doxycycline treatment; we measured endogenous p-TDP-43. We will revise the paper to avoid this confusion.

      (22) For statistical methods, it would be beneficial to indicate whether multiple-comparison corrections were applied for the many FRAP, anisosome count, and size comparisons beyond DESeq2 internal corrections for RNA-seq.

      We will add this information to the figure legends during revision.
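
      Point 22 concerns correction for the many FRAP, anisosome count, and size comparisons. As one illustration only (the reply does not state which correction, if any, was applied), the widely used Benjamini-Hochberg false discovery rate adjustment can be sketched in plain Python; the function name and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented p-values for four hypothetical comparisons.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

      Adjusted values are then compared against the chosen false discovery rate (for example 0.05). The Bonferroni correction, which multiplies each p-value by the number of tests, is a more conservative alternative.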

      (23) Some figure legends could more clearly indicate whether the images shown are single z-planes or maximum intensity projections and how the thresholding for anisosome detection was performed.

      We will revise the figure legends to include this information. As for anisosome detection, because anisosomes are readily distinguishable from the surrounding nucleoplasm, standard thresholding was sufficient to identify them.
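
      The reply states that standard thresholding was sufficient for anisosome detection. To make that step concrete, the sketch below shows what fixed-threshold puncta counting amounts to for a single nucleus: pixels above an intensity threshold are grouped into 4-connected regions, and regions smaller than a minimum area are discarded. This is a hypothetical pure-Python illustration, not the authors' pipeline; the threshold, the minimum area, and the toy image are assumptions, and real analyses would typically rely on an image-analysis library and possibly a per-image threshold such as Otsu's method.

```python
from collections import deque
from typing import List, Sequence, Tuple


def count_puncta(image: Sequence[Sequence[float]], threshold: float,
                 min_area: int = 1) -> Tuple[int, List[int]]:
    """Count 4-connected regions with intensity > threshold in one nucleus.

    Returns (number of regions kept, list of their areas in pixels)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill of one above-threshold region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                areas.append(area)
    return len(areas), areas


if __name__ == "__main__":
    toy_nucleus = [  # invented intensities
        [1, 1, 1, 1, 1, 1],
        [1, 9, 9, 1, 1, 1],
        [1, 9, 9, 1, 8, 1],
        [1, 1, 1, 1, 1, 1],
        [7, 1, 1, 1, 1, 1],
    ]
    print(count_puncta(toy_nucleus, threshold=5))
    print(count_puncta(toy_nucleus, threshold=5, min_area=2))
```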

      (24) In its current form, the manuscript contains an impressive set of screens and some nicely executed imaging of TDP-43 condensates, highlighting nuclear export among other pathways as a modulator of TDP-43 phase behavior. However, the physiological relevance is undercut by heavy reliance on an acetylation-mimetic, RNA-binding-defective TDP-43 mutant and a homozygous K181E organoid model. The mechanistic link between XPO1 and TDP-43 remains largely inferential and partly at odds with prior work. The conclusion that cytoplasmic TDP-43 aggregation is only a modest contributor to disease is not firmly supported by the available data.

      We agree with the reviewer that the strength of the study lies in our unbiased approach, which identifies pathways capable of modulating TDP-43 phase separation behavior. We will revise our paper to carefully discuss the potential physiological relevance of our study and tone down some mechanistic conclusions, as suggested by the reviewer.

      (25) With substantial additional mechanistic work, particularly around XPO1, rigorous validation in more physiological TDP-43 contexts, more sensitive detection of cytoplasmic TDP-43 aggregates, and a tempering of the central claims, this study could make a meaningful contribution to understanding how nucleocytoplasmic transport and other cellular pathways influence TDP-43 phase transitions and aggregation. The work should be reframed as an important screening study that identifies nuclear export as one among several cellular processes that modulate TDP-43 phase behavior in a model system, rather than as a definitive demonstration that nuclear export governs pathological TDP-43 aggregation in disease.

      We will reframe the study as an important screening study that identifies nuclear export among several other pathways as modulators of TDP-43 phase behavior.

      Reviewer #2 (Public review):

      Summary:

      This manuscript addresses an important and timely question in TDP-43 biology by systematically identifying regulators of TDP-43 anisosome formation, with a particular focus on nuclear export via XPO1. Using a combination of unbiased chemical screening, genetic perturbation, and advanced imaging approaches, the authors propose that inhibition of nuclear export modulates the abundance and biophysical properties of TDP-43 anisosomes. The study is conceptually innovative and has potential relevance for neurodegenerative diseases characterized by TDP-43 pathology. However, significant concerns regarding experimental controls, reporting transparency, and model translatability currently limit the strength of the conclusions and the interpretability of several key findings.

      We thank the reviewer for acknowledging the significance and innovation of our study.

      Strengths:

      (1) The study employs an unbiased, hypothesis-free compound screen to identify regulators of TDP-43 anisosome formation, which is a major strength and reduces confirmation bias.

      (2) The authors combine chemical and genetic screening approaches, providing orthogonal validation of key pathways and increasing confidence in the biological relevance of top hits.

      (3) The focus on biophysical properties of TDP-43 assemblies, assessed through imaging and FRAP, moves beyond simple presence/absence of aggregates and provides mechanistic insight into the biophysical states of TDP-43.

      (4) The use of multiple experimental modalities, including live-cell imaging, FRAP, pharmacological perturbation, and transcriptomic analysis, reflects a technically sophisticated and ambitious study design.

      (5) The authors attempt to extend findings beyond immortalized cancer cell lines by incorporating organoid models, demonstrating awareness of disease relevance and translational importance.

      Overall, the manuscript is clearly written and logically structured, making complex experimental workflows accessible and the central hypotheses easy to follow.

      Weaknesses:

      Despite its strengths, the manuscript has several major limitations that affect data interpretation and confidence in the conclusions.

      (1) Lack of appropriate controls for overexpression experiments:

      A central concern is the absence of proper controls for TDP-43 and XPO1 overexpression. Prior studies (including those cited by the authors, Archbold et al. 2018) show that overexpression of WT TDP-43 alone is toxic to neurons. Thus, the experimental system itself may induce anisosome formation independently of the mechanisms under study. Similarly, XPO1 overexpression lacks a suitable control (e.g., mCherry alone or mCherry fused to a protein known to be independent of TDP-43). The near-complete colocalization of XPO1 with TDP-43 anisosomes upon overexpression raises the possibility that these structures reflect non-physiological protein accumulation rather than regulated assemblies.

      As mentioned in our response to Reviewer 1, point 1, we will add more discussion regarding the use of acetylation mimetics in our study. We agree with the reviewer that these large puncta (both anisosomes and gel-like structures) likely result from TDP-43 overexpression. Nevertheless, in a titration experiment, Yu et al. 2020 (PMID: 33335017) showed that ectopic TDP-43 undergoes demixing even at concentrations lower than those of endogenous TDP-43, although the demixed puncta were very small. Their result suggests that overexpression per se does not change TDP-43 phase behavior but only enlarges the demixed TDP-43 structures, which is necessary for our screen and imaging-based characterization. We will revise the text to clarify this point.

      For XPO1, we did include an mCherry-alone control in the study, but because of space limits in Figure 5 we did not show it. We can include these data in a supplementary figure during revision.

      (2) Insufficient experimental and analytical transparency:

      The manuscript frequently lacks clear reporting of experimental details. In multiple figures, the stated number of independent experiments does not match the number of data points shown, making it difficult to assess statistical validity. Concentrations used in the compound screen are not clearly defined, nor is it stated whether multiple concentrations were tested. It is unclear how many wells, cells, or independent cultures were analyzed. The criteria used to reduce 1,533 screening hits to 211 candidates via STRING analysis are not explained. Knockdown and overexpression efficiencies are not reported.

      We apologize for these omissions. We will add more experimental details to the figure legends and the Methods section. For the imaging experiments, data points represent randomly selected individual cells imaged in 2-3 independent biological replicates. For the chemical screens, we first screened the NCATS libraries at the top concentration (10 mM) to ensure inhibitory efficacy for all compounds. In follow-up experiments, we validated the top hits over a series of concentrations, as shown in Figure 1B.

      We will explain the STRING analysis in more detail. We did not check XPO1 knockdown efficiency in the high-throughput screens (HTS) for several reasons. First, the large number of positive hits makes it infeasible to check knockdown efficiency for all of them. Second, the effect of XPO1 knockdown on anisosomes was seen with 6 different siRNAs in two rounds of screens. Third, in the HTS protocol, we routinely included a transfection control (siRNAdeath) to confirm high transfection efficiency, and we processed the data only if the siRNAdeath control killed > 90% of the cells.

      (3) RNA-seq concerns:

      The RNA-seq experiments are particularly problematic. The number of biological replicates per condition is not stated, and heatmaps suggest that only one sample per group may have been used, which would preclude statistical analysis. No baseline comparison between WT and mutant TDP-43 is shown. Given that TDP-43 is an RNA-binding protein, splicing analyses would be far more informative than gene expression alone, yet no splicing data are presented. Moreover, nuclear retention of TDP-43 does not preclude nuclear aggregation, which may still impair its splicing function.

      We apologize for the lack of clarity regarding the RNA-seq design. For each condition, organoids of two independently differentiated batches were treated in triplicate. We pooled the organoids of the same treatment from the two batches to reduce the impact of batch variation.

      Given the criticisms from both reviewers 1 and 2 on the limitations of the RNA-seq study, we plan to remove these data from the revised manuscript.

      (4) Limited translatability to neuronal biology:

      All anisosome analyses are performed in a cancer cell line, raising concerns about relevance to post-mitotic neurons. While organoids are used as a secondary model, the assays performed do not overlap with those used in cancer cells, making it difficult to assess whether anisosome-related mechanisms are conserved. Neuronal toxicity, a critical outcome given known TDP-43 biology, is not assessed. Prior work has shown that WT TDP-43 overexpression alone is toxic to neurons, yet this is not addressed.

      We agree with the reviewer that the model used in this study is not directly relevant to neurodegeneration. However, as the reviewer points out, neurons are much more sensitive to TDP-43-associated toxicity. By contrast, the cell line used in this study tolerates TDP-43 overexpression with no detectable cytotoxicity. This feature makes it feasible to evaluate how different cellular processes modulate TDP-43 phase behavior without the confounding effect of toxicity. Inducing TDP-43 expression for only a short period also helps minimize the impact of toxicity. Notably, the processes identified by our screens are all housekeeping pathways that are also present in neurons. Thus, we believe that the reported findings are likely applicable to neurons, though we will revise the paper to make sure that we do not overstate the clinical relevance of our work.

      (5) Conceptual and interpretational gaps:

      The authors quantify anisosome number but also report conditions in which anisosome number decreases while size increases. The biological interpretation of larger anisosomes is not discussed, and whether this reflects improvement or worsening of pathology is unclear. Compounds targeting the same mechanism (e.g., nuclear export inhibition) are inconsistently used across experiments (KPT compounds, verdinexor, leptomycin B), raising concerns about reproducibility. In organoids, the experimental paradigm shifts to long-term treatment (35 days vs. 16 hours), further complicating interpretation.

      As pointed out by reviewer 1 in point 4 above, we do not have evidence to establish a convincing correlation between the size of anisosomes and clinical phenotypes. Regarding the use of different drugs in different experiments, the initial screen identified KPT compounds and verdinexor because leptomycin B was not in our library. In the follow-up studies, we switched to leptomycin B because 1) it is commercially available; 2) it is highly potent and specific; and 3) it is more commonly used as an XPO1 inhibitor in the literature. However, for the organoid study, we had to switch back to KPT because of the toxicity associated with long-term application of leptomycin B.

      (6) Overinterpretation of rescue effects:

      Although the authors state that they aim to test whether nuclear export inhibition rescues neuronal defects, no functional neuronal readouts are provided (e.g., viability, morphology, axon outgrowth, or electrophysiological measures). RNA-seq alone is insufficient to support claims of rescue.

      Our interpretation of the RNA-seq data was that the rescue effect by nuclear export inhibition was limited and likely insignificant. Given that this negative data is not conclusive, we will remove it from the revised manuscript.

      (7) Finally, the model does not appear to exhibit cytosolic TDP-43 aggregation at baseline. It remains unclear whether longer induction would produce cytosolic gel-like assemblies and whether these would be prevented by nuclear export inhibition. Long-term data are shown only in organoids, yet anisosome formation is not assessed there.

      The expression system used in the study reaches a steady state after 48 h of induction. At this point, we did not observe any gel-like structures. We can clarify this point during revision.

      Reviewer #3 (Public review):

      Summary:

      TDP-43 proteinopathy is broadly found in neurodegenerative diseases. This manuscript investigates how nuclear export influences the biophysical properties of TDP-43. The authors use a combination of chemical screening and genome-wide siRNA screening to identify pathways that modulate TDP-43 liquid-to-solid transitions. Overall, the study employs a broad array of approaches and addresses an important question in TDP-43 pathobiology. The identification of nuclear export as a central regulator is compelling and conceptually aligns with the emerging view that TDP-43 nucleocytoplasmic trafficking is a major defect in neurodegeneration.

      Strengths:

      This work integrates chemical and genetic screening to identify novel modifiers. The candidates were validated in both reporter cell lines and iPS-differentiated organoids. The findings support the idea that nucleocytoplasmic transport is important for the biophysical properties of TDP-43.

      We thank the reviewer for acknowledging the significance and strength of our study.

      Weaknesses:

      The mechanisms underlying the connection between nuclear export and phase transition need further clarification. Broader consequences of XPO1 inhibition are not addressed.

      We agree that our study does not address how nuclear export inhibition affects TDP-43 phase behavior. As discussed in the paper, we propose that the effect of nuclear export inhibition on TDP-43 phase separation is likely indirect. The most likely scenario is that inhibition of nuclear export changes the nuclear environment over time, which in turn affects TDP-43 phase separation. We isolated nuclear extracts from control and LMB-treated cells and used mass spectrometry to identify proteins that are differentially present in the nucleus. However, knockdown of the top identified candidates did not abolish the LMB-induced phase alteration. Considering our observation that RNA splicing is another modulator of TDP-43 phase behavior, it is possible that the combined change in the RNA and protein composition of the nucleus alters TDP-43 phase behavior. However, defining the mechanism would require substantial work that is beyond the scope of the current study.

  3. Feb 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may result not from targeted knockdown of the period (per) gene but from the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc.); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have followed the referee's advice, repeating the experiments with the dominant-negative UAS-cyc<sup>DN</sup>. The results confirm our conclusions: abolishing the cellular clock in LNd neurons abolishes the rhythmicity of oviposition. The results are presented in Fig. 3 of the new manuscript, panels H to N. We thank the reviewer for this suggestion, which has clearly improved our paper, since it allows us to confirm our result using both a different driver and a different UAS transgene. In addition, we included the required GAL4 controls, which can be found in panels E and L of the figure, as well as average egg-laying profiles for all genotypes involved (panels B, D, F, I, K and M). We moved the MB122B split-Gal4>UAS-per<sup>RNAi</sup> experiment to a supplementary figure (Figure 3S1). The paragraph discussing the new Figure 3 has been modified accordingly.

      (3) The egg-laying profiles obtained show clear damping/decaying trends, which necessitate careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artifacts introduced by the 24h moving average used.

      The method used for the assessment of rhythmicity is now more fully explained and tested in the Supplementary Material. In particular, trend removal is treated in the second section of the SM, and the absence of "artifacts" (interpreted as the possibility of deciding that a signal is rhythmic when it is not, or vice versa) is shown in Figs. S3 to S5.

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      The choice of sampling every 4 hours is not due to a limitation imposed by the device. In fact, the device can be programmed to move at whatever times are desired. As mentioned in the Materials and Methods section, "more frequent sampling gives rise to less consistent rhythmic patterns", because the number of eggs sampled in each time slot becomes too small. In particular, we have tested sampling at 2-hour intervals and observed that this doubles the work performed by the experimenter but does not improve the assessment of rhythmicity.

      (5) Although the device reduces the interference caused by manual egg collection, the signal quality does not improve enough for a sufficient number of individually rhythmic flies to be included in the analysis. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously, and the evidence suggests much more inter-fly variability in the period of the egg-laying rhythm.

      As stressed in the paper, and in the new Supplementary Material, the individual egg records are very noisy, which in general precludes the extraction of any information about the underlying period and phase. The workaround we (and others, e.g. Howlader et al. 2006) have used is analyzing average egg records for each genotype. Even though this implies assuming the same period and phase for all individuals, we have observed, using experiments with synthetic data, that small variations in individual periods (of the same amount as those present in real experiments where the period of some flies can be assessed individually) still allow us to use our method to decide if the genotype is rhythmic or not. This issue is discussed at length in the new Supplementary Material. There we also discuss an experiment with real flies, showing the individual records, and the corresponding periodograms, for each fly, for a rhythmic (Fig. S14) and an arrhythmic genotype (Fig. S17).
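      The averaging approach described in this exchange can be illustrated with a short sketch. This is a hypothetical Python/NumPy illustration, not the authors' analysis code: the function names are invented, per-fly records are assumed to be already detrended, of equal length and without missing samples, and samples are assumed to be 4 h apart. The power computed here is the floating-mean Lomb-Scargle power (fraction of variance explained by a least-squares sinusoid plus offset), which may not match the normalization behind the LSpower values quoted above, and no significance test is included.

```python
import numpy as np


def ls_power(t_hours, y, periods):
    """Floating-mean Lomb-Scargle power at each trial period.

    Computed directly as the fraction of variance explained by the best
    least-squares fit of offset + cos + sin at that period.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    power = np.empty(len(periods))
    for k, p in enumerate(periods):
        w = 2.0 * np.pi / p
        design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = np.sum((y - design @ coef) ** 2)  # residual sum of squares
        power[k] = 1.0 - rss / tss
    return power


def average_record_peak(records, dt_hours=4.0, pmin=16.0, pmax=32.0):
    """Average per-fly records (one row per fly) and return the period and
    power of the largest peak of the mean series in the [pmin, pmax] window."""
    mean_series = np.mean(np.asarray(records, dtype=float), axis=0)
    t = np.arange(mean_series.size) * dt_hours
    periods = np.arange(pmin, pmax + 1e-9, 0.1)  # 0.1 h grid (assumption)
    power = ls_power(t, mean_series, periods)
    best = int(np.argmax(power))
    return float(periods[best]), float(power[best])
```

      Averaging before computing the periodogram trades per-fly phase and period information for noise reduction, which is the assumption the reviewer questions; synthetic records whose periods vary slightly around 24 h can be passed to `average_record_peak` to see how much that variation costs.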

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      We have added the individual periodograms of the arrhythmic lines to the Supplementary material (Figs. 3S2, 3S5 and panel G of Fig. 3S1), where they can be compared with their respective controls (Figs 3S3, 3S4, 3S6, 3S7 and panel F of Fig. 3S1).

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that the results may be biased for 'the best egg layers'. We remark however, that the flies that have been left out lay very few eggs, some of them even laying no eggs on a whole day. For these flies it is difficult to understand how one can even speak of egg laying rhythmicity (let alone how one can experimentally assess it). Thus, we think it might be misleading to speak of results as "representative of the whole population". Furthermore, it is even possible that the very concept of egg laying rhythmicity makes little sense if flies do not lay enough eggs.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      In general, we have checked that there are no "outliers", in the sense of flies that lay many more eggs than the others in the experiment. But maybe the reviewer is referring to the possibility that a few rhythmic flies make the average rhythmic. This issue is addressed in the supplementary material, at the end of section "Example of rhythmicity assessment for a synthetic experiment". In short, we found that eliminating some of the most rhythmic flies from a rhythmic population makes the average a bit less rhythmic, but still significantly so. Conversely, if these flies are transferred to an arrhythmic population, the average is still non rhythmic.

      Regarding "the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity", we stress that we have not performed a selection of flies for the averages. All of the flies tested are included in the average, independently of their individual rhythmicity, provided only that they lay enough eggs.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We are aware that in the studies of the rhythmicity of locomotor activity the presence of two significant peaks is usually interpreted as a “complex rhythm”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two non-significant peaks could also correspond to the periods of two different subpopulations of individuals. However, a close examination of the individual periodograms, now provided as Supplementary Figures 3S2 to 3S9, does not show any convincing evidence of any of these two possibilities.

      Another possibility is that such peaks are simply an artifact of the method when analyzing time series that consist of very few cycles and few points per cycle. In the Supplementary Material we show that this can indeed happen. Consider, for example, periodograms 2 and 4 in Fig. S12 of the SM. Even though both of them display two non-significant peaks, these periodograms correspond to two synthetic time series that are completely arrhythmic.

      We have added to the manuscript a paragraph discussing the issue of possible bimodality (next to last paragraph in subsection "The molecular clock in Cry+ LNd neurons is necessary for rhythmic egg-laying").
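      Whether a periodogram peak counts as significant depends on the null distribution used. The source does not state how its p < 0.05 threshold was obtained, so the following is a hypothetical sketch of one common alternative, a randomization test: shuffling the time order of a series destroys any rhythm while keeping its distribution of values, and the 95th percentile of the maximum power over the shuffled series serves as a threshold. Function names, the 0.5 h period grid, the 16-32 h window and the 4 h sampling are assumptions of the sketch.

```python
import numpy as np


def max_power(t, y, periods):
    """Largest floating-mean Lomb-Scargle power (variance explained by an
    offset + sinusoid least-squares fit) over the trial periods."""
    tss = np.sum((y - y.mean()) ** 2)
    best = 0.0
    for p in periods:
        w = 2.0 * np.pi / p
        X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = max(best, 1.0 - np.sum((y - X @ coef) ** 2) / tss)
    return best


def permutation_threshold(y, dt_hours=4.0, n_perm=200, alpha=0.05, seed=0):
    """Power a peak must exceed to be called significant at level alpha,
    estimated from time-shuffled copies of the series (randomization test)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size) * dt_hours
    periods = np.arange(16.0, 32.0 + 1e-9, 0.5)
    rng = np.random.default_rng(seed)
    null = [max_power(t, rng.permutation(y), periods) for _ in range(n_perm)]
    return float(np.quantile(null, 1.0 - alpha))
```

      Because the threshold is taken over the maximum across all trial periods, it already accounts for testing many periods at once; individual sub-threshold peaks in an arrhythmic series are expected under this null.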

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events do not involve isolated individuals, but rather groups of flies. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3) its propensity changes upon the pre-conditioning of the oviposition substrates, which is a complication in assays of oviposition rhythms that periodically move the flies to fresh substrate.

      We agree that social modulation can be important for oviposition, as has been shown in the paper cited by the reviewer. But we think that, in order to understand the contribution of social modulation to oviposition, it is important to know, as a reference for comparisons, what the flies do when they are isolated. Our aim in this work has been to provide such a reference.

      Recommendations for the authors:

      (1) The weaknesses identified in the Public review could be addressed as follows: etc.

      We have followed the suggestions of the editor and addressed each of the weaknesses mentioned (see details above).

      (2) Could the authors comment on their choice of using individual flies for their assay rather than (small) groups of flies? Is it possible that their assay would produce less noisy results with the latter?

      First, we want to emphasize that our aim here was to assess the presence of individual rhythmicity, free from any external influences, whether arising from external environmental cues (such as light or temperature changes) or from social interactions (with other females or males). However, we were also curious about the behavior when males were placed in the same chamber with each female. We performed a few tests, and the results were very similar to those obtained with single females.

      (3) Minor points:

      (a) Line 57-58 - "around 24 h and a peak near night onset (Manjunatha et al., 2008). Egglaying rhythmicity is temperature-compensated and remains invariant despite the nutritional state": Rephrase to something simpler like temperature and nutrition compensated.

      Corrected.

      (b) Line 56-57 - "The circadian nature of this behavior was revealed by its persistence under DD with a period around 24 h and a peak near night onset (Manjunatha et al., 2008)." A better reference here would be to Sheeba et al, 2001 for preliminary investigations into the egg-laying rhythms of individual flies and McCabe and Birley, 1998 for groups of flies under LD12:12 and DD.

      Suggestion accepted.

      (c) Line 65-67 - "We determined..... molecular clock in the entire clock network reduced the LNv did not." This suggests that it was unknown until now that LNv does not have a role, whereas Howlader et al 2006 already suggested that. The reader becomes aware of this at a later part of the manuscript. Please revise.

      This has been revised, and the citation to Howlader et al 2006 added to the new sentence.

      (d) Line 67 - "impairing the molecular clock in the entire clock network reduced the circadian rhythm of.."; saying "Reduced the power of the circadian rhythm" might be better phrasing.

      Suggestion accepted.

      (e) Line 72 - using the Janelia hemibrain dataset.

      Corrected

      (f) Line 72 typo "ussing", should be 'using'.

      Corrected.

      (g) Line 94: why is the periodic signal the same for all on the first day of DD?

      It is well known that in LD conditions activity is driven by the environmental light-dark cycle, which entrains the endogenous circadian clock of all flies. Even after the transition to DD, the effects of this entrainment persist for a few days, allowing the individual rhythmic patterns set by the light-dark cycle to remain synchronized for at least a few cycles. We are assuming that the same happens with oviposition. A sentence has been added explaining this (beginning of third paragraph of subsection "Egg-laying is rhythmic when registered with a semiautomated egg collection device").

      (h) Figure 1A-D, Were all flies included or only rhythmic flies? Please make this clear. How do you distinguish rhythmic and arrhythmic flies in Figure 1E? Their representative individual plots of egg number graphs are required. Why was the number of flies under DD decreased from 20 to 18?

      Throughout the paper, the analysis of average rhythmicity has been performed including all flies, since we postulate that even flies that individually can be classified as non rhythmic have a rhythm that is corrupted by noise, and that this noise can be partially subtracted by performing an average. The explanation of the characterization of rhythmic and arrhythmic individuals is in the Methods section, under the Data Analysis subsection. This is now fully developed in the Supplementary material, where the individual plots for some of the genotypes are included.

      Regarding the question of the number of flies having "decreased from 20 to 18?", there is a misunderstanding here. The results depicted in Figure 1, and in particular in panel E, correspond to two different experiments: one performed only in LD (7 days, n=20), and a second one performed for 5 days in DD, with one previous day in LD (n=18).

      (i) Figure 1E and K, Are n=20, 18, and n=30, 22 the total numbers of flies including both rhythmic and nonrhythmic? If so, it would be better to put them in the column, not in the rhythmic column.

      The figure has been corrected.

      (j) Line 107-108, please provide a citation for this statement.

      We have added two references: Shindey et al. 2016, and Deppisch et al. 2022.

      (k) Figure 1, 2, etc., please write a peak value inside the periodogram graph. This makes comparison easier.

      The peak values have been added in all Figures.

      (l) Line 184-185, Figure 2F, tau appears shorter in Clk4.1>perRNAi flies than in control, which suggests that DNp1 may play a role?

      As explained in the Supplementary Material, the particularities of oviposition records (discrete values, noise, few samples per period, etc.) preclude an accurate determination of the period if the record is considered as rhythmic. In particular, Fig. S4 shows that differences of 1 hour between the real and the estimated periods are not unusual.

      (m) Figure 4. Why are 2 controls shown? Please explain. Are they the same strains?

      The two controls shown are the UAS control and the GAL4 control. This information has now been added to the figure.

      (n) Line 314 'that' should be 'than'?

      Corrected.

      (o) Line 73-74 - Phrasing is not clear in: "LNds and oviposition neurons, consisting with, the essential role of LNds neurons in the control of this behavior."

      Corrected.

      (p) Line 81-84 - "the experiments particularly demanding and labor-intensive. In this approach, eggs are typically collected every 4 hours (sometimes also every 2 hours), which usually implies transferring the fly to a new vial or extracting the food with the eggs and replacing it with fresh food in the same vial (McCabe and Birley, 1998; Menon et al., 2014)." McCabe and Birley had an automated egg collection device designed for groups of flies, which sampled eggs laid every hour for 6 days. Please remove this reference in this context

      Reference removed.

      (q) Line 91-92 - "The assessment of oviposition rhythmicity is challenging because the decision of laying an egg relies on many different internal and external factors making this behavior very noisy." This sentence makes it appear that 'assessment' is the limitation. Even locomotor activity is governed by many internal and external factors, yet we can obtain very robust rhythms. The sentence that follows is also not easy to digest. Can the authors frame the idea better?

      We have rewritten the corresponding paragraph in order to make it more clear (second paragraph of the Results section). Additionally, the Supplementary Material contains now a more detailed explanation and analysis of the method used.

      (r) Line 104-107 - rhythmic (with a period close to 24 h, Figure 1F) although the average egg record is strongly rhythmic with a period around 24 h (Figure 1B). Under DD condition, individual rhythmicity percentages are the same as in LD (Figure 1E) and their average record is also very rhythmic with a period of 24 h (Figure 1D). 'Strongly rhythmic' and 'very rhythmic' are less indicative of what is happening with the oviposition rhythm and can be phrased as robust instead, with a focus on their power measured.

      We have accepted the suggestion.

      (s) Line 108-110 - "Thus, egg-laying displays a much larger variability than locomotor activity, compounding the difficulty of observing the influence of the circadian clock on this behavior." The section discussed here does not illustrate the variability in egg-laying as much as the lack of robustness of the rhythm. The variation in rhythmicity going from CS flies (~70% rhythmic) to yw flies (~50% rhythmic) showcases the variability in this rhythm and how it is difficult to observe when compared to locomotor rhythms, which are usually consistently >90% rhythmic across multiple genotypes. These lines can be placed after the discussion about yw and perS flies. Moreover, previous studies using individual flies have reported that egg-laying rhythm is more variable than others Figure 1, Sheeba et al 2001.

      We have accepted the suggestion, replacing "Thus, egg-laying displays a much larger variability than locomotor activity..." by "This shows that, at the individual level, egg-laying is much less robust than locomotor activity ..."

      (t) Figure 1. Genotype notation within the figure panels is not consistent with the accepted / conventional notation or with the main text or legend notations throughout the manuscript.

      We are sorry for this mistake. We have corrected the genotype names in Figures and text in order to make notation consistent across the paper.

      (u) Supplementary Figure 1 Legend. Error in upper right corner? Not left corner? The photo does not clearly show the apparatus. The authors may wish to consider clearer images and more details about the apparatus including details of the 3D printing of the device and perhaps even include a short video where the motor moves the flies to a new chamber (This is only a suggestion to advertise the apparatus, not related to the review of the manuscript). They could also provide information about what fraction of females survived till the end of each trial when 21 flies were examined with 4-hour sampling across 4-5 cycles.

      In general, more than 80% of the females are alive at the end of a one-week oviposition experiment. We have added this information to the Methods section at the end of the corresponding subsection ("Automated egg collection device"). Regarding the egg-collection device, we have replaced the photographs in what is now Supplementary Figure 1S1 and added a short supplementary movie showing its operation.

      (v) The results depicted in Figure 2B are that of averaged time series. Hence the reader does not know 'the fact' that knocked-down animals are not completely rhythmic. Is the "not completely arrhythmic" in reference to flies with a power > 0.2 (weakly rhythmic) in their egg-laying rhythm or to the presence of ~40% of male flies (Supplementary Table 1) with a locomotor rhythm after perRNAi silencing of most of their clock neurons? This is confusing because no intermediate category of flies is discussed in Figure 2. Please edit for clarity.

      We were referring to the rhythmicity of the genotype, not of the individuals. We have rewritten the corresponding paragraph in order to make it clearer (last paragraph of the first subsection of the Results section).

      (w) Line 173 - ablation or electrically silencing all PDF+ neurons (Howlader et al., 2006). There were no experiments carried out using electrical silencing of PDF+ neurons in the referenced paper.

      We are sorry for this mistake. This has been corrected (we have deleted the mention of electrical silencing).

      (x) Line 173 - Shortening of period by nearly 3 hours cannot be considered minor.

      We agree, and we have deleted the word "minor".

      (y) Line 332-333 - "We also disrupted the molecular clock (or electrically silenced) in PDF-expressing neurons as well as in the DN1p group with no apparent effect on egg-laying rhythms". Period shortening was observed with the pdf GAL4 > perRNAi manipulation, so there was an effect on the egg-laying rhythm. Additionally, perRNAi does not electrically silence PDF neurons, as Kir2.1 was expressed only in the DN1ps, using Clk4.1 GAL4. This line should be rewritten.

      We have rewritten the paragraph mentioned (third paragraph of the Discussion) in order to make it more accurate.

      (4) Page 22 - Data Analysis

      "Since the number of eggs laid by a mated female tends to show a downward trend, we proceeded as follows, in order to detrend the data (see the Supplementary Material for further details). First, a moving average of the data is performed, with a 6 point window, and a new time series T is obtained. In principle, T is a good approximation to the trend of the data. Then, a new, detrended, time series D is generated by pointwise dividing the two series (i.e. D(i)=E(i)/T(i), where i indexes the points of each series)." Can the authors provide a reference for this method of detrending? Smoothing can frequently introduce artifacts in the data and give incorrect period estimates. Additionally, the trend visible in the data, especially in Figure 1, suggests a linear decay that could easily be subtracted. Also, there is no discussion of detrending in the attached Supplementary Material.

      We are sorry for the confusion with the Supplementary materials. The method used for subtracting both noise and trend from the data is now fully explained in the new Supplementary Material. All the issues raised by the reviewer in this comment have been addressed there.
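      The detrending procedure quoted in comment (4) (a 6-point moving average T, then D(i)=E(i)/T(i)) can be illustrated with the minimal Python sketch below. It is not the authors' implementation, whose full details are in their Supplementary Material: the alignment of the even-length window and the truncation at the series edges are assumptions, and the function names `moving_average` and `detrend_by_ratio` and the egg counts are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 6) -> List[float]:
    """Trend series T: mean of `window` consecutive points around each index.

    Assumption: for index i the window covers [i - window//2, i - window//2 + window),
    i.e. [i-3, i+2] for a 6-point window, truncated at the series edges.
    """
    n = len(values)
    half = window // 2
    trend = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        chunk = values[lo:hi]
        trend.append(sum(chunk) / len(chunk))
    return trend


def detrend_by_ratio(eggs: Sequence[float], window: int = 6) -> List[float]:
    """Detrended series D with D(i) = E(i) / T(i)."""
    trend = moving_average(eggs, window)
    # For non-negative counts, T(i) is zero only if the whole window is zero; mark as NaN.
    return [e / t if t > 0 else float("nan") for e, t in zip(eggs, trend)]


# Hypothetical egg counts per 4-hour bin (not data from the study).
eggs = [12, 9, 4, 3, 7, 11, 10, 8, 3, 2, 6, 9]
print([round(d, 2) for d in detrend_by_ratio(eggs)])
```

      Assuming 4-hour bins, as in comment (u), a 6-point window spans 24 h, so T averages over one full daily cycle. Dividing by T, rather than subtracting it, expresses each point relative to the local trend, so a steadily declining egg count with a stable daily rhythm gives a detrended series that oscillates roughly around 1.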

      (5) Figure by figure

      Page - Type (Figure or text) - Comment

      (a) Page 6, Figure 1C: There is remarkable phase coherence in the average egg-laying time series for CS flies 5 days into DD, and, as the authors note in lines 94-95, "Under light-dark (LD) conditions, or in the first days of DD, it can be that the periodic signal is the same for all flies". Since this observation is crucial to constructing the figures seen later in the paper, a note should be made about why this rhythm could persist across flies so deep into DD.

      As mentioned above, we have added a couple of lines explaining why we think that the assumption of a synchronized periodic signal is reasonable, at least during the first cycles (second paragraph of the first subsection of section Results).

      (b) Figure 1G: The effect of period/phase decoherence seems to show up here in the average profile for yw flies, which appears to dampen out completely after 2 days in DD and yet shows a 24-hour rhythm in the averaged periodogram. The authors should note whether the LS periodogram is over-representing the periodicity of the first few days in DD, or whether comparing the first 3 vs. the last 3 days in DD gives different results.

      The dampening observed in average oviposition records reflects the dampening of individual oviposition records, a well-known phenomenon probably caused by the depletion of sperm in the female spermatheca. One of the aims of the method used in the paper was to avoid the bias introduced by this dampening by means of a detrending procedure. This is explained in the Materials and Methods, and full details are now given in the new Supplementary Material.

      (c) Figure 1E, K: Are these data pooled across 2-3 experiments, as discussed in lines 500-01 under 'Statistical Analysis'? Also, what test is being performed to check for differences between proportions here, given that there are no error bars denoting error around a mean value and no other viable tests are mentioned in Statistical Analysis?

      We are sorry for this omission. For the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.
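      In its usual formulation, the 'N-1' chi-squared test named above is Pearson's chi-squared statistic for a 2x2 table rescaled by (N-1)/N, evaluated against a chi-squared distribution with one degree of freedom. The Python sketch below illustrates that formulation for proportions such as rhythmic vs. not rhythmic flies in two genotypes. It is not taken from the manuscript, and the function name `n_minus_1_chi2` and the counts in the usage line are hypothetical.

```python
import math
from typing import Tuple


def n_minus_1_chi2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """'N-1' chi-squared test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. genotypes); columns are the two outcomes
    (e.g. rhythmic / not rhythmic). Returns (statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    # Pearson's statistic is n*(ad - bc)^2 / denom; the 'N-1' variant uses n - 1.
    stat = (n - 1) * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, chi-squared = Z^2, so
    # P(X >= stat) = P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30/40 rhythmic flies in one genotype vs 20/40 in another.
stat, p = n_minus_1_chi2(30, 10, 20, 20)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

      Because the factor (N-1)/N approaches 1 as N grows, the correction matters mainly for small samples, such as per-genotype groups of a few dozen flies.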

      (d) Figure 1F, L: Can the total number of weakly and strongly rhythmic values be indicated in the scatter plot?

      Corrected.

      (e) Figure 1F, L (legend): Is the Chi-squared test performed on the proportion values of Figure 1E, K or on those of Figure 1F, L?

      The chi-squared test mentioned was used for Figure 1F, L. As explained above, for the comparison of proportions we used the 'N-1' Chi-squared test. This has now been added to the figure legend.

      (f) Page 8, Figure 2B: Given that individual flies with an LS periodogram power < 0.2 are considered weakly rhythmic in Figure 1F, L, can Clk856 > perRNAi flies on average also be considered weakly rhythmic, as the peak in the periodogram is above 0.3?

      We prefer to use the weakly rhythmic class only for individual flies. Nevertheless, we agree that this periodogram shows that the genotype analyzed is not completely arrhythmic, and that this might be due to some remaining individual rhythmicity. As mentioned above, we have rewritten the last paragraph of the first subsection of section Results in order to discuss this.

      (g) Figure 2D: Can the authors comment on why there is a shorter-period rhythm when PDF neurons have a dysfunctional clock, whereas previous evidence (Howlader et al., 2004) suggested that these neurons play no role in the egg-laying rhythm? They should also refer to McCabe and Birley, 1998, to see whether their results (a shorter period of ~19 h in groups of per0 flies) might be of interest for their interpretation.

      We have added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion. In a nutshell, even though Howlader et al. did not find a shortening when PDF neurons are ablated, they did find it in pdf01 flies.

      (h) Figure 2F, H: As the authors mention in their Discussion on Page 16, lines 340-45, the manipulation of DN1p neurons might abolish the circadian rhythm in oogenesis reported by Zhang et al., which is why they looked at this circuit driven by Clk4.1 neurons and comment that "The persistence of the rhythm of oviposition implies that it is not based on the availability of eggs but is instead an intrinsic property of the motor program". However, no change in fecundity is reported for either the Kir2.1- or the perRNAi-based manipulation of these neurons, which would help the reader understand whether egg availability (at the level of egg formation) plays any role in the downstream (and seemingly independent) act of egg laying. The authors should report whether they see any change in total fecundity for either set of flies with respect to their respective controls. Also, is the reduction in power seen with electrical silencing vs. perRNAi expression of any relevance? Does the percentage of rhythmic flies change between these two manipulations?

      In the line mentioned by the reviewer, what we meant is that our results show that the rhythm of oviposition does not seem to be based on the rhythmic production of oocytes, which is not necessarily connected with the total number of eggs produced. We have modified the corresponding line in the paper to avoid this misunderstanding. Regarding the "reduction in power" mentioned, it must be stressed that, in general, the height of the peak is correlated with the fraction of rhythmic individuals. The problem is that this fraction is a much noisier output, which is why we have chosen to work with periodograms of averages.

      (i) Figure 2E and G: A loss of rhythmicity could also be due to a decrease in fecundity in the experimental lines. Since the number of eggs laid by each genotype is already known, can the authors show statistical comparisons between the experimental lines and their respective controls? In this vein, can the averaged time series profiles also be provided for all the genotypes tested (as in Figure 1A, C, G, I), perhaps in the supplementary material?

      We did not focus on fecundity in the present work. However, our observations do not seem to show any definite relationship between fecundity and rhythmicity. We plan to address the issue of fecundity more systematically in future work. The averaged time series profiles have now been added to the figure.

      (j) Scatter plots showing the average period and SEM as seen in Figure 1 (F, L) would help in understanding if these manipulations have any effect on variation in the period of the egg-laying rhythm across flies. Particularly for pdf GAL4 > perRNAi flies which have a net shorter period, (but this might vary across the 34 flies tested).

      We have added a Supplementary Figure (2S1) that shows that the shortening of oviposition period can be also observed at the individual level. We have also added a line commenting this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion.

      (k) Page 11, Figure 3B: Does the presence of two peaks in the LS periodogram at a power > 0.2 indicate the presence of weakly rhythmic flies with both a short (20 h) and a long (~27 h) period component, or with only one of them? The short-period peak nearly reaches the p < 0.05 significance level. So, do most of the flies in the MB122B GAL4 > perRNAi line show a weakly rhythmic, shorter period?

      (l) Figure 3D: A similar peak is observed again at 20 h (LS power > 0.2 and again nearly at the p < 0.05 significance level), together with a different, longer one at ~30 h, though this one barely reaches 0.2 on the power scale. Given the consistency of this feature in both LNd manipulations, the authors should comment on whether it is driven by variation in the periods detected or by the presence of complex rhythms (splitting or change in period) in the oviposition time series for these lines.

      (m) Figure 3 (general): Scatter plots showing average period ± SEM could help explain the bimodality seen in the periodograms. Additionally, indicating how many flies are weakly rhythmic vs. strongly rhythmic could help illustrate how important the CRY+ LNds are to the stability of the oviposition rhythm.

      For these three comments (k, l and m), we note that the issue of bimodality has been addressed above, in our response to Weakness 9.

      (o) Figure 4B: As in the comments on Figure 1, what statistical test is used to compare the proportions for these three genotypes?

      As mentioned above, for the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.

      (p) Figure 4C: Are all flies significantly rhythmic? The authors should also provide an averaged LS periodogram for each genotype, to help illustrate the difference in power between activity-rest and egg-laying rhythms.

      Yes, the points represent the periods of (significantly) rhythmic flies. This has been added to the caption to avoid misunderstandings. The differences that arise when assessing rhythmicity in activity records vs. egg-laying records are addressed at length in the Supplementary Material (see e.g. Fig S1).

      (q) Page 15, Figure 5 (general): As the authors discuss the possible contribution of DN1ps to evening activity and to the control of the oogenesis rhythm, investigating the connections (or lack thereof) between the few DN1ps characterized in the connectome and the oviposition neurons could help illustrate the distinct role they play in the female Drosophila's reproductive rhythm.

      This information was in the text and the Supplementary Tables. Lines 273-275 of the old manuscript read: "The full results are displayed in Supplementary Tables 2 and Table 3, but in short, we found that whereas there are no connections between LNv or DN1 neurons and oviposition neurons..."

      (r) Minor: The dark shading of the circles depicting some of the clusters makes it difficult to read. Consider changing the colors or moving the names outside the circles.

      Figure corrected.

      (s) Line 38: The estimated number of clock neurons has been revised recently (https://www.biorxiv.org/content/10.1101/2023.09.11.557222v2.article-info).

      Thank you for the reference. We have corrected the number of clock neurons in the Introduction of the new manuscript.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Li et al. used genetically engineered murine intestinal organoids to investigate how the temporal order of oncogenic mutations influences cell state and tumourigenicity of colorectal epithelial cells. By sequentially introducing Apc and Trp53 loss-of-function mutations in alternate orders within a Kras^G12D background, the authors generated isogenic organoid lines for both in vitro and in vivo characterisation. Bulk RNA-seq reveals expected transcriptional changes with relatively modest differences between the two triple-mutant configurations (KAT vs KTA). The key finding emerges from transplantation assays: while KAT and KTA organoids show equivalent tumourigenic potential in immunodeficient mice, only KAT organoids form tumours in immunocompetent hosts (5/10 vs 0/10), suggesting that mutation order shapes susceptibility to immune-mediated clearance. The experiments are well-executed, and the conclusions are generally supported by the data. 

      Strengths: 

      The experimental system is well-designed for the question. By combining a Kras^G12D transgenic background with sequential CRISPR-mediated knockout of Apc and Trp53 in alternate orders, the authors generated truly isogenic organoid lines that differ only in mutational sequence. This is technically non-trivial and provides a clean platform for dissecting order effects, a question otherwise difficult to address experimentally. 

      The authors performed comprehensive baseline characterisation of these organoids, including morphological and histological assessment, quantification of organoid-forming efficiency and proliferation, and bulk RNA-seq profiling. While these analyses revealed no major differences between KAT and KTA organoids, and the observed enhancement of epithelial stemness upon Apc loss and proliferative advantage conferred by Trp53 loss are largely expected, the systematic nature of this characterisation establishes a useful methodological template for future organoid-based studies. 

      The authors further investigated the functional impact of mutational order using subcutaneous transplantation assays. By comparing tumour formation in immunodeficient versus immunocompetent hosts, the authors uncover a genuinely unexpected finding: KAT and KTA organoids behave equivalently in the absence of adaptive immunity, but diverge dramatically when immune pressure is applied (KAT: 5/10; KTA: 0/10). This observation is arguably the most compelling aspect of the study and opens an interesting line of inquiry. 

      We greatly appreciate your positive comments on our study.

      Weaknesses: 

      The authors acknowledge that initiating with Kras^G12D does not reflect the typical human sporadic CRC trajectory, where APC loss is usually the first event. While this design choice was pragmatic, it means the observed order effects are contextualised within an artificial starting point. It remains unclear whether the Apc/Trp53 order would matter in a Kras-wild-type background, or whether the Kras-driven cellular state is a prerequisite for these phenotypes to emerge. 

      We agree with the reviewer that initiating tumorigenesis with Kras<sup>G12D</sup> does not fully recapitulate the most common trajectory of sporadic human CRC, where APC loss typically occurs first. We had noted this point in the original Discussion and will state it more explicitly in the Introduction of the revised manuscript.

      Our experimental design was intended to establish a controlled and genetically tractable system to interrogate the principle of mutation order effects. In this context, Kras<sup>G12D</sup> activation provides a stable oncogenic baseline that facilitates sequential genome engineering and comparison of isogenic lines.

      Although APC loss is frequently the initiation event, a recent study has suggested that Kras<sup>G12D</sup> priming can reshape the selective landscape for subsequent driver events, including Apc alterations (PMID: 41339549). Consistent with this notion, our data indicate that Kras<sup>G12D</sup> activation induces a permissive oncogenic cellular state that may influence the phenotypic consequences of later mutations. We therefore speculate that the Kras<sup>G12D</sup>-primed context may contribute to the observed order-dependent effects.

      We agree that testing Apc/Trp53 order in a Kras-wild-type background would be an important future direction, and we will point this out explicitly in the revised Discussion.

      Subcutaneous implantation provides a tractable readout of tumourigenicity, but the cutaneous immune microenvironment differs substantially from that of the intestinal mucosa. Given that the central claim concerns immune-mediated selection, orthotopic transplantation would more directly test whether the observed order effects hold in a physiologically relevant context. 

      In the present study, we employed subcutaneous transplantation, which is a widely used platform to assess tumorigenic potential under controlled immune conditions. This approach offers high reproducibility, straightforward tumor monitoring, and has been broadly applied in organoid-based cancer studies in both immunodeficient (PMID: 23273993, 23776211, 32209571, 33055221) and immunocompetent (PMID: 32209571, 33055221, 41672595) settings.

      Importantly, our primary goal was to determine whether mutation order influences susceptibility to immune-mediated clearance, rather than to model the full complexity of the intestinal niche. The clear divergence between KAT and KTA specifically in immunocompetent hosts supports the existence of intrinsic mutation order-dependent immune vulnerability.

      Nevertheless, we fully agree with the reviewer that orthotopic transplantation would provide a more physiologically relevant immune microenvironment. We will explicitly discuss this limitation and highlight orthotopic validation as an important future direction in the revised Discussion.

      The ssGSEA comparison involves only 14 ATK tumours, and the key comparisons (Figure 6E) yield borderline significance (p=0.052). More fundamentally, since mutation order cannot be inferred from the clinical samples, the authors are correlating organoid-derived IFN signatures with tumour immunophenotypes without direct evidence that these patients' tumours followed a KAT-like trajectory. The reasoning becomes circular: KAT organoids define the signature used to identify KAT-like clinical tumours. 

      We thank the reviewer for raising this important point. We would like to clarify that our intention was not to infer the actual mutation order in clinical samples, which indeed cannot be reliably reconstructed from bulk tumor RNA-seq data.

      Instead, our goal was to determine whether the transcriptional programs distinguishing KAT and KTA organoids could be observed in human CRC cohorts. In this context, the organoid-derived IFN-related signature was used as a molecular reference to assess potential clinical relevance, rather than to classify tumors by evolutionary trajectory.

      We agree that the statistical significance in Figure 6E is modest (p = 0.052), and we will revise the text to present this analysis more cautiously, as a suggestive trend rather than definitive evidence. We will also state this limitation explicitly in the revised manuscript to avoid overinterpretation.

      Furthermore, the most striking finding of the study, that KTA organoids fail to form tumours in immunocompetent hosts while KAT organoids can, lacks a mechanistic follow-up. The transcriptomic differences between KAT and KTA are modest when cultured as monocultures, yet their in vivo fates diverge dramatically. The authors do not address why these subtle intrinsic differences translate into such divergent immune susceptibility, nor do they characterise the immune response adequately (beyond limited CD4/CD8 IHC at tumour peripheries). 

      We thank the reviewer for this important point. We agree that the mechanistic basis underlying the differential immune susceptibility between KAT and KTA remains incompletely resolved.

      A practical limitation of the current study is that KTA grafts failed to establish tumors in immunocompetent hosts, which precluded downstream histological and immune profiling of established lesions. As a result, in vivo immune characterization of KTA grafts was not feasible.

      Nevertheless, our transcriptomic analyses indicate that KAT and KTA organoids differ in interferon-response and immune-related programs prior to transplantation, and that these differentially expressed genes are consistently preserved in tumor cells derived from immunodeficient hosts. These results suggest that intrinsic, tumor-cell-autonomous differences may influence immune recognition and clearance.

      We will expand the Discussion to outline several non-mutually exclusive mechanisms that could account for this phenotype, including altered interferon responsiveness, differential antigen presentation capacity, and changes in tumor cell-intrinsic immune visibility programs. These hypotheses are consistent with the transcriptional differences observed prior to transplantation and provide a framework for future mechanistic investigation. We agree that deeper immune profiling (e.g., immune infiltrate composition, antigen presentation status, and functional immune assays) will be important to fully elucidate the mechanism and represents a key direction for future work.

      Reviewer #2 (Public review): 

      Summary: 

      This study addresses an important and timely question in colorectal cancer biology by systematically examining the effects of the common driver mutations APC, KRAS G12D, and TP53 in murine colorectal organoids, with particular emphasis on how the order of APC and TP53 acquisition influences tumor phenotype. These mutations are well known to be frequent, truncal, and often co-occurring in colorectal cancer. While it is increasingly appreciated that mutational order can shape tumor behavior, studies directly comparing the phenotypic consequences of alternative APC-TP53 mutation orders remain rare. This work, therefore, addresses a relevant and timely question. 

      Strengths: 

      A major strength of the study is its focus on previously unexplored biology, combined with the generation of multiple isogenic murine organoid models with controlled mutational sequences. The authors employ careful and robust quality control of the CRISPR-mediated alterations, and the inclusion of both in vitro and in vivo experiments strengthens the relevance of the work.

      We greatly appreciate your positive comments on our study.

      Weaknesses: 

      There are, however, several limitations that should be considered when interpreting the findings. First, KRAS G12D activation is used as the initiating alteration, whereas APC loss is generally believed to be the initiating event in most human colorectal cancers.

      We sincerely thank the reviewer for their insightful comments regarding the initiation of tumorigenesis with a Kras mutation rather than the more canonical Apc loss, a point also raised by Reviewer #1. We fully agree that the Apc-first sequence is the most prevalent in human colorectal cancer (CRC). We will explain the rationale for our experimental design more clearly in the revised Introduction, as outlined in our response to Reviewer #1.

      Second, the analysis is restricted to comparing only two mutation orders (KAT versus KTA), which limits the breadth of conclusions that can be drawn about mutation ordering more generally.

      We thank the reviewer for pointing out this limitation. However, as a proof of concept, studying the order of Apc and Trp53 loss, two major oncogenic events in CRC, provides a biologically meaningful starting point for dissecting order-dependent effects. Although comparing all six possible mutation orders of these three driver genes would be highly informative, generating and thoroughly characterizing all genotypes is a substantial undertaking beyond the scope of this initial study.

      Finally, key RNA-sequencing and in vivo experiments rely on a single isogenic line, which substantially constrains interpretability. 

      The aim of the study was to systematically investigate how mutation accumulation and order influence colorectal cancer initiation. While the data suggest that the relative timing of APC and TP53 loss may be particularly important for tumor initiation, the absence of biological replication makes it difficult to draw robust conclusions. Engraftment efficiency and tumor behavior can be influenced by many factors for a single clone, including additional passenger mutations acquired during culturing, as well as epigenetic differences that are independent of the engineered mutations.

      We thank the reviewer for raising this concern, and we apologize for not having presented the source of our data clearly. In fact, for all major in vitro and in vivo assays of double and triple mutants (KA/KT/KAT/KTA), we analyzed at least two independently derived clones per genotype. These independent clones harbor distinct mutations in the target genes and were treated as biological replicates throughout the study.

      To improve clarity and transparency, we will revise the relevant figures and figure legends to explicitly indicate the clonal origin of each data point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to sincerely thank the editor and reviewers for their thoughtful and constructive feedback on our manuscript. We are grateful not only for the close reading and insightful suggestions, but also for the open and generous way in which the reviewers engaged with our work. In revising the manuscript, we have clarified how our contribution is situated within the existing literature, conducted additional analyses to examine individual differences in exploration strategies, expanded and refined our description of the DDM analyses, and added correlations between strategies and other behavioral measures. We have also clarified methodological points, such as the estimation of thresholds, and provided new supplementary figures and analyses where appropriate. In several places, we have modified and qualified our interpretations in line with the reviewers’ comments. We believe these changes have significantly strengthened the manuscript, and we are grateful for the scientific dialogue with the reviewers.

      Review 1 (Public review):

      This manuscript reports on the behavior of participants playing a game to measure exploration. Specifically, participants completed a task with blocks of exploratory choices (choosing between two 'tables', and within each table, two 'card decks', each of which had a specific probability of showing cards with one color versus another) and test choices, where participants were asked to choose which of the two decks per table had a higher likelihood of one color. Blocks differed on how long (how many trials) the exploration phase lasted. Participants' choices were fit to increasingly complex models of next-trial exploration. Participants' choices were best fit by an intermediate model where the difference in uncertainty between tables influenced the choice. Next, the authors investigated factors affecting whether participants sought out or avoided uncertainty, their choice reaction times, and the relationship of these measures with performance during the test phase of each block. Participants were uncertainty-seeking (exploratory) under most levels of overall uncertainty but became less uncertainty-seeking at high levels of total uncertainty. Participants with a stronger tendency to approach uncertainty at lower levels of total uncertainty were more accurate in the test phase, while the tendency to avoid uncertainty when total uncertainty was high was also weakly positively related to test accuracy. In terms of reaction times, participants whose reaction times were more related to the level of uncertainty, and who deliberated longer, performed better. The individual tendency to repeat choices was related to avoidance of uncertainty under high total uncertainty and better test performance. Lastly, choices made after a longer lag were less affected by these measures.

      The authors note that their paradigm, which does not provide immediate rewarding feedback, is novel. However, the resulting behavior appears similar to other exploratory learning tasks, so it's unclear what this task design adds - besides perhaps showing that exploratory behavior is similar across types of reward environments. Several papers have shown that cognitive constraints modulate exploration (PMIDs: 30667262, 24664860, 35917612, 35260717); although this paper provides novel insights, it does not situate its findings in the context of this prior literature. As a result, what it adds to the literature is difficult to discern.

      We are grateful for your thoughtful reading of our paper and for pointing us to these relevant references. We appreciate the need to clarify how our work is situated within the existing literature. In brief, the novelty of our paper lies in measuring exploration in contexts where it is not in direct competition with the need to exploit knowledge for reward. This approach enables us to include orders of magnitude more exploration trials. With this increased power, we were able, for the first time, to distinguish between competing algorithms for addressing uncertainty, and we identified a novel tendency to avoid uncertainty when overall uncertainty is high. We now state this more clearly in the Discussion section and cite the suggested papers.

      “While the literature on exploration is expansive, the paradigm presented here extends it in important ways. Researchers of reinforcement learning have previously examined exploration in the context of reward-seeking decisions. Using such paradigms as the bandit task (Schulz and Gershman, 2019), it was demonstrated that humans don't always choose the option they believe will yield the most reward, but also make random and directed choices with the aim of exploring other uncertain options (Schulz and Gershman, 2019; Wilson et al., 2014). Recently, studies using the bandit task have lent empirical support to the notion that exploration is difficult, as participants explore less under time pressure or cognitive load (Brown et al., 2022; Otto et al., 2014; Cogliati Dezza et al., 2019; Wu et al., 2022). Crucially, this literature has focused on cases where reward can be gained on each trial (Brown et al., 2022; Cohen et al., 2007; Daw et al., 2006; Schulz and Gershman, 2019; Song et al., 2019; Tversky and Edwards, 1966; Wilson et al., 2014; Wu et al., 2022). In such tasks, the motivation to exploit current knowledge predominates over exploration, rendering it rare and difficult to measure (Findling et al., 2019). In contrast, our task was designed to remove the impetus to immediately exploit current knowledge, and as a result we were able to observe many exploratory choices. With this increased experimental power, we were able to compare different algorithms approximating the goal of approaching uncertainty, and describe how and when humans avoid uncertainty instead of approaching it.”

      Reviewer #1 (Recommendations For The Authors):

      Are all participants best fit by the delta uncertainty model? Since other parts of the paper focus on individual differences, it would be useful to examine if people differ in the computational complexity of their exploration strategies and if this difference relates to other behavior.

      We thank you for this helpful suggestion, which prompted us to conduct additional analyses. To address your question, we summarized point-wise predictive accuracy for each participant and compared it across the three models. The results are presented in the new Supplements 2 and 3 to Figure 6.

      These analyses show that, for the vast majority of participants, uncertainty was favored over exposure as a choice strategy, and for a sizable majority, it was also favored over EIG. As detailed in Figure 6 and its supplements, 125 participants were best described by uncertainty relative to EIG, 58 by EIG, and 11 showed inconclusive results. Similarly, 96 participants were better fit by uncertainty than exposure, while an additional 72 had negative exposure coefficients (consistent with uncertainty-based choice). Exposure was supported for 26 participants.

      We also examined how these strategies relate to other behavioral measures. Exposure was not strongly linked to test performance. EIG, by contrast, showed a positive association with test performance, perhaps because it is more closely correlated with uncertainty. Importantly, however, across posterior predictive checks in the main text and supplements, approaching uncertainty continues to provide the best overall description of participants’ strategies.

      The authors construct a hierarchy of exploratory strategies. Perseveration/switching is also an explore/exploit strategy that would lie above random exploration in the authors' hierarchy.

      We chose not to place perseveration within the hierarchy, as from a normative perspective it is not, strictly speaking, an exploration strategy. At its extreme, perseveration would lead a participant to repeatedly sample only one option, leaving the others entirely unexplored. Switching is represented in the hierarchy by the equating-exposure strategy, as the two are very similar.

      For the analyses examining uncertainty seeking vs. aversion by total uncertainty, how was the cut point determined? Did this differ across people?

      Thank you for highlighting the need for greater clarity on this point. The threshold was indeed fitted to the data and varied significantly across participants (see Table 6 in Appendix 3). For each participant, the threshold marks the point at which behavior shifts from approaching to avoiding uncertainty. This threshold is a key factor underlying individual differences in the tendency to avoid uncertainty when overall uncertainty is high, as illustrated in the analyses of Figure 6 and related results. We now make this point clearer in the methods section:

      “To quantify how the influence of Δ-uncertainty on choice varied with overall uncertainty, we fit a multilevel piecewise logistic regression model. This model estimated a threshold in overall uncertainty, treated as a free parameter, and allowed the slope of Δ-uncertainty on choice to differ below and above this threshold. Below the threshold, a positive slope reflects a tendency to approach uncertainty; above the threshold, a negative interaction captures the tendency to avoid Δ-uncertainty with higher values of overall uncertainty.”
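      To make the piecewise model in this passage concrete, here is a minimal single-participant sketch. It assumes a logit of the form β0 + β1·Δu + β2·Δu·max(0, U − τ), where Δu is Δ-uncertainty, U is overall uncertainty and τ is the threshold, so the Δ-uncertainty slope is β1 below τ and β1 + β2·(U − τ) above it. This parameterization, the function names, and the grid search over τ with Newton-Raphson fits of the coefficients are illustrative assumptions, not the authors' multilevel Bayesian implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(delta_u, overall_u, tau):
    # Intercept, Δ-uncertainty slope, and an extra slope that only applies above tau.
    return [1.0, delta_u, delta_u * max(0.0, overall_u - tau)]


def choice_prob(delta_u, overall_u, beta, tau):
    # Probability of choosing the more uncertain option under the piecewise model.
    x = features(delta_u, overall_u, tau)
    return sigmoid(sum(b * xi for b, xi in zip(beta, x)))


def simulate_choices(beta, tau, n, seed):
    # Synthetic data drawn from the model itself, for checking parameter recovery.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        du, u = rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)
        data.append((du, u, 1 if rng.random() < choice_prob(du, u, beta, tau) else 0))
    return data


def solve3(a, v):
    # Gaussian elimination with partial pivoting for a 3x3 linear system a @ x = v.
    m = [row[:] + [v[i]] for i, row in enumerate(a)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_beta(data, tau, iters=15, ridge=1e-6):
    # Newton-Raphson maximum likelihood for the three coefficients at a fixed tau.
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        hess = [[ridge if i == j else 0.0 for j in range(3)] for i in range(3)]
        for du, u, y in data:
            x = features(du, u, tau)
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(3):
                grad[i] += (p - y) * x[i]
                for j in range(3):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve3(hess, grad)
        beta = [b - s for b, s in zip(beta, step)]
    return beta


def neg_log_lik(data, beta, tau):
    total = 0.0
    for du, u, y in data:
        p = min(max(choice_prob(du, u, beta, tau), 1e-12), 1.0 - 1e-12)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total


def fit_piecewise(data, tau_grid):
    # Profile likelihood over tau: refit the coefficients at each candidate threshold.
    best = None
    for tau in tau_grid:
        beta = fit_beta(data, tau)
        nll = neg_log_lik(data, beta, tau)
        if best is None or nll < best[0]:
            best = (nll, tau, beta)
    return best[1], best[2]
```

A multilevel version, as described in the passage, would additionally pool τ and the slopes across participants. The grid-plus-Newton fit here is only the simplest stand-in for a single participant.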

      More details on the DDM analyses are needed - it's not clear how the outputs of the DDM correspond to what is stated in the text in the results.

      We agree that the section detailing the DDM analyses could be clarified. We analyzed two key parameters of the DDM: the drift rate, which we interpret as reflecting the efficacy of deliberation over uncertainty, and the bound separation, which corresponds to the tendency to deliberate rather than respond quickly. Our results show that good learners exhibit both higher drift rates and higher bounds. When participants repeat a previous choice, both the drift rate and bounds are lower. We changed the way we report the results:

      “We found that RTs indeed varied in relation to the absolute value of Δ-uncertainty, as expected (b=0.69, 95% PI=[0.58, 0.78]). Crucially, a stronger dependence of RT on the absolute value of Δ-uncertainty predicted better performance at test (drift-rate and test performance association b=0.81, 95% PI=[0.58, 1.07]). We further found that participants who tended to deliberate longer for the sake of accuracy also tended to perform better at test (bound height and test performance association b=1.46, 95% PI=[0.58, 2.34]; Figure 8c). In summary, participants who were better at deliberating about uncertainty during exploration, and who deliberated for longer, performed better at test. Thus, making good exploratory choices that lead to efficient learning involves prolonged deliberation.”

      We also provide a detailed explanation of this correspondence in the Methods section:

      “The DDM explains RTs as the culmination of three interpretable terms. The first is the efficacy of a participant’s thought process in furnishing relevant evidence for the decision - in our case the efficacy of choosing according to Δ-uncertainty (the drift rate in DDM parlance). The second term governs the participant’s speed-accuracy tradeoff by determining how much evidence they require to commit to a decision. This can also be thought of as how long a participant is willing to deliberate when a decision is difficult (bound height). Finally, the portion of the RT not linked to the deliberation process is captured by a third term (non-decision time).”
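      The correspondence between these three DDM terms and RTs can be illustrated with a small simulation. This is a minimal sketch assuming an unbiased accumulator with unit noise, simulated with Euler steps, and a drift rate proportional to |Δ-uncertainty| through a hypothetical gain `k`. The function names and parameter values are illustrative and do not reproduce the authors' fitted hierarchical DDM.

```python
import math
import random


def drift_from_delta_u(delta_u, k):
    # Hypothetical link: drift grows with |Δ-uncertainty| via a gain k.
    return k * abs(delta_u)


def simulate_ddm(drift, bound, ndt, rng, dt=0.001, noise=1.0, max_t=10.0):
    """Simulate one trial of an unbiased drift-diffusion process.

    Evidence starts at 0 and accumulates until it reaches +bound/2 (the
    response favoured by the drift) or -bound/2. Returns (hit_upper, rt),
    where rt is decision time plus the non-decision time ndt.
    """
    x, t = 0.0, 0.0
    half = bound / 2.0
    step_sd = noise * math.sqrt(dt)
    while abs(x) < half and t < max_t:
        x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    return x >= half, t + ndt


def summarize(drift, bound, ndt, n, seed):
    # Proportion of upper-bound responses and mean RT over n simulated trials.
    rng = random.Random(seed)
    trials = [simulate_ddm(drift, bound, ndt, rng) for _ in range(n)]
    p_upper = sum(1 for hit, _ in trials if hit) / n
    mean_rt = sum(rt for _, rt in trials) / n
    return p_upper, mean_rt
```

Under these assumptions, a larger |Δ-uncertainty| (higher drift) yields faster choices, while a wider bound trades slower RTs for more consistent choices. These are the roles the passage assigns to drift rate and bound height, while the non-decision time simply shifts all RTs.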

      The authors note that "the three choice strategies prescribe different table choices on most trials" but (from what I can see) only provide a representative participant's plot in Figure 2. What was the overall correlation of predicted choices from the three models?

      Thank you for pointing out this oversight. The correlations are now shown in the supplement to Figure 2. In brief, correlations between exposure and the other two strategies are low, while the correlation between EIG and uncertainty is moderate. These dependencies motivated our decision to fit a separate logistic regression model for each strategy and to compare strategies using formal model comparison and posterior predictive checks, rather than including them all in a single regression model.

      It appears that the models are all constructed to predict table choices and not card deck choices. Can the authors clarify this? If so, what role do the card deck choices have?

      Indeed, the manuscript focuses on table choices, as these are the choices of primary interest from an exploration perspective. It is most straightforward to define the three exploration strategies with respect to table choices, whereas for deck choices it is not clear how to define EIG with respect to performance at test. The hierarchical structure of the task was originally chosen to increase complexity, with the goal of creating a rich task that engages cognitive resources. We have not formally tested this assumption, and do not expect that the patterns we observe should be absent in a flat version of the task.

      Reviewer 2 (Public review):

      Summary:

      This paper focuses on an interesting question that has puzzled psychologists for decades, that is, why do people demonstrate a mix of uncertainty approach and avoidance behavior, given the fact that reducing uncertainty could always gain information and seems beneficial? This paper designed a novel task to demonstrate behavioral signatures of uncertainty approaching and avoidance during the exploration phase within the same task at both a within-subject and between-subject level. On the algorithmic level, this paper compared four different implementations of uncertainty-guided exploration and found that the model sensitive to relative uncertainty provides the best fit for human behavior compared to its counterparts using expected information gain or past exposure. This paper then links people's uncertainty attitude with accuracy and finds that uncertainty avoidance during exploration does not impair task performance, implying that uncertainty avoidance may be the output of a resource-rational decision-making process. To examine this account, this paper uses reaction time as an independent proxy of costly deliberation and shows that people deliberate shorter when engaging in repetitive choice, which presumably saves cognitive resources. Finally, the paper shows that people's tendency to engage in repetitive choice correlates with their tendency to avoid uncertainty, which supports the argument that avoiding uncertainty could be a strategy developed under the constraint of limited cognitive resources.

      Strengths:

      One of the highlights of this paper, as mentioned in the previous paragraph, is that the authors can establish the existence of the uncertainty approach and avoidance behavior within the same task whereas previous work usually focuses on one of them. This dissociation allows the authors to examine what situational factor is related to the emergence of the act of avoiding uncertainty, and extract parameters describing participants' attitude towards uncertainty during baseline as well as during situations where uncertainty avoidance is more common. Besides documenting the existence of uncertainty avoidance behavior, this paper also tried to explain this behavior by proposing an account within the resource-rational framework and has carefully quantified different aspects (e.g., accuracy; choice speed) of participants' behavior as well as examined their relationships. Though more experiments are needed to fully understand human uncertainty avoidance behavior, this paper has provided both empirical and theoretical contributions toward a mechanistic understanding of how people balance approaching and avoiding uncertainty.

      Weaknesses:

      I have a couple of concerns related to this paper. First, there seems to exist an anticorrelation between total uncertainty and absolute relative uncertainty (Figure 5 panel C, Δ-uncertainty is restricted to a small range when total uncertainty is high). It seems to be a natural product of the exploration process since the high total uncertainty phase is usually the period where the participant knows little about either option, leading to a less distinguishable relative uncertainty. However, it remains unknown whether the documented uncertainty avoidance still applies when extrapolating to larger absolute relative uncertainty.

      We sincerely thank you for your close reading of our manuscript and for highlighting its strengths. In the paradigm we study, overall and relative uncertainty are not anticorrelated. While the two are related—as in any finite-information exploration task, where the value of overall uncertainty constrains the possible range of relative uncertainty—they are not correlated and can therefore be used as predictors in a single regression model. We agree that strategies could differ substantially in a (near) infinite-information setting, such as when people seek semantic knowledge. The advantage of a finite-information task is its tractability, which enables the computational analyses we conducted. That said, the inherently greater intractability of an infinite-information task would likely alter human strategies, as it poses challenges both to participants and to researchers.

      It would be great if the experiment allowed for a manipulation of uncertainty in the middle of the experiment (e.g., introducing a new deck or informing participants that one deck has been updated).

      We agree, and look forward to probing this question in the future. We’ve added the point to our discussion section:

      “Our theoretical analysis and experiments leave several open questions. One concerns the relationship between overall uncertainty and time on task: in our paradigm, overall uncertainty was correlated with the number of cards observed. Although our findings remain robust when trial number is included as a covariate in the regression models, future work could more directly disentangle these factors by orthogonalizing overall uncertainty and elapsed time. This might be achieved, for instance, by manipulating overall uncertainty within a game—such as by introducing new tables or altering outcome probabilities mid-round.”

      Relatedly, the current 'threshold' of uncertainty avoidance behavior, if I understand correctly, is found by empirically fitting participants' data. This brings the question: can we predict when people will demonstrate uncertainty avoidance behavior before collecting any data? Or, is it possible that by measuring some metrics related to cognitive cost sensitivity, we could predict the proportion of choices that participants will show uncertainty-avoidant behavior?

      Thank you again for probing our thinking further. The threshold of uncertainty is indeed fitted on an individual basis using a hierarchical model. We believe there should be ways to predict it. In the current data, we find that it is correlated with the baseline tendency to approach uncertainty: in other words, participants who perform better show a slightly stronger tendency to avoid uncertainty when overall uncertainty is high. This underscores the complexity of identifying correlates of a coping strategy, as it is intricately linked to the difficulty being coped with. We speculate that working memory capacity may play an important role in this strategy, as well as the interplay between working memory–based learning and slower incremental learning mechanisms. Beyond speculation, however, we currently have no data to test these ideas.

      Finally, regarding the analysis of different behavior patterns in the game, it seems that the authors try to link repetitive behavior, uncertainty attitude, and accuracy together by testing pairwise correlations between them. I wonder whether other multivariate statistical methods, e.g., mediation analysis, would be better suited for this purpose.

      This was a very insightful comment. We revisited the data and fitted test performance using a multiple regression model, predicting performance from the three exploration-phase strategies simultaneously: baseline tendency to approach uncertainty, tendency to avoid uncertainty when overall uncertainty is high, and tendency to repeat previous choices. When adjusting for the baseline tendency to approach, we find that the tendency to avoid uncertainty is indeed associated with a slight decrement in test performance. However, in our sample, the better learners, who are more effective at approaching uncertainty, also tend to avoid it when overall uncertainty is high. This nuance highlights the point discussed earlier. We find similar results when fitting the data with a mediation model, but we favour the multiple regression approach, since we have no strong convictions about which exploration strategy causes another. We have detailed this analysis in the main text and have accordingly modified and qualified our interpretation of this finding:

      “In contrast, the relationship between the tendency to avoid uncertainty and test performance was more nuanced. In both samples, participants who were more inclined to approach uncertainty also tended to avoid it when overall uncertainty was high (r=0.43, p=5.42 × 10⁻¹⁰). Accordingly, avoidance was positively correlated with test performance at the population level (b=1.18, 95% PI=[0.80, 1.58]; Figure 7b; see Methods for parameter estimation). However, once we adjusted for the tendency to approach, avoidance was reliably associated with worse test performance (b=-0.83, 95% PI=[-1.28, -0.40]).”
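      The sign flip after adjustment is the pattern expected when two predictors are positively correlated but have opposite-signed unique effects on an outcome. The toy simulation below illustrates only that statistical pattern. Its effect sizes (a correlation of 0.6 between hypothetical "approach" and "avoid" scores, with unique effects of +1.0 and -0.3 on performance) are made up, and it is neither a reanalysis of the authors' data nor their Bayesian regression.

```python
import random


def simulate_scores(n, rng, r=0.6, b_approach=1.0, b_avoid=-0.3):
    # Hypothetical standardized 'approach' and 'avoid' scores correlated at r,
    # plus a performance score with opposite-signed unique effects.
    rows = []
    for _ in range(n):
        approach = rng.gauss(0.0, 1.0)
        avoid = r * approach + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        perf = b_approach * approach + b_avoid * avoid + rng.gauss(0.0, 1.0)
        rows.append((approach, avoid, perf))
    return rows


def cov(xs, ys):
    # Sample covariance.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def marginal_slope(x, y):
    # Simple regression slope of y on x (intercept included).
    return cov(x, y) / cov(x, x)


def adjusted_slopes(x1, x2, y):
    # Two-predictor OLS slopes (intercept included) from the 2x2 normal equations.
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det
```

With these settings, the population value of the marginal slope of performance on "avoid" is 0.6 × 1.0 − 0.3 = +0.3, whereas the adjusted slope is −0.3.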

      Reviewer #2 (Recommendations For The Authors):

      Could the authors elaborate more on why the negative relationship between exposure and choice (Figure 4a) is a natural phenomenon under the relative uncertainty model?

      Indeed, we believe this is a natural phenomenon under the uncertainty model. When simulating an uncertainty-driven agent, the negative relationship arises naturally. We interpret this as the agent repeatedly pursuing tables that are more difficult to learn—those with smaller probability differences. The agent is drawn to these tables precisely because they are harder to master. By contrast, an EIG-driven agent would not repeatedly return to tables that are too difficult to learn. We have revised the Results section to make this point clearer:

      “The simulations demonstrate that the surprising negative correlation between choice and Δ-exposure is an epiphenomenon of uncertainty-driven exploration: agents repeatedly return to harder-to-learn tables, gaining more exposure to them precisely because they remain more uncertain about these tables.”
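      The mechanism in this passage can be reproduced in a deliberately simplified toy. Each table is reduced to a single hypothetical outcome probability with a Beta posterior, and "harder to learn" is modelled as an outcome probability near 0.5, whose posterior variance shrinks more slowly. This is an assumption for illustration, not the authors' task structure or agent. The toy agent never consults exposure, yet it ends up mostly choosing the table it has already sampled more, the opposite of what an exposure-equating strategy predicts.

```python
import random


def beta_variance(a, b):
    # Posterior variance of a Beta(a, b) belief about an outcome probability.
    n = a + b
    return a * b / (n * n * (n + 1.0))


def run_uncertainty_agent(p_hard, p_easy, n_trials, rng):
    """Toy agent that always samples the table it is more uncertain about
    (higher Beta posterior variance), ignoring exposure entirely.

    Returns per-table exposure counts and the fraction of choices in which the
    chosen table had strictly more prior exposure than the unchosen one.
    """
    probs = {"hard": p_hard, "easy": p_easy}
    post = {name: [1.0, 1.0] for name in probs}  # uniform Beta(1, 1) priors
    exposure = {name: 0 for name in probs}
    more_exposed_choices = 0
    for _ in range(n_trials):
        choice = max(probs, key=lambda name: beta_variance(*post[name]))
        other = "easy" if choice == "hard" else "hard"
        if exposure[choice] > exposure[other]:
            more_exposed_choices += 1
        outcome = rng.random() < probs[choice]
        post[choice][0 if outcome else 1] += 1.0
        exposure[choice] += 1
    return exposure, more_exposed_choices / n_trials
```

For example, `run_uncertainty_agent(0.5, 0.95, 400, random.Random(0))` gives the near-0.5 table most of the exposure. This happens because its posterior variance, roughly p(1 − p)/n, stays larger for a given number of samples.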

      It would be great if the authors could provide the correlation between different uncertainty estimates to help the readers have a better sense of how different these estimates are.

      We’ve added this information in the supplement to Figure 2. In brief, correlations between exposure and the other two strategies are low, while the correlation between EIG and uncertainty is moderate. These dependencies motivated our decision to fit a separate logistic regression model for each strategy and to compare strategies using formal model comparison and posterior predictive checks, rather than including them all in a single regression model.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pierre Despas et al. studied the role of Salmonella typhimurium LppB in outer membrane tethering. Using an E. coli ∆lpp mutant, the authors showed that Salmonella LppB is covalently attached to PG through K58 and that these crosslinks are formed primarily by the L,D-transpeptidase LdtB. Additionally, the authors demonstrate that LppB forms homodimers via a disulfide bond through C57, but when Lpp is present it can also form heterotrimers with it. This suggests a regulatory role in Lpp-PG crosslinking.

      Strengths:

      In my view, this is a nice piece of work that expands our understanding of the role of lpp homologs. The experiments were well-designed and executed, the manuscript is well-written and the figures are well-presented.

      Weaknesses:

      I have some suggestions to give a clearer message, because I think a few images don't reflect much of what the authors wrote.

      We thank Reviewer #1 for this important comment. We agree that several figures could more directly illustrate the points made in the text. In a revised version, we intend to revise the relevant figure panels and legends to better align the visual message with the conclusions, and we will adjust the corresponding text to explicitly state what each figure demonstrates and how the data support our interpretation. We anticipate that these changes will improve clarity and strengthen the alignment between figures and text.

      It'd be helpful for readers to see the phylogenetic tree of the rest of the organisms that harbor LppB homologs and Lpp.

      We thank Reviewer #1 for this suggestion. We examined the distribution of Lpp-family proteins across closely related Enterobacteriaceae. While species such as Escherichia fergusonii, Shigella flexneri and Shigella dysenteriae encode Lpp as well as a paralogous small lipoprotein (YqhH, see Fig. S7), we find that LppB-like orthologs (equivalent to lppB from Salmonella) appear, to our knowledge, to be restricted to Salmonella species. Because LppB shows this lineage-specific distribution, inclusion of a broader phylogenetic tree would primarily highlight its restricted presence rather than provide additional evolutionary insight. We will clarify this point in the revised manuscript.

      Increased expression of LppB under low pH is subtle. This result would benefit from quantifying the blots (Fig. S1) and performing statistical analysis.

      We thank Reviewer #1 for this observation. We agree that the increase in LppB levels at acidic pH appears modest. We will carefully reassess this result across independent experiments and, where technically appropriate, provide quantitative information to better document the magnitude of the effect. Additionally, we will revise the text to more accurately describe the observed difference.

      Similarly, the SDS-EDTA sensitivity result (Fig. S2) is not convincing; the image doesn't seem to show isolated colonies at low pH (Fig. S2B). Please measure CFU/mL and report endpoint growth graphs instead. Statistical analysis should also be presented.

      We thank Reviewer #1 for this suggestion. We agree that the SDS-EDTA sensitivity assay presented in Fig. S2 could benefit from a more quantitative assessment. We will perform CFU/mL measurements from independent biological replicates to better quantify the observed differences and include statistical analysis when appropriate. In addition, we will revise the corresponding text to more accurately reflect the magnitude of the phenotype.

      The reduction to PG crosslinking of the C57R mutant is unclear (Fig 4B lane 22). The authors state: "suggesting that additional features of the LppB C-terminal region underlie its reduced efficiency." Does this mean additional amino acids play a role? Did the authors try to substitute Cys with other amino acid residues like Ala or Ser and quantify protein levels to find a mutant with similar expression levels? Do these have less crosslinking too?

      We thank Reviewer #1 for this important comment. As correctly noted, the reduced abundance of the LppB C57R variant likely contributes to its reduced level of peptidoglycan-crosslinked species. Therefore, we cannot formally distinguish whether the reduced peptidoglycan crosslinking reflects decreased intrinsic crosslinking efficiency or simply reduced protein abundance and stability. We will revise the text to clarify this point and explicitly acknowledge this limitation. The C57R substitution was chosen because arginine is present at the equivalent position in the Salmonella LppA homolog, allowing us to assess the functional consequences of a naturally occurring sequence variation between Lpp-family members. While substitutions such as C57A or C57S could further dissect the specific contribution of the cysteine residue, our use of the C57R substitution provides direct insight into the functional implications of this naturally occurring difference between Lpp homologs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Pierre Despas and co-workers reports the biochemical characterization of LppB, a peculiar Lpp (Braun's lipoprotein) homolog found in Salmonella enterica. S. enterica encodes two Lpp homologs, LppA and LppB; while LppA and Lpp function similarly, the role of LppB is less clear. LppB shares with Lpp the C-terminal Lys needed for covalent attachment to peptidoglycan (PG) but diverges in residues that precede the terminal Lys, featuring a Cys residue at the penultimate position. By using E. coli as a surrogate model, the authors show that LppB can be covalently linked to PG via the terminal Lys residues and that the penultimate Cys residue can be used to form homodimer species when expressed alone and heterotrimeric complexes when co-expressed with Lpp. Interestingly, LppB expressed in E. coli seems to be stabilized at acidic pH, a condition Salmonella encounters in macrophage phagosomes. Finally, based on the decreased intensity of LppB-PG crosslinked bands as LppB expression increases, the authors suggest that LppB is able to negatively modulate the outer membrane-peptidoglycan connectivity.

      Strengths:

      The manuscript is interesting, describes a novel strategy employed by bacteria to fine-tune outer membrane-PG attachment and provides new insights into how envelope remodeling processes can contribute to bacterial fitness and pathogenicity.

      Weaknesses:

      The analysis and quantification of muropeptides formed in E. coli strains overexpressing LppB would strengthen the main conclusion of the manuscript.

      We thank Reviewer #2 for this insightful comment. We agree that quantitative analysis of muropeptides in E. coli strains expressing LppB would strengthen the main conclusion. This point was also raised in the editorial assessment and by Reviewer #3, underscoring its importance. In a revised version, we plan to perform muropeptide profiling by HPLC, coupled where appropriate to mass spectrometry, to quantitatively assess peptidoglycan composition in the relevant strains.

      Reviewer #3 (Public review):

      Summary:

      The manuscript is interesting, and it is clearly written. While the experiments are well executed, a general flaw is that the LppA/B analyses are done in the E. coli K12 host as a surrogate for Salmonella enterica. For the mechanistic and molecular analyses of LppB a surrogate host is certainly adequate, yet it limits extrapolation of the physiological implications of LppB in the natural context.

      Strengths:

      The work convincingly demonstrates that LppB forms disulfide-based dimers and that it is crosslinked to PG via LdtB in E. coli. Moreover, dimerization is required for LppB abundance in E. coli and LppB can inhibit crosslinking of Lpp/A to PG in E. coli. 

      Weaknesses:

      Regarding the key conclusion of the work: while it is shown that LppB is oxidized in E. coli, whether envelope integrity (or OMV production) changes arise from switches in oxidation of the LppB cysteines remains to be shown, for E. coli let alone in the native host Salmonella. Does expression of LppB influence Lpp/A activity or OM tethering in E. coli? Since the inhibition of the Lpp/A linking to PG is not affected by the oxidation state of LppB, the abstract/title implies redox-control of envelope integrity which is a bit misleading and an overstatement. Both are features of LppB: i.e. it dimerizes through disulfide bond formation and it reduces PG binding of Lpp/A through trimerization. However, no link between the two is shown.

      We thank Reviewer #3 for this important comment and for highlighting the need to clarify the relationship between LppB oxidation, oligomerization, and its effect on peptidoglycan crosslinking. We agree that while our data demonstrate that LppB forms disulfide-linked oligomers and that LppB expression reduces Lpp/A attachment to peptidoglycan, our current results do not establish a direct causal link between the oxidation state of LppB and its ability to modulate outer membrane–peptidoglycan tethering. Therefore, we will revise the manuscript to avoid implying redox-dependent control of envelope integrity and to more clearly present these as distinct but potentially related properties of LppB.

    1. Author response:

      We thank the reviewers for their constructive feedback and careful evaluation of our manuscript. We are encouraged that the study was viewed as well designed and clearly presented, that its computational modeling approach was recognized as a strength, and that the key findings were appreciated. We agree that some claims would benefit from additional support and clarification. Below, we outline the main revisions we will undertake to strengthen the evidential support for our conclusions, clarify aspects of the results and modeling, and address the points raised in the reviews.

      (1) Statistical support.

      Some claims were judged to lack sufficient statistical support [Reviewer 1]. In the revised manuscript, we will carefully review all inferential claims and ensure that they are supported by appropriate statistical analyses. Where necessary, we will implement additional statistical tests and expand statistical reporting to ensure that differences between conditions, models, or behavioral measures are formally evaluated and that key aspects of the data are appropriately described.

      (2) Modeling clarification.

      Some aspects of the modeling were considered insufficiently clear, particularly regarding how the models were implemented [Reviewers 1 and 2]. We will expand the Methods section to provide a clearer and more complete description of the Bayesian models and their implementation. In particular, we will clarify that full probability distributions were computed (without reduced approximations such as those used in simplified Bayesian variants), and that the only approximation concerns numerical discretization of continuous state spaces at fine resolution. We will clarify that variance is part of the joint multidimensional state space and is inferred jointly with the mean. We will also explicitly state that apparent learning rates are derived from predicted paddle responses in the same way as for participants, and are not directly computed within the Bayesian inference process.

      (3) Model fitting.

      The absence of direct model fitting to individual participants was identified as a limitation [Reviewers 1 and 3]. In response, we will implement individual-level model fitting (to the extent feasible in practice) and conduct formal model comparison based on the fitted models. We will further validate the fitted models by examining whether they reproduce the main behavioral signatures observed in the data.

      (4) Normative interpretation and control analyses.

      The interpretation of the models as normative was questioned in light of the response-probability mechanism [Reviewer 2]. In the revision, we will clarify the distinction between the normative inference component of the model and the response-level mechanism. We will revise the framing of the results accordingly and ensure that normative claims are restricted to the inference component. We will also expand the discussion to integrate relevant literature on perseveration and satisficing, and clarify how normative and non-normative mechanisms may jointly shape behavior. In addition, following the reviewer’s suggestion, we will include control analyses using standard Rescorla–Wagner models, with and without the response-probability mechanism, to evaluate whether the observed signatures can be accounted for by simpler learning rules.

      (5) Additional points.

      We will also address the additional points raised in the reviews. Specifically, we will include supplementary histograms of apparent learning rates [Reviewer 2]. We will provide additional clarification and analyses regarding the effects of stochasticity on learning [Reviewer 1]. Finally, we will explore hybrid or mixture models and strategies and expand the discussion of this possibility [Reviewer 3].

      We believe that these revisions will substantially strengthen the support for our claims and address the concerns raised in the current assessment. We are grateful for the reviewers’ engagement with our work and for their comments, which will allow us to significantly improve the clarity and strength of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents a GUI with SEM images of 11 Utah arrays (8 of which were explanted, and 4 of which were used for creating cortical lesions).

      Strengths:

      Visual comparison of electrode tips with SEM images, showing that electrolytic lesioning did not appear to cause extra damage to electrodes.

      Weaknesses:

      Given that the analysis was conducted on explanted arrays, and no functional or behavioural in vivo data or histological data are provided, any damage to the arrays may have occurred after explantation. This makes the results limited and inconclusive (firstly, that there was no significant relationship between degree of electrode damage and use of electrolytic lesioning, and secondly, that electrodes closer to the edge of the arrays showed more damage than those in the center).

      We agree insofar as we could not fully control the circumstances of each array during explantation. However, explantation is potentially, but not universally, damaging, as demonstrated by several largely intact arrays in this paper. If electrolytic lesions damaged the arrays, that damage would still be observable. All arrays examined in this paper were carefully stored as described in the paper. All analyses of this type require an explant surgery [?????]. Our conclusions are therefore as robust as those of any analysis of this type.

      Overall, these results do not add new insight to the field, although they do add more data and reference images.

      We respectfully disagree, as there is no existing SEM analysis of electrode arrays used for lesioning.

      Reviewer #2 (Public review):

      In this study, the authors used scanning electron microscopy (SEM) to image and analyze eleven Utah multielectrode arrays (including eight chronically implanted in four macaques). Four of the eight arrays had previously been used to deliver electrolytic lesions. Each intact electrode was scored in five damage categories. They found that damage disproportionately occurred to the outer edges of arrays. Importantly, the authors conclude that their electrolytic lesioning protocol does not significantly increase material degradation compared to normal chronic use without lesion. Additionally, the authors have released a substantial public dataset of single-electrode SEM images of explanted Utah arrays. The paper is well-written and addresses an important stability issue for long-term chronically implanted array recordings and electrolytic lesioning, which is relevant to both basic science and translational research. By comparing lesioning and non-lesioning electrodes on the same array and within the same animal, the study effectively controls for confounds related to the animal and surgical procedures. The shared dataset, accessible via interactive plots, enhances transparency and serves as a valuable reference for future investigations. Below, we outline some major and minor concerns that could help improve the work.

      Major concerns:

      (1) Electrode impedance is a critical measurement to evaluate the performance of recording electrodes. It would be helpful if the authors could provide pre-explant and post-explant impedance values for each electrode alongside the five SEM damage scores. This would allow the readers to assess how well the morphological scores align with functional degradation.

      We agree, electrode impedance is very important in determining electrode performance. However, due to the multi-year, multi-subject nature of this work, we unfortunately do not have this data.

      (2) The lesion parameters differ across experiments and electrodes. It would be helpful if the authors could evaluate whether damage scores (and/or impedance changes) correlate with total charge, current amplitude, duration, or frequency.

      Thank you for this recommendation. We have included additional analyses in Supplementary Materials.
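      The sketch below illustrates the kind of analysis the reviewer requests. It computes the charge delivered by a constant-current (DC) lesion, Q = I × t. It then computes a Spearman rank correlation between a per-electrode lesion parameter and a damage score. A rank correlation is assumed because the damage scores are ordinal. The function names and units are hypothetical, and this is not the analysis reported in the Supplementary Materials. With only a handful of lesion electrodes per array, any such correlation will have limited power. If p-values are needed, `scipy.stats.spearmanr` computes the same coefficient.

```python
import math


def total_charge_uC(current_uA, duration_s):
    """Charge from a constant (DC) current: Q = I * t (microamps x seconds = microcoulombs)."""
    return current_uA * duration_s


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```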

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) ‘Both in vitro and in vivo testing of electrode arrays revealed environmental damage to these materials, such as cracking, textural defects, and degradation in response to the brain’s temperature and salinity [32]. The immune response of the brain also damages the electrodes due to effects like glial scarring (gliosis) and inflammation [33, 34]. This damage may be exacerbated by the surgical techniques used during implantation, which include pushing the electrode array into cortex and tethering the implant to the skull [33, 35, 36].’

      In the above text, several relevant references have been left out, e.g.:

      Barrese et al., 2013

      Patel et al., 2023

      Woeppel et al, 2021

      Chen et al., 2023

      Bjanes et al., 2025

      Thank you for this recommendation. This section has been updated.

      (2) ‘Aggressive electrical stimulation is known to dissolve platinum-based electrodes [37, 38]. Other studies have shown iridium oxide to be more resistant to stimulation-related damage, but not completely insusceptible [39, 40].’ Reference number 25 is relevant here.

      Thank you for this recommendation. This section has been updated.

      (3) ‘F’s and C’s PMd arrays were used for electrolytic lesioning experiments Monkey U was implanted with three 96-channel arrays; two in M1 and one in PMd.’ There seems to be a punctuation mark missing.

      Thank you for this recommendation. This section has been updated.

      (4) Methods: How much charge was injected via the electrodes that were used for lesioning? What current amplitudes, voltages, durations, and number of pulses were used? If more than 1 pulse was applied, what were the frequencies? Was the pulse cathode-only/ anode/only? What were the electrode impedance values at the time of stimulation? How many electrodes were used for lesioning at any given moment? How long after lesioning did the arrays remain in the tissue?

      Thank you for your questions. An additional supplemental table (Supplemental Table 6) detailing specific NHP lesion parameters has been added. A summary of the lesion procedure (DC, bipolar, two electrodes at a time) has also been included in Methods. All arrays remained in the subject until explant; the interval between lesioning and explant ranged from hours (same-day lesion and explant) to several years. Further details on the lesioning procedure are available in citation [?]. Explant dates are available in Supplemental Table 1. Unfortunately, we do not have impedance values at the time of lesioning, as we do not record this measure frequently after implantation, though we agree these data would be useful to have.

      (5) Caption for Figure 1: ‘All array images are displayed with the wire bundle to the right side.’ I recommend adding this text from Figure 2 to the caption of Figure 1: ‘electrode tips facing viewer’.

      Thank you for this recommendation. This section has been updated.

      (6) ‘Electrodes used for electrolytic lesioning are denoted with blue dots.’ Was stimulation carried out across all these electrodes simultaneously?

      No, stimulation was not carried out across all electrodes simultaneously. Pairs of electrodes were stimulated at the same time to create lesions. Lesions were performed on different days. We have updated our methods section to reflect this. See the Methods section and citation [?] for more details.

      (7) For the control array, in Figure 1: ‘Click each column to view a close-up of the 5th row (from top to bottom) of electrodes:’. It would be clearer to state: ‘Click each column to view a close-up of a single electrode in the 5th row (from top to bottom):’.

      Thank you for this recommendation. This section has been updated.

      (8) Figure 2 caption: ‘Blank electrodes and electrodes with shank fractures are ignored and displayed in black, as they are not scored.’. What is a ‘blank’ electrode?

      A ‘blank’ electrode is an electrode on the array that physically exists but is not wire bonded at time of manufacture to produce recordings. The corner electrodes of the Utah array are all blank electrodes. We have updated this wording to ‘unwired’ for clarity.

      (9) I recommend incorporating Supplementary Figure 1 into Figure 2, so that the reader can immediately see where the rings are, without referring to the Supplementary Materials.

      Thank you for this recommendation. We have chosen to keep these figures separate for stylistic reasons.

      (10) Supplementary Figures: The figures should have the word ‘Supplementary’ in the title, i.e., ‘Supplementary Figure X,’ not just ‘Figure X.’

      Thank you for this recommendation. These captions have been updated.

      (11) Throughout the results, the text is overly focused on the type of statistical test used and the p-values, e.g.: ‘When comparing lesioning and non-lesioning electrodes within the same array, each of the two nonparametric statistical tests (Mann-Whitney U-test, Levene Test) returned insignificant p-values for each category of damage as well as for total damage scores for all four arrays used in lesioning experiments.’.

      To make the findings more digestible for the reader, the text should be rephrased in terms of whether the metrics being compared were significantly different or not. E.g.: ‘For each category of damage, as well as for the total damage score, no significant difference was found between electrodes that were or were not used for lesioning (either the mean or the variance of the scores).’.

      Thank you for this recommendation. We have rephrased the text to reflect this note.

      (12) ‘In Monkey H, the Mann-Whitney U test resulted in an insignificant p-value for coating cracks and parylene C delamination scores, while the Levene test resulted in an insignificant p-value for abnormal debris, coating cracks, and parylene C cracking scores. In Monkey F, the Mann-Whitney U test resulted in an insignificant p-value for parylene C delamination scores, while the Levene test resulted in an insignificant p-value for coating cracks, parylene C delamination, and parylene C cracking scores. In Monkey U, the Mann-Whitney U test resulted in significant p-values for all scores, while the Levene test resulted in an insignificant p-value for abnormal debris, tip breakage, and coating cracks scores. Finally, in Monkey C, the Mann-Whitney U test resulted in an insignificant p-value for parylene C delamination and parylene C cracking scores, while the Levene test resulted in an insignificant p-value for abnormal debris, parylene C delamination, and parylene C cracking scores.’

      To point out another example, this chunk of text is highly repetitive and is unnecessary, as the reader can simply refer to Supplementary Table 4. It should be completely rephrased and summarized, to deliver the key message, i.e. briefly describe what kinds of damage occurred for which arrays. Also, what is the point of the two statistical tests? What are the authors trying to conclude?

      Thank you for this recommendation. We have rephrased and pared down the text to reflect this note.

      (13) Discussion: ‘Similarly, other work did not show significant differences in SEM-visible degradation between both platinum and iridium oxide coated electrodes used for stimulation [24, 25].’ What differences are being referred to here? Differences in degradation between stimulated Pt versus stimulated IrOx electrodes? Or between stimulated Pt and unstimulated Pt electrodes? Stimulated IrOx and unstimulated IrOx? Or something else?

      Thank you for your questions. We are comparing platinum against iridium oxide in this sentence. The wording of our original text has been updated to clarify our intention.

      (14) Supplementary Tables: P-values lower than .05, .01, and .001 should simply be replaced with <.05, <.01, and <.001. The alpha value after a Bonferroni correction should be stated somewhere in each table or table caption.

      Thank you for this recommendation. We have edited the tables to reflect this note.
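      The reviewer's two requests can be sketched together: threshold notation for p-values and an explicitly stated Bonferroni alpha. The helper names and the example count of six comparisons are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def format_p(p):
    """Report p-values below .001, .01 or .05 as '<.001', '<.01' or '<.05'."""
    for threshold, label in ((0.001, "<.001"), (0.01, "<.01"), (0.05, "<.05")):
        if p < threshold:
            return label
    return f"{p:.3f}".lstrip("0")  # e.g. 0.276 -> '.276'


# Hypothetical example: six comparisons per array (five damage categories + total score).
corrected = bonferroni_alpha(0.05, 6)
```

      The labels only describe where p falls relative to fixed thresholds. A value reported as "<.05" can therefore still fail a corrected threshold such as 0.05/6 ≈ 0.0083. This is why stating the corrected alpha alongside each table matters.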

      (15) Title: ‘Material Damage to Multielectrode Arrays after Electrolytic Lesioning is in the Noise’ I don’t understand what the title means. What is in the noise? And what is ‘the noise’?

      “In the noise” is a colloquialism referring to how background information (“noise”) may obscure or distract from other features. This title conveys how material damage to multielectrode arrays due to electrolytic lesioning is largely obscured by the general damage observed on multielectrode arrays after implant and explant.

      (16) This reference has been left out altogether: Chen et al., 2014. The effect of chronic intracortical microstimulation on the electrode-tissue interface.

      Thank you, this reference is now included.

      Reviewer #2 (Recommendations for the authors):

      (1) The number of lesion electrodes is low, especially since there are only 2-10 lesion electrodes on three of the four arrays, yielding limited statistical power.

      We agree that the low number of lesioned electrodes limits statistical power. However, due to ethical considerations, it is unlikely for arrays to contain much more than this number of lesion electrodes.

      (2) The dataset includes both platinum and iridium oxide-coated electrodes. A direct comparison of their damage profiles would be informative.

      Thank you for this recommendation. We have included this additional analysis in Supplementary Materials.

      (3) It is unclear what “is in the Noise” in the title means without reading the manuscript. It is helpful to improve the clarity of the title.

      Thank you for this recommendation.

      (4) Please spell out “PMd” and “M1” at first mention to facilitate reading.

      Thank you for this note. The text has been updated to reflect this recommendation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using single-unit recording in 4 regions of non-human primate brains, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables seem to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

      Strengths:

      The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

      Weaknesses:

      My comments here are quite minor.

      The correlation between transition and reward coefficients is interesting, but I'm a little worried that this might be an artifact. I suspect that reward probability is higher after common transitions, due to the fact that animals are choosing actions they think will lead to higher reward. This suggests that the coefficients might be inevitably correlated by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?

      We fully agree with the reviewer that the task design has in-built correlations between transition and reward, and thus the correlation between neural selectivity for feedback and transition (Figure 3E) may be due to the different reward expectation after common or rare transitions. We did try to make this point in the manuscript:

      This suggests that the brain treats being diverted away from your current objective equivalent to losing reward, which is sensible as the subject would normally expect lower rewards on rare trials if their reward-seeking behaviour was efficient.

      We’ve now updated the wording of this statement to try and better make this point and avoid confusion that any non-reward-related encoding is involved:

      “As the reward expectation will be higher on common compared to rare trials, this demonstrates that the brain encodes being diverted to an area with a lower reward expectation equivalent to actually receiving a low reward (and vice versa).”

      We have also adjusted the significance test of this correlation to use a circular permutation test that accounts for correlations between the regressors. This test still found there to be significant correlation in all areas.

      We have described this new permutation test in Methods:

      “For comparing correlations between weights for different features (i.e., between transition and reward coding, Figure 3E), the null distribution of correlations observed in circularly shifted data was compared to the correlation seen in the actual data. This accounts for any correlations between features that existed in the task by preserving the structure of the design matrices.”

      And updated the text in Results accordingly:

      “All regions, but particularly ACC, encoded a common transition (at the time of transition) similar to a high reward (at the time of feedback), as there was a positive correlation between the coefficients for reward and transition (the transition parameter was signed such that common and rare transitions were equivalent to high and low rewards, respectively) (ACC r=0.4963, DLPFC r=0.3273, caudate r=0.4712, putamen r=0.5052; all p<0.002 except DLPFC where p=0.006, circular permutation test; Figure 3E, S5).”
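      The circular permutation test described in Methods can be sketched with NumPy. The sketch makes several assumptions. The design matrix has an intercept, a reward column and a transition column. The neural data are circularly shifted along the trial axis by every possible offset. The p-value is one-sided. Shifting the neural data rather than the regressors leaves the design matrix intact, so any correlation between regressors built into the task also appears in the null distribution. The simulated data, function names and shift scheme are illustrative, not the authors' pipeline.

```python
import numpy as np


def coefficient_correlation(X, Y, i, j):
    """Least-squares fit of every neuron (columns of Y) on design matrix X, then the
    across-neuron Pearson correlation between coefficients i and j."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (n_regressors, n_neurons)
    return np.corrcoef(B[i], B[j])[0, 1]


def circular_shift_test(X, Y, i, j):
    """Null distribution from circularly shifting Y (trials x neurons) against X.

    Shifting Y keeps X, and hence any built-in correlation between regressors,
    unchanged while breaking the trial-by-trial alignment with the neural data.
    """
    n_trials = X.shape[0]
    observed = coefficient_correlation(X, Y, i, j)
    null = np.array([
        coefficient_correlation(X, np.roll(Y, shift, axis=0), i, j)
        for shift in range(1, n_trials)
    ])
    # One-sided p-value (is the observed correlation larger than chance?) with +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + null.size)
    return observed, p_value


# Simulated example: reward and transition regressors correlated by design,
# and each neuron weighting both with the same (random) coefficient.
rng = np.random.default_rng(0)
n_trials, n_neurons = 300, 60
reward = rng.normal(size=n_trials)
transition = 0.5 * reward + rng.normal(size=n_trials)
X = np.column_stack([np.ones(n_trials), reward, transition])
w = rng.normal(size=n_neurons)
Y = np.outer(reward + transition, w) + rng.normal(size=(n_trials, n_neurons))
observed, p = circular_shift_test(X, Y, 1, 2)
```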

      The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

      We thank the reviewer for this comment. We agree that the motivation for the explore/exploit analysis was not sufficiently clear in the original version.

      Our aim was not to introduce this as a separate or tangential effect, but rather to highlight how the task’s reward structure (with outcome levels stable for 5–9 trials) naturally created alternating periods favoring exploitation of a known high-value option versus exploration when outcomes changed. This feature of the task is tightly linked to MB-RL computations, as it requires integration of state-transition knowledge and updating across trials.

      Importantly, we showed earlier in the manuscript that ACC encoded the state-transition structure (i.e., common versus rare transitions) and MB value estimates (at the choice epoch). Here, we aimed to highlight that the same region also modulated choice encoding depending on whether the subject was in an exploratory or exploitative regime. This is another feature of the task that relies on knowledge of state transitions and outcomes. We have revised this section to better integrate it into the main logic of the paper:

      “In our task, the outcome level (high, medium, low) of each second-stage stimulus remained the same for 5-9 trials before potentially changing. This design naturally created periods where subjects could ‘exploit’ the same Choice 1 to maximize reward for several trials; and other periods where they had to ‘explore’ different second-stage stimuli to optimize reward (as contingencies shifted). In classical MB-RL, the transition between reward states can be learned by keeping counts of observed transitions from a current state-action pair to a subsequent state, yielding a maximum-likelihood estimate of the environment’s dynamics [42]. In fact, knowledge about the reward contingency schedule could support decision-making in both exploitation – by enabling efficient choice when rewards are stable; and exploration – by guiding alternative behaviour most likely to yield improved outcomes (this is different from MF learning, where exploration is more random since the agent lacks explicit state-transition knowledge).

      We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F). Therefore, exploit behaviour specifically upregulated relevant task parameters that were worth remembering across trials.”
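      The count-based transition learning in the quoted passage amounts to a maximum-likelihood estimate. The estimated probability of reaching a state is the number of times it followed a given state-action pair, divided by the number of times that pair occurred. Below is a minimal sketch; the class and state names are hypothetical.

```python
from collections import defaultdict


class TransitionCounter:
    """Maximum-likelihood estimate of P(next_state | state, action) from observed counts."""

    def __init__(self):
        # counts[(state, action)][next_state] -> number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def probability(self, state, action, next_state):
        row = self.counts.get((state, action), {})
        total = sum(row.values())
        # Unvisited state-action pairs have no data yet; 0.0 is returned as a placeholder.
        return row.get(next_state, 0) / total if total else 0.0
```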

      Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used a two-step choice task (Miranda et al., 2020) that was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      (a) ACC was the strongest encoder of MB variables among the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed a sustained representation of feedback that carried over into the next trial.

      (b) ACC was the only area to represent both MB and MF values.

      (c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      We agree with the reviewer that the present study does not establish causal relationships, and we do not claim otherwise in the manuscript. Our work was designed as a comprehensive characterization of neural activity across ACC, DLPFC, caudate, and putamen during reward-seeking decision-making. By systematically comparing MB- and MF- RL signals across these regions, we provide new insights into the division of labor and cooperative interactions within cortico-striatal networks.

      While causal manipulations (e.g., lesions, inactivations, stimulation) are indeed required to directly establish necessity or sufficiency, correlational studies such as ours play a crucial role in identifying where and how computationally relevant signals are represented. Importantly, our findings align with and extend prior causal work, for example showing that ACC and striatal lesions disrupt MB control. Thus, our study contributes a detailed functional mapping of MB and MF RL encoding across multiple nodes of this circuit, which serves as an important foundation for future causal investigations (e.g., using transcranial ultrasound stimulation).

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      We appreciate the reviewer's observation regarding this matter. Our intention was not to place a heavy emphasis on ACC, rather this came naturally from the data. The ACC demonstrated considerably more robust and enduring neural activity compared to other brain regions – for instance, reward-related signals in the ACC continued well beyond individual trials (Fig. 2A-B), and encoding of state transitions remained active from the initial transition through to the feedback phase (Fig. 3A-B). By comparison, distinctions among other regions were less pronounced, which naturally resulted in the ACC receiving greater attention in our analytical findings.

      We acknowledge that the caudate plays an essential and complementary role in driving behavior, and we believe that this is emphasized in the two key subsections of our “Results”. First, caudate neurons encoded model-based choice values (Fig. 4A, 4C) and uniquely remapped these values following rare transitions (Fig. 5), reflecting flexible adjustment of action values. Second, decoding analyses showed that both ACC and caudate populations predicted first-stage choices (Fig. 6C), linking their activity directly to behavioral decisions. In the Discussion section, we also highlight that “the distinctive caudate signal of updating (flipping) the value estimates of the currently experienced option on rare trials” goes beyond a “general temporal-difference RPE” and rather supports “the role of caudate in MB valuation”.

      (3) The role of the putamen is somewhat underexplored here.

      Our analyses were conducted in an identical manner across all four recorded regions (ACC, DLPFC, caudate, and putamen), and we consistently reported the results for putamen alongside the others. For example, in the Results section we describe how “both caudate and putamen encoded the reward from the previous trial negatively during the feedback period of the current trial” (Fig. 2F-G), and that “all regions had a significant population of neurons that encoded MB-, but not MF-, derived value” including putamen (Fig. 4F). Similarly, we show that putamen, like caudate, encoded a dopamine-like RPE signal at feedback (“both caudate and putamen neurons clearly responded at feedback with the parametric features of a dopamine-like RPE”; Discussion). These findings align with previous work linking the putamen to MF learning and are discussed explicitly in the context of MF-MB dissociations. We therefore believe that the putamen was not underexplored, but rather that its contribution was more circumscribed relative to ACC and caudate because the signals observed were quantitatively weaker and less distinctive for MB computations.

      (4) The authors mention the monkeys were overtrained before recording, which might have led to a bias in the MB versus MF strategy.

      We agree that extensive training can influence the balance between MB and MF in choice behaviour and neuronal responses.

      In a previous comprehensive behavioral analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology - ref. 36, Figure S6B) we showed that both MB and MF strategies contributed to behavior, with MB dominance stable across weeks of testing – supporting that overtraining did not eliminate MF influences (but rather stabilized a mixed strategy with robust MB contributions).

      In the same manuscript, we have also: i) cautioned the readers when comparing our results to data from the original human studies; ii) acknowledged that our extensive training cannot address earlier phases of learning in which sensitivity to the task structure is first acquired; and iii) also provided task-related reasons for such MB dominance – as training made the transition structure well learned (making MB computationally less costly and faster to implement) and the non-stationary outcomes favored the flexibility of MB strategies.

      In the present manuscript, we also have acknowledged that overtraining may have shifted neural signals toward stronger MB representations, or alternatively enabled more sophisticated task representations:

      “On the other hand, MF-based estimates were neither as striking nor as specific to striatal regions as expected and observed in previous studies [18]. The monkeys were extensively trained on the task before recordings commenced, which may have caused a shift towards both MB behaviour and MB value representation within the striatum. Alternatively, this training may have allowed more sophisticated representations to occur, such as using latent states to expand the task space [54].”

      Importantly, we strongly believe that this possibility does not detract from our main finding that both MB and MF signals were present across regions, with ACC showing the strongest multiplexing of the two.

      (5) The GLM3 model combines MB and MF value estimates but does not clearly mention how hyperparameters were optimized to prevent overfitting. While the hybrid model explains behavior well, it does not clarify whether MB/MF weighting changes dynamically over time.

      We appreciate this comment and would like to note that, for completeness, we have on several occasions directed the reader to our prior behavioural analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology, ref 36). In that work, we provide a full and detailed description of both the task and the computational modeling approach (see particularly the “Model fitting procedures” section). Furthermore, our model-fitting was grounded in the MF/MB RL framework used in the original human two-step study (Daw et al., 2011); and the fitting procedures also followed previous studies (Huys et al., 2011).

      Hyperparameters, including the MB/MF weighting parameter (ω), were estimated using maximum likelihood under two complementary approaches, with priors providing regularization across sessions. First, we performed a fixed-effects analysis, in which parameters were estimated independently for each session by maximizing the likelihood separately; second, we conducted a mixed-effects analysis, treating parameters as random effects across sessions within each subject. This use of priors reduces the risk of overfitting by constraining parameters based on their empirical distributions, rather than allowing unconstrained session-by-session estimates. Finally, all model-fitting procedures were verified on generated surrogate data.

      With regard to dynamic weighting, our approach – consistent with most two-step studies – assumed ω to be constant across trials within each session. This was a deliberate choice, both for comparability with prior work and because our subjects were extensively trained, making session-level stability of strategy weights a reasonable assumption. Indeed, our analyses showed no systematic drift in ω across sessions, suggesting that MB/MF balance was stable over sessions. While approaches that allow dynamic ω estimation are possible, we believe such extensions would likely have minimal impact in the current dataset.
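      For reference, the hybrid valuation in the two-step framework of Daw et al. (2011) has two parts. The model-based value is computed from the transition probabilities and the best second-stage value in each state. It is combined with a model-free value through the weight ω, and the result is passed through a softmax. The sketch below assumes 0.7/0.3 transition probabilities for illustration. It omits the learning rules for the second-stage and model-free values and any perseveration term. It illustrates the form of the model, not the fitting code from Miranda et al. (2020).

```python
import math


def model_based_values(transition_probs, stage2_values):
    """Q_MB(a) = sum_s P(s | a) * max_b Q2(s, b).

    transition_probs[a][s]: probability that first-stage action a leads to state s.
    stage2_values[s]: learned values of the options available in second-stage state s.
    """
    return [
        sum(p * max(stage2_values[s]) for s, p in enumerate(row))
        for row in transition_probs
    ]


def hybrid_choice_probabilities(q_mb, q_mf, omega, beta):
    """Softmax (inverse temperature beta >= 0) over Q_net = omega*Q_MB + (1-omega)*Q_MF."""
    q_net = [omega * mb + (1 - omega) * mf for mb, mf in zip(q_mb, q_mf)]
    top = max(q_net)
    weights = [math.exp(beta * (q - top)) for q in q_net]  # shift by max for stability
    total = sum(weights)
    return [w / total for w in weights]


# Assumed common/rare transition probabilities of 0.7/0.3 (illustrative only).
q_mb = model_based_values([[0.7, 0.3], [0.3, 0.7]], [[1.0, 0.0], [0.0, 0.5]])
```

      Summing −log P(chosen action) over trials gives the objective minimized in maximum-likelihood fitting. Holding ω constant across trials corresponds to the session-level assumption described above.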

      (6) It was unclear from the task description whether the images used changed periodically or how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the pair of cues.

      All images were kept constant across sessions. Common/Rare transitions themselves were not explicitly cued, but rather each second-stage state was associated with a specific background colour, followed ~1s later by the presentation of two specific second-stage choice cues (Figure 1B). Hence the subject could infer whether they were transitioned down a Rare or Common path by the background colour, which can be disambiguated in time from the visual responses to the second-stage cues. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which could be inferred by a change in background colour immediately after choice indicating which second stage state they had just entered, Figure 1A).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 7 appears to be missing.

      We thank the reviewer for pointing this out. Figure 7 was inadvertently omitted in the previous version and has now been included in the revised manuscript.

      (2) No stats reported in the section on explore/exploit.

      We apologise for this oversight. This section now also reports the relevant statistics:

      “We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those in which the subject had not received a high reward on the previous two trials (‘explore’ trials), and those in which the previous two rewards had been at the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch in both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in explore compared with exploit trials (p<0.002 for at least 140 ms in all cases, permutation test on the observed differences; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentional) resources to features – i.e., choice 1 stimulus identity – that are the most certain predictors of important outcomes. As a control, we also decoded the direction of Choice 1 (where choice was indicated via joystick movement), which was randomised on each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit trials (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F).”
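The cluster-based permutation comparisons reported above can be illustrated with a minimal sketch of one common variant. This variant uses the length of the longest run of consecutive supra-threshold time points as the cluster statistic. The data layout (one list of values per trial), the threshold and the permutation count are hypothetical. The authors' implementation may differ, for example by using summed cluster mass rather than cluster length.

```python
import random
from statistics import mean


def mean_difference(a_trials, b_trials):
    """Per-time-point difference of condition means; each trial is a list of values."""
    n_t = len(a_trials[0])
    return [mean(t[i] for t in a_trials) - mean(t[i] for t in b_trials) for i in range(n_t)]


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = cur = 0
    for f in flags:
        cur = cur + 1 if f else 0
        best = max(best, cur)
    return best


def cluster_length_test(a_trials, b_trials, threshold, n_perm=500, seed=0):
    """Return (observed longest supra-threshold run, permutation p-value).

    The null distribution is built by shuffling condition labels across
    trials and recomputing the longest run for each shuffle."""
    rng = random.Random(seed)

    def stat(a, b):
        return longest_run([abs(d) > threshold for d in mean_difference(a, b)])

    observed = stat(a_trials, b_trials)
    pooled = a_trials + b_trials  # new list; the inputs are not reordered
    n_a = len(a_trials)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one member of the null set.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: condition A differs from B only at time points 10-19.
gen = random.Random(1)
a = [[gen.gauss(0, 0.5) + (2.0 if 10 <= i < 20 else 0.0) for i in range(30)] for _ in range(20)]
b = [[gen.gauss(0, 0.5) for _ in range(30)] for _ in range(20)]
print(cluster_length_test(a, b, threshold=1.0, n_perm=200))
```

Because a single statistic (the longest run over the whole epoch) is recomputed for every shuffle, the test asks whether any such run exists without needing a separate correction for each time point.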

      (3) Make sure that error bars are explained in all figure captions where appropriate.

      We apologise that this information was absent. Error bars always represent the standard error of the mean. This has now been added to all relevant figure legends.

      Reviewer #2 (Recommendations for the authors):

      Overall, I think this is a great manuscript and was presented clearly and succinctly. I have some minor suggestions:

      (1) Typo: Abstract "ACC, DLPFC, caudate and striatum" I think should be "caudate and putamen".

      We have amended this incorrect reference in the introduction:

      “One such task that does enable the dissociation of MB and MF computations is Daw et al. (2011)’s ‘two-step’ task [18]. It contains a probabilistic transition between task states to uncouple MF learners (who would assign credit to which state was rewarded regardless of the transition) from MB learners (who would appropriately assign credit based on the reward and transition that occurred). Rodents [19], monkeys [36], and humans [18] all use MB-like behaviour to solve the task. Evidence in rodents suggests dorsal anterior cingulate cortex (ACC) tracks rewards, states, and the probabilistic transition structure, and that ACC is essential in implementing a MB-strategy [37]. Here, we compare primate single neuron activity of 4 different subregions implicated in reward-based learning and choice (ACC, dorsolateral PFC (DLPFC), caudate, and putamen) during performance of the classic two-step task, and demonstrate signatures of MB-RL primarily in ACC, and MF-RL signatures most notably in putamen.”

      (2) Could the authors provide a rationale for why they did the single-level encoding the way they did, instead of running an ANOVA?

      We thank the reviewer for this point. We are not entirely certain which specific ANOVA approach is being suggested, but our rationale for using a GLM-based encoding analysis is that this approach allows us to model continuous, trial-by-trial variables (e.g., value signals, prediction errors, transitions) while simultaneously controlling for multiple correlated predictors. It is widely used in systems neuroscience, particularly in decision-making research, and offers analytical flexibility and comparability with prior work.
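The point about correlated predictors can be illustrated with a minimal ordinary-least-squares sketch that uses two regressors. The regressor names and values are hypothetical. A real encoding analysis would typically include more regressors (e.g. value signals, prediction errors and transitions) and would be fitted separately for each neuron and time bin. In this sketch the simulated firing rate depends only on x1, but x2 is correlated with x1. Fitting both regressors jointly assigns the effect to x1 and gives x2 a coefficient of zero. A model containing x2 alone would instead pick up a spurious effect.

```python
from statistics import mean


def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares (closed form)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero only if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical trials: the rate depends only on x1, but x2 is correlated with x1.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [0.0, 1.5, 1.0, 3.5, 3.0, 5.5]
rate = [2.0 * v + 1.0 for v in x1]
print(ols_two_predictors(x1, x2, rate))
```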

      (3) How were the 20 iterations for decoding decided? That seems low.

      We do not agree that 20 repetitions of 5-fold cross-validation is low. The error bars in panels 6C-E show how little variance there was across these 20 repetitions. Statistics were computed on the average of these low-variance repetitions, using a permutation test in which the 20 repetitions were repeated a further 500 times.
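The repeated cross-validation and permutation scheme described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' decoder. A simple nearest-centroid classifier stands in for whatever classifier was actually used. The permutation step shuffles trial labels and then reruns the full repeated cross-validation. The defaults mirror the numbers in the text (5 folds, 20 repetitions, 500 permutations). With the (count + 1) / (n_perm + 1) estimator used here, 500 permutations give a smallest attainable p-value of 1/501, just under 0.002.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per class on `train`; return accuracy on `test`.
    Each item is a (feature_list, label) pair."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def repeated_kfold_accuracy(data, rng, k=5, repeats=20):
    """Mean test accuracy over `repeats` random k-fold partitions of `data`."""
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            train = [d for j, fold in enumerate(folds) if j != i for d in fold]
            scores.append(nearest_centroid_accuracy(train, folds[i]))
    return mean(scores)


def permutation_p_value(data, n_perm=500, k=5, repeats=20, seed=0):
    """Compare the observed accuracy with accuracies obtained on label-shuffled data."""
    rng = random.Random(seed)
    observed = repeated_kfold_accuracy(data, rng, k, repeats)
    features = [x for x, _ in data]
    labels = [y for _, y in data]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        null_acc = repeated_kfold_accuracy(list(zip(features, labels)), rng, k, repeats)
        if null_acc >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical, well-separated two-class data (20 trials per class, 2 features).
gen = random.Random(1)
data = [([gen.gauss(0, 0.5), gen.gauss(0, 0.5)], 0) for _ in range(20)]
data += [([gen.gauss(3, 0.5), gen.gauss(3, 0.5)], 1) for _ in range(20)]
print(permutation_p_value(data, n_perm=100, repeats=5))
```

The usage call runs fewer repetitions and permutations than the defaults only to keep the example fast.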

      (4) It was unclear to me how the authors reached the conclusion "Thus, caudate activity appeared to represent the value of the state the subject was currently in." when the state value wasn't computed directly. I don't see how encoding the chosen and unchosen option is the same as the state the animal is in, which should also incorporate where the animal is in a block of trials or session, and the knowledge regarding the chosen and unchosen option.

      We agree with this point and have tempered this statement:

      “Thus, caudate’s encoding of an option’s value also reflected the availability of the option.”

      (5) Figures 1C, D, and E were not legible to me even at 200% zoom.

      We apologise for this oversight. We’ve now updated panels 1C-E to a more readable size.

      (6) There is a Figure 2H in the figure legend, but the panel appears to be missing from Figure 2.

      This text has been removed.

      (7) Figure 2: It would've been nice to see F and G for all areas.

      We have now added this data as additional panels in Figure 2.

      (8) Figure 3: How is the transition disambiguated from a visual response to the set of images?

      This was indicated by the background changing colour to that of the learned second stage state before the actual choices were presented. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which was indicated by a change in background colour before the second stage choices were presented, Figure 1A).”

      (9) Figure 4F: Is this collapsed across time points? So neurons that were significant at any time? I'm confused how Figure 4A relates to 4F, as 4A shows much lower percentages of significant neurons.

      Figure 4F counts the total number of neurons that had a significant period of encoding at any time point over the epoch (as assessed with a length-based permutation test), whereas Figure 4A shows the number of significantly encoding neurons at each individual time point. Investigating this further, we found that the encoding was dynamic, with different neurons encoding different parts of the epoch. We have now added a new supplementary figure to highlight this and refer to it in the Results:

      “Examination of the strongest signal observed, ACC’s encoding of MB Q-values, showed a dynamic pattern with different neurons encoding the signal at different parts of the epoch (Figure S6). When aggregating the number of significant coders throughout the epoch, and examining the specificity of MB versus MF coding, we found that all regions had a significant population of neurons that encoded MB-, but not MF-, derived value (30, 18.72, 23 and 24% of neurons in ACC, DLPFC, caudate and putamen respectively; all p<0.0014, binomial test against 10% (as the strongest response to either of the two options was used); Figure 4F).”
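The binomial test against 10% can be written out directly. The sketch below computes the one-sided probability of observing at least k significant neurons out of n by chance alone. The population of 100 neurons in the example is hypothetical, since per-region neuron counts are not given here.

```python
from math import comb


def binomial_tail_p(k, n, p0=0.10):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p0): the chance of at
    least k of n neurons being significant if each were significant with
    probability p0 by chance alone."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical population: 30 of 100 neurons significant, tested against 10%.
print(binomial_tail_p(30, 100))
```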

      (10) Data/ code could be made publicly available instead of upon request.

      All data and code to reproduce figures are now available at https://github.com/jamesbutler01/TwoStepExperiment. The manuscript has been updated to reflect this:

      Data and materials availability:

      All data and code to reproduce figures are available at https://github.com/jamesbutler01/TwoStepExperiment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth. Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled the screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in the patterns that underscore the complexity of metabolic systems.

      We agree with Reviewer #1 that metabolic fingerprints are complex to interpret, and we tried to address this problem by including mock treatment and non-metabolic inhibitors as controls. We address specific concerns below.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. The authors claim that MMV1028806 targets the bc1 complex of the mitochondrial electron transport chain of the parasite, although the evidence for this is indirect and speculative. Nevertheless, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors, although it falls short of identifying the actual targets.

      Weaknesses:

      (1) The authors claim to have identified a compound in their screen (MMV1028806) that targets the bc1 complex of the mitochondrial electron transport chain (ETC). The evidence they present for this claim is indirect (metabolomic signatures and changes in mitochondrial membrane potential) and could be explained by the compound targeting other components of the ETC or affecting mitochondrial biology or metabolism in other ways. In order to make the conclusion that MMV1028806 targets the bc1 complex, the authors should test specifically whether MMV1028806 inhibits bc1-complex activity (i.e. in a direct enzymatic assay for bc1 complex activity). Testing the activity of MMV1028806 against other mitochondrial dehydrogenases (e.g. dihydroorotate dehydrogenase) that feed electrons into the ETC might also provide valuable insights. The experiments the authors perform also do not directly measure whether MMV1028806 impairs ETC activity, and the authors could also test whether this compound inhibits mitochondrial O2 consumption (as would be expected for a bc1 inhibitor).

      We thank the reviewer for highlighting this important aspect. To further investigate the effect of MMV1028806 on the mETC, we adapted a commercial oxygen consumption assay and demonstrated that MMV1028806, like Atovaquone and Buparvaquone, inhibits the ETC, leading to reduced oxygen consumption similar to Antimycin A, which inhibits the bc1-complex. These results are now included in the revised manuscript (Methods, lines 210–233; Results, lines 460–468).

      (2) The authors claim that compounds targeting bradyzoites have greater lipophilicity than other compounds in the library (and imply that these compounds also have greater gastrointestinal absorbability and permeability across the blood-brain barrier). While it is an attractive idea that lipophilicity influences drug targeting against bradyzoites, the effect seems pretty small and is complicated by the fact that the comparison is being made to compounds that are not active against parasites. If the authors are correct in their assertion that lipophilicity is a major determinant of bradyzoicidal compounds compared to compounds that target tachyzoites alone, you would expect that compounds that target tachyzoites alone would have lower lipophilicity than those that target bradyzoites. It would therefore make more sense to (statistically) compare the bradyzoicidal and dual-acting compounds to those that are only active in tachyzoites (visually the differences seem small in Figure S2B). This hypothesis would be better tested through a structure-activity relationship study of select compounds (which is beyond the scope of the study). Overall, the evidence the authors present that high lipophilicity is a determinant of bradyzoite targeting is not very convincing, and the authors should present their conclusions in a more cautious manner.

      Thank you for raising this excellent point. We statistically compared compounds active only against tachyzoites with the bradyzoicidal and dually active compounds and indeed find no significant difference (P = 0.06). We altered the Results text (lines 367–368) and the Figure S2B caption to state this explicitly.

      (3) Page 11 and Figure 7. The authors claim that their data indicate that ATP is produced by the mitochondria of bradyzoites "independently of exogenous glucose and HDQ-target enzymes." The authors cite their previous study (Christiansen et al, 2022) as evidence that HDQ can enter bradyzoites, since HDQ causes a decrease in mitochondrial membrane potential. Membrane potential is linked to the synthesis of ATP via oxidative phosphorylation. If HDQ is really causing a depletion of membrane potential, is it surprising that the authors observe no decrease in ATP levels in these parasites? Testing the importance of HDQ-target enzymes using genetic approaches (e.g. gene knockout approaches) would provide better insights than the ATP measurements presented in the manuscript, although would require considerable extra work that may be beyond the scope of the study. Given that the authors' assay can't distinguish between ATP synthesized in the mitochondrion vs glycolysis, they may wish to interpret their data with greater caution.

      We thank the reviewer for raising this important point. The enzymatic assay used in our study cannot distinguish whether ATP is produced via glycolysis or mitochondrial respiration. However, we minimized glycolytic ATP production in bradyzoites by depriving them of glucose for one week. After this period, amylopectin stores are depleted, forcing the parasites to utilize glutamine via the GABA shunt to fuel the TCA cycle and generate ATP predominantly through respiration. While minor ATP production via gluconeogenic fluxes cannot be excluded, the main ATP supply under these conditions is expected to originate from the mitochondrial electron transport chain. Indeed, ATP levels are lower in HDQ-treated bradyzoites, which we attribute to the compound’s impact on electron-supplying enzymes upstream of the bc1 complex, although this inhibition is not sufficient to abolish ATP production to the extent observed with Atovaquone treatment.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an exciting 400-compound screen using the MMV Pathogen Box to select compounds that act effectively against the medically important bradyzoite stage of the Toxoplasma parasite. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affect the bc1 complex of the mitochondrial electron transport chain (mETC) and compared them with other bc1 inhibitors described in the literature, such as atovaquone and HDQs. They further provide metabolomic analysis of inhibited parasites, which serves to support the proposed target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, which is shown to have a metabolic outcome similar to that of buparvaquone. Furthermore, the authors evaluated the importance of ATP production in the tachyzoite and bradyzoite stages and under atovaquone/HDQ treatment.

      Weaknesses:

      Although the authors performed experiments to identify the metabolomic profile of the compounds and suggested the bc1 complex as the main target of MMV1028806, they did not provide experimental validation for this.

      In the updated manuscript, we performed additional experiments, such as an oxygen consumption assay, to further support the bc1 complex as the target. We also toned down some of our statements to ensure that no unsupported claims are made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction: It would be helpful to briefly describe what the pathogen Box is, what compounds are in it, and the rationale for using a drug screen to better understand mitochondrial function in cysts.

      Thank you for this suggestion. We added an introduction to the MMV Pathogen Box and outlined the rationale for our experimental approach (lines 90–99).

      Please explain why dual-active drugs were useful for understanding differences, rather than just seeking drugs that might target bradyzoites alone.

      We focused on dually active compounds for two reasons. First, these are the most promising and potent candidates for drug development: both stages might occur simultaneously, and dually active drugs may eliminate the need for combination treatment. Second, we speculated that monitoring the responses to inhibition of the same process in both parasite stages would reveal its functional consequences, and dually active compounds enable this direct comparison. Bradyzoite-specific compounds may be interesting from a developmental perspective but would require reverse-genetic follow-up to compare differences between stages. The lack of a well-established inducible expression system in bradyzoites that allows short-term, synchronized knock-down makes such metabolomic comparisons difficult. We briefly added these two points to the Results section (lines 378–381).

      Figure 4: this is a very important figure in understanding the significance of the work, but it is not well described in the legend. Even if these graphics have been used in other manuscripts, it would be helpful to provide better annotation in the figure legend.

      Thank you for pointing this out. We expanded the figure legend to explain the isotopologue data in more detail (lines 793–802).

      B,D: Explain what the three columns for each drug category represent.

      Addressed

      C,E: Explain what isotopologues are, what the M+ notation means, and what the pie charts represent. Other main figures have suitable legends.

      Addressed
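The M+ notation now explained in the legend can be illustrated with a short sketch. It normalizes isotopologue abundances (M+0 to M+n, where M+i carries i heavy-isotope atoms) into fractions, which is the kind of distribution often shown as a pie chart. It then computes the mean label enrichment. The sketch assumes abundances that are already corrected for natural isotope abundance, a step omitted here. The aspartate values are hypothetical, not data from the manuscript.

```python
def isotopologue_fractions(abundances):
    """Normalize M+0..M+n abundances so that they sum to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]


def mean_enrichment(abundances):
    """Fraction of labellable atoms carrying the heavy isotope: sum_i(i * f_i) / n,
    where f_i is the M+i fraction and n = len(abundances) - 1 labellable atoms."""
    n = len(abundances) - 1
    return sum(i * f for i, f in enumerate(isotopologue_fractions(abundances))) / n


# Hypothetical aspartate (4 carbons) dominated by the M+3 isotopologue.
asp = [40.0, 5.0, 10.0, 40.0, 5.0]  # M+0 ... M+4
print(isotopologue_fractions(asp))
print(mean_enrichment(asp))
```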

      Discussion: there are several places where the reasoning is a bit hard to follow, and rearrangement to provide a clear logical flow would be helpful. In particular, the reasoning for why HDQ impairs active but non-essential processes could be laid out more clearly.

      We added additional clarifications to the discussion section and re-wrote the HDQ paragraph. We hope that our reasoning is now easier to follow.

      Abbreviations: A list of abbreviations for the entire manuscript would be helpful.

      This is a good idea and we now provide an abbreviations list.

      Minor typos:

      P12, 2d paragraph: sentence beginning with: Consistent with this hypothesis... "cysts" is used twice

      Corrected

      P15, top of the second paragraph: "nano" and "molar" should be one word

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Major comments (not already covered in the weaknesses section of the public review)

      (1) Figure 2 and the related description of these experiments in the methods section (page 3). The approach for calculating IC50 values for the compounds against tachyzoites is unclear. How did the authors determine the time point for calculating IC50 values? Was this when the DMSO control wells reached maximum fluorescence? This could be described in a clearer manner. A concern with calculating IC50 values on different days is that parasites will have undergone more lytic cycles after 7 days compared to 4 days, which means that the IC50 values for fast- vs slow-acting compounds might be quite different between these days. As a more minor comment on these experiments, the methods section does not describe whether the test compound was removed after 7 days, as the experimental scheme in Figure S1A seems to imply. Please clarify in the methods section.

      This is a very good point, and we have clarified this in the Methods section (lines 157–160). In brief, we chose the latest time point at which exponential growth could still be observed in the fastest-growing cultures; generally, these were the mock-treated cultures at day 4 post infection. We also clarified that we changed the medium and removed the treatment after 7 days.

      Minor Comments

      (2) Page 2. "we employed a recently developed human myotube-based culture system to generate mature T. gondii drug-tolerant bradyzoites". What makes these bradyzoites 'drug-tolerant' or to which drugs are they tolerant? This isn't clear from the description.

      We added these details to the Introduction (lines 94–96), stating that these cysts develop resistance to antifolates, bumped kinase inhibitors and HDQ, a coenzyme Q analog.

      (3) Figure 1E. The number of compounds in this pie chart adds up to 384, whereas the methods describe that 371 compounds were tested. What explains this discrepancy in numbers?

      We understand the confusion. We have updated the pie chart to reflect only the compounds included in the primary screen (371), as listed in Supplementary Table S1. We separately analysed 29 compounds that had previously been tested against tachyzoites by Spalenka et al. and found an additional 13 compounds, which were originally included in the pie chart. In a secondary test, the activity of 10 of these 13 compounds could be confirmed. In total, we found the 16 compounds shown in Fig. 2E–G.

      (4) Page 3. The resazurin assays for measuring host cell viability could be explained in a clearer manner. What host cells were used? Were the host cells confluent when the drug was added (and the assay conducted) or was the drug added when the host cells were first seeded? How long were the host cells cultured in the candidate inhibitors before the assays were performed? What concentration (or concentration range) were the compounds tested? The host inhibition data are not easily accessible to the reader - the authors might consider including these data as part of Table S2D.

      The necessary information was added to the Methods section (lines 145–153). We tested for host toxicity in both HFFs and KD3 myotubes during the primary screen at 10 µM, in triplicate. The colorimetric assay was performed after the tachyzoite growth assays in HFFs (7 days post infection) and after completion of the 4-week re-growth phase of bradyzoites in myotubes. The resulting data are already part of Supplementary File 1. In addition, we performed concentration-dependent resazurin assays after the secondary concentration-dependent growth inhibition assays and included these data in Supplementary File 1 as well. For the bradyzoite growth assay, we visually inspected the monolayers after one week of drug exposure and before tachyzoite re-growth to detect missing or damaged monolayers; these data are also included in Supplementary File 1. As suggested, we also added the cytotoxicity data to Table S2D.

      (5) Page 7. "Except for four compounds (MMV021013, MMV022478, MMV658988, MMV659004), minimal lethal concentrations were higher in bradyzoites". The variation in these data seems quite large to be making this claim. Consider a statistical analysis of these data to compare potencies in tachyzoites vs bradyzoites.

      With this sentence we aimed to describe the results rather than to make a claim. We toned down the sentence to “… minimal lethal concentrations appear generally higher in bradyzoites …” (lines 344–347). We also added a line at 1 µM to the charts to facilitate comparison of compound efficacies.

      (6) It would be helpful to readers to include the structures of hit compounds in the figures (perhaps as part of Figure 3).

      This is a good idea and would improve the manuscript. To avoid overburdening Figure 3, we added the structures to Figure S3.

      (7) Page 8. "Infected monolayers were treated for three hours with a 3-fold of respective IC50 concentrations". 3-fold higher than IC50 concentrations? This isn't clear.

      Thank you for noticing this. We clarified the sentence and also corrected the concentration, which corresponds to five times the respective IC50 values, as stated in the Methods section: “Infected monolayers were treated for three hours with compound concentrations five times their respective IC50 values or the solvent DMSO.” (lines 374–376)

      (8) Page 9. "buparvaquone, which we found to be dually active against T. gondii tachyzoites and bradyzoites, targets the bc1-complex in Theileria annulata (McHardy et al. 1985) and Neospora caninum (Müller et al. 2015) and was recently found active against T. gondii tachyzoites (Hayward et al. 2023)." The latter paper showed that buparvaquone targets the bc1 complex in T. gondii tachyzoites as well.

      Yes, it was found to inhibit the O2 consumption rate in tachyzoites. We changed the sentence accordingly (lines 407–411).

      (9) Page 9. "Anaplerotic substrates were also affected by all three treatments, most notably a strong accumulation of aspartic acid." It is interesting that the M+3 isotopologue of aspartate (presumably synthesised from pyruvate) is the predominant form (rather than the M+2 and M+4 isotopologues that would derive from the TCA cycle, and as the diagram in Figure 4A seems to suggest). Given that aspartate is a precursor of pyrimidine biosynthesis that is upstream of the DHODH reaction, it is conceivable that its accumulation is related to the depletion of pyrimidine biosynthesis (so would tie into the point about the accumulation of DHO and CarbAsp noted earlier in the paragraph).

      Yes, we assume the same. We altered the text and now summarize the changes in Asp as a result of DHOD inhibition, as we already do in the next paragraph using ¹⁵N-glutamine labelling (lines 416–418).

      (10) Figure 6 and Page 10. Regarding the metabolomic experiments that show increased levels of acyl-carnitines. The authors note that "Since [beta-oxidation] is thought to be absent in T. gondii, we attribute these changes to inhibition of host mitochondria". This is conceivable, although the T. gondii genome does encode homologs of the proteins necessary for beta-oxidation (e.g. see PMID 35298557). If the carnitine is coming from host mitochondria, is host contamination a concern for interpreting the metabolomic data? Or do the authors think that parasites are scavenging carnitine from host cells? It is curious that the carnitine accumulation is observed in parasites treated with buparvaquone (and MMV1028806) but not atovaquone, even though buparvaquone and atovaquone (and possibly MMV1028806) target the same enzyme. Do the authors have any thoughts on why that might be the case?

      Yes, thank you for raising this point. We expanded the Discussion on this issue and now address the debated presence of β-oxidation (line 640): “We also detect elevated levels of acyl-carnitines in BPQ- and MMV1028806-treated bradyzoites. These molecules act as shuttles for the mitochondrial import of fatty acids for β-oxidation. However, this pathway has not been shown to be active and is deemed absent in T. gondii (35298557, 18775675). The presence of acyl-carnitines in bradyzoites might reflect import from the host. It is conceivable that their elevation in response to buparvaquone and MMV1028806 indicates compromised functionality of the host bc1-complex and a consequent accumulation of β-oxidation substrates. Indeed, BPQ has a very broad activity across Apicomplexa (Hudson et al. 1985) and kinetoplastids (Croft et al. 1992).” Regarding the existence of β-oxidation: some candidate enzymes may be conserved, but these could in part function in branched-chain amino acid degradation. On a separate note, we investigated β-oxidation extensively using stable isotope labelling and became convinced that any activity occurred only in the host cell, not in the parasite (unpublished).

      (11) Page 11. "the mitochondrial [electron] transport chain in bradyzoites".

      Corrected.

      (12) Figure S6B. Were these optimization experiments performed in tachyzoites or bradyzoites? If the former, and given that bradyzoites have apparently smaller amounts of ATP per parasite (Figure 7C), are these values in the linear range for 10^5 bradyzoites?

      Yes, we think the assay remains linear at these lower ATP levels. Tachyzoites give a linear response from 10^3 parasites per sample upward. In the actual experiment we used 10^5 parasites, both tachyzoites and bradyzoites. Under the tested conditions, bradyzoites maintain 10% of the ATP pools of tachyzoites, which should be well within the linear range of the assay. Even if ATP levels in atovaquone-treated bradyzoites were reduced to a further 10%, they would still remain within the linear range. For practical reasons, we acknowledge this limitation and consider it acceptable within the scope of this study.
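      The linearity argument above reduces to simple arithmetic, sketched below for illustration only. It assumes that luminescence scales with parasite number multiplied by per-parasite ATP, and it uses the 10^3-parasite lower bound and the 10% figures stated in this response; the helper name tachyzoite_equivalents is hypothetical. Note that the atovaquone case lands exactly on the stated lower bound rather than above it.

```python
from fractions import Fraction

# Lower end of the linear range reported for tachyzoites (parasites per sample).
LOWER_LINEAR_BOUND = 1_000

def tachyzoite_equivalents(n_parasites: int, *atp_fractions: Fraction) -> Fraction:
    """Hypothetical helper: scale a parasite count by successive relative-ATP
    fractions, using exact rational arithmetic to avoid float rounding."""
    equivalents = Fraction(n_parasites)
    for fraction in atp_fractions:
        equivalents *= fraction
    return equivalents

TENTH = Fraction(1, 10)
untreated = tachyzoite_equivalents(100_000, TENTH)           # bradyzoites: 10% of tachyzoite ATP
atq_treated = tachyzoite_equivalents(100_000, TENTH, TENTH)  # assumed further drop to 10% with ATQ

for label, value in [("untreated", untreated), ("ATQ-treated", atq_treated)]:
    print(f"{label}: {int(value)} tachyzoite-equivalents, "
          f"in linear range: {value >= LOWER_LINEAR_BOUND}")
```

      This yields 10,000 and 1,000 tachyzoite-equivalents respectively; no upper bound is modelled because none is stated in the response.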

      Reviewer #3 (Recommendations for the authors):

      Major comments

      (1) The authors should provide a negative control for the experiment on Figure 5. I would suggest doing the same experiment with an inhibitor that has no effect on mitochondrial potential.

      We addressed this criticism by repeating the assay on tachyzoites and additionally including inhibitors whose primary target is not the mitochondrial electron transport chain (pyrimethamine, clindamycin, 6-diazo-5-oxo-L-norleucine). The results are summarized in supplementary Fig. S5 (lines 445-449) and show that these inhibitors have no effect on the mitochondrial membrane potential. This supports the specificity of the assay and suggests that MMV1028806 and BPQ indeed target a mitochondrial process in this stage. In this repetition, ATQ, BPQ and MMV1028806 again significantly depleted the MitoTracker signal.

      (2) Figure 5 - Did the authors perform this experiment in 3 biological replicates? This requires clarification of the figure legend.

      No, we did not perform the experiment in three biological replicates. After establishing the assay thoroughly, we performed it once each on tachyzoites and bradyzoites. We sampled every vacuole encountered while scanning the slide from left to right, which is why the sample size varies between treatments. The sample size is given in the caption of Figure 5. However, we repeated the experiment with additional controls (see Fig. S5), which showed that the MitoTracker signal was significantly depleted in a very similar manner in ATQ-, BPQ- and MMV1028806-treated parasites.

      (3) The authors identify that MMV1028806 has the bc1-complex as the main target. I suggest that they should perform a complex III activity assay to affirm this. Also, it would be good to test if other mETC complexes are affected by this compound to prove its specificity. There is only one paper showing complex III activity in tachyzoites (PMID:37471441) and no papers in bradyzoites. So if the authors cannot do this assay, I suggest that they should change the text to indicate that the bc1-complex could be the main target of the compound but more experimental validation is needed.

      We hope to have satisfied the reviewer’s request by performing an oxygen consumption assay on tachyzoites. Together with the metabolic profiling and labelling data, this shows that both upstream and downstream processes are impacted by MMV1028806 and strongly suggests the bc1-complex as its target (Fig 5E).

      (4) Figure S5 - Are the differences shown in the EM experiment statistically supported?

      We analyzed 28 images and measured the areas in 12 to 26 images. We replaced the table of means in Fig S6B with a graph showing individual values. These areas are indeed significantly different between DMSO- and ATQ- or MMV1028806-treated parasites. We changed the wording in the results section accordingly: “Analysis by thin section electron microscopy revealed a largely unaffected sub-mitochondrial ultrastructure, but the areas of mitochondrial profiles were changed relative to the control after exposure to ATQ and MMV1028806 but not to BPQ (Fig. S6).” The description of Fig S6B was changed to “(B) Measured areas of mitochondrial profiles from 21, 12, 15 and 26 images showing DMSO-, ATQ-, BPQ- and MMV1028806-treated parasites (* denotes p < 0.05 in Mann-Whitney tests).”

      Minor comments:

      (1) What were the criteria for choosing the example compounds in Figure 1B and 1D? The authors should clarify this in the text.

      These graphs are shown for illustrative purposes and were chosen because they display a range of drug efficacies. We considered this helpful for interpreting the screening data.

      (2) Figure 2G - add statistical analysis.

      We added Mann-Whitney tests and updated the figure legend and results text accordingly (lines 344-347).

      (3) The authors should provide more insights in the discussion about why this new compound is the next step in drug discovery compared to atovaquone or buparvaquone - for example, do you expect better availability in the brain, etc.

      We used MMV1028806 and the other hits ATQ and BPQ to make the point that the bc1-complex is a good target in bradyzoites that allows curative treatment. We do not suggest that the compound itself is a good starting point. In the discussion (line 719), we point to other actively developed candidates, such as the ELQ series.

      (4) Scale bars in Figure 5 should be aligned and have equal thickness.

      We re-formatted the scale bars and aligned them when not obscuring parasites.

      (5) The authors should be consistent with font sizes and styles in all the figures.

      We adjusted the font styles for consistency across figures.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Both reviewers indicated broad approval of the revised work, for which we are grateful.

      Reviewer #1 requested no further changes.

      Reviewer #2’s Public review states:

      The authors indicate that the adaptors of inflammatory signalosomes act as energy reservoirs for signal amplification. This is not demonstrated, but it is assumed that the energy stored in the supersaturated state is released upon polymerization.

      The “assumed” link between supersaturation and energy release is in fact a thermodynamic necessity. Supersaturation is, by definition, a high free energy state. Our data shows that triggering nucleation via optogenetics results in an immediate avalanche of polymerization and cell death. This is not an assumption; it is a direct observation of work performed by the system when the kinetic barrier is removed.

      Reviewer #2 recommended:

      Ideally, signal amplification could be tested by determining the levels of the final product, e.g., cytokines, activated caspases...

      We did measure CASP3/7 activation, demonstrating a correlation with supersaturation of upstream adaptors. We do agree however that measuring the levels of other signaling products, including for each of the supersaturated pathways, would strengthen our claims. This will be the subject of future work.

      The authors indicate a significant anticorrelation between the saturating concentrations and the transcript abundances (Figure 2B), reporting an R = -0.285.

      This is correct; no change appears to be requested or warranted.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a high-quality and extensive study that reveals differences in the self-assembly properties of the full set of 109 human death fold domains (DFDs). Distributed amphifluoric FRET (DAmFRET) is a powerful tool that reveals the self-assembly behaviour of the DFDs, in non-seeded and seeded contexts, and allows comparison of the nature and extent of self-assembly. The nature of the barriers to nucleation is revealed in the transition from low to high AmFRET. Alongside analysis of the saturation concentration and protein concentration in the absence of seed, the subset of proteins that exhibited discontinuous transitions to higher-order assemblies was observed to have higher concentrations than DFDs that exhibited continuous transitions. The experiments probing the ~20% of DFDs that exhibit discontinuous transition to polymeric form suggest that they populate a metastable, supersaturated form in the absence of cognate signal. This is suggestive of a high intrinsic barrier to nucleation.

      Strengths:

      The differences in self-assembly behaviour are significant and likely identify mechanistic differences across this large family of signalling adapter domains. The work is of high quality, and the evidence for a range of behaviours is strong. This is an important and useful starting point since the different assembly mechanisms point towards specific cellular roles. However, understanding the molecular basis for these differences will require further analysis.

      An impressive optogenetic approach was engineered and applied to initiate self-assembly of CASP1 and CASP9 DFDs, as a model for apoptosome initiation in these two DFDs with differing continuous or discontinuous assembly properties. This comparison revealed clear differences in the stability and reversibility of the assemblies, supporting the hypothesis that supersaturation-mediated DFD assembly underlies signal amplification in at least some of the DFDs.

      The study reveals interesting correlations between supersaturation of DFD adapters in short- and long-lived cells, suggestive of a relationship between the mechanism of assembly and cellular context. Additionally, the comprehensive nature of the study provides strong evidence that the interactions are almost all homomeric or limited to members of the same DFD subfamily or interaction network. Similar approaches with bacterial proteins from innate immunity operons suggest that their polymerisation may be driven by similar mechanisms.

      Weaknesses:

      Only a limited investigation of assembly morphology was conducted by microscopy. There was a tendency for discontinuous structures to form fibrillar structures and continuous to populate diffuse or punctate structures, but there was overlap across all categories, which is not fully explored.

      We agree that an in-depth exploration of aggregate morphology would be interesting, but we feel it has limited relevance to the central findings of the manuscript. Our analysis established a relationship between discontinuous transitions and ordering based on the assumption that ordered assembly by DFDs involves polymerization, for which there is much precedent in the literature. Nevertheless, polymers of similar structure can form with different kinetics; hence, polymerization does not by itself imply an ability to supersaturate. We see this empirically in the “fibrillar” column in Fig. 1B. We have now elaborated on this important point in the relevant results section and in the discussion. Only five of the 108 DFDs in Fig. 1B warrant additional explanation. CASP4<sup>CARD</sup> and IFIH1<sup>tCARD</sup> lacked AmFRET but formed puncta; this could result from interactions with endogenous structures or condensates. DAPK1<sup>DD</sup> and UNC5A<sup>DD</sup> were classified as continuous (low) and fibrillar, but their AmFRET values are in fact higher than those of the monomer control, indicating that the fibrils comprise only a small fraction of the protein. In addition, the puncta of UNC5A<sup>DD</sup> do not resemble the fibrillar puncta of other DFDs; we suspect this classification may be a false positive resulting from localization to mitochondrial or other intracellular membranes. Finally, CASP2<sup>CARD</sup> was inadvertently classified as punctate; this turned out to be a technical artifact that has now been corrected (the fibrils wrapped around the cell perimeter to form ring-like puncta with anomalously low aspect ratios). We have now updated the methods section describing manual validation of our automated classification procedure, including which samples required reclassification. We have also now included all microscopy data in the public repository accompanying this manuscript.

      The methodology used to probe oligomeric assembly and stability (SDD-AGE) does not justify the conclusions drawn regarding stability and native structure within the assemblies.

      The reviewer is correct that SDD-AGE does not provide evidence against non-amyloid misfolding. It merely provides evidence that the DFDs are not forming amyloid (which are characteristically sarkosyl resistant). We have revised the sentence and further clarified that the distinction with amyloid specifically is important because amyloid is the only known form of ordered assembly (other than DFD polymers) with a nucleation barrier large enough to support deep supersaturation. Together with the series of interfacial mutants tested (and shown to impede assembly in all cases), the lack of sarkosyl-resistance provides evidence that the discontinuous DFDs are assembling through canonical DFD subunit interfaces.

      The work identifies important differences between DFDs and clearly different patterns of association. However, most of the detailed analysis is of the DFDs that exhibit a discontinuous transition, and important questions remain about the majority of other DFDs and why some assemblies should be reversible and others not, and about the nature of signalling arising from a continuous transition to polymeric form.

      We focused on discontinuous DFDs because this property allows for executive control over their respective pathways. They make signaling switch-like, which we argue is essential for innate immune responses. By contrast, and as illustrated in Figure 6D, supersaturation is required for a DFD to drive its own polymerization; hence, activation of a continuous DFD must be stoichiometrically coupled either to D/PAMP binding or to positive feedback from downstream or orthogonal processes. We consider the principles underlying such regulation of signaling to be better established and understood than supersaturation, and hence built our narrative for this manuscript around the latter. Our original text addresses the fact that only a small fraction of DFDs are discontinuous. Specifically, this is expected in light of the fact that a) only one supersaturated DFD is needed to make a signaling pathway switch-like, and b) every supersaturated DFD renders the cell susceptible to spontaneous death. Evolution should therefore limit supersaturation to only the highly connected DFDs (i.e. adaptors), which is what is seen. In this view, the many nonsupersaturable DFDs have evolved to accessorize the central supersaturable DFDs with various sensor and effector modules. Our revised text attempts to further clarify this perspective.

      Some key examples of well-studied DFDs, such as MyD88 and RIPK1, deserve more discussion, since they display somewhat surprising results. More detailed exploration of these candidates, where much is known about their structures and the nature of the assemblies from other work, could substantiate the conclusions here and transform some of the conclusions from speculative to convincing.

      We were likewise initially surprised about the inability of MyD88 and RIPK1 to supersaturate. We have now elaborated in the Discussion how our findings can be rationalized by the apparent supersaturability of other adaptors in MyD88 and RIPK1 signaling pathways. We additionally discuss prior evidence that MyD88 may indeed be supersaturable, and how our experimental system could have led to a false positive in the unique case of MyD88.

      The study concludes with general statements about the relationship between stochastic nucleation and mortality, which provide food for thought and discussion but which, as they concede, are highly speculative. The analogies that are drawn with batteries and privatisation will likely not be clearly understood by all readers. The authors do not discuss limitations of the study or elaborate on further experiments that could interrogate the model.

      We have now added to the discussion a section on the limitations of our study. We appreciate that our use of “privatisation” was confusing and have omitted it. However, we consider the battery analogy to accurately convey the newfound function of DFDs and anticipate that this analogy will ultimately prove valuable for biologists. To facilitate comprehension, we have now broadened our description of phase change batteries in the introduction.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Rodriguez Gama et al. proposes several interesting conclusions based on different oligomerization properties of Death-Fold Domains (DFDs) in cells, their natural abundance, and supersaturation properties. These ideas are:

      (1) DFDs broadly store the cell's energy by remaining in a supersaturated state;

      (2) Cells are constantly in a vulnerable state that could lead to cell death;

      (3) The cell's lifespan depends on the supersaturation levels of certain DFDs.

      Overall, the evidence supporting these claims is not completely solid. Some concerns were noted.

      Strengths:

      Systematic analysis of DFD self-assembly and its relationship with protein abundance, supersaturation, cell longevity, and evolution.

      Weaknesses:

      (1) On page 2, it is stated, "Nucleation barriers increase with the entropic cost of assembly. Assemblies with large barriers, therefore, tend to be more ordered than those without. Ordered assembly often manifests as long filaments in cells," as a way to explain the observed results that DFD assemblies that transitioned discontinuously form fibrils, whereas those that transitioned continuously (low-to-high) formed spherical or amorphous puncta. It is unlikely to be able to differentiate between amorphous and structured puncta by conventional confocal microscopy. Some DFDs self-assemble into structured puncta formed by intertwined fibrils. Such fibril nets are more structured and thus should be associated with a higher entropic cost. Therefore, the results in Figure 1B do not seem to agree with the reasoning described.

      The formation of microscopically visible elongated structures necessitates ordering on the length scale of 100s of nanometers. Otherwise surface tension would favor rounded aggregates. Conventional confocal microscopy is in fact well-suited and widely used to distinguish ordered from disordered assemblies in cells based on this principle [1,2]. We are unaware of any examples of isolated DFDs forming regular polymers that manifest as round puncta or nets. The reviewer may be referring to full-length ASC, which forms a roughly spherical mesh of filaments because it has two DFDs joined by a flexible linker. This is not applicable to our analysis with single DFDs. Single DFDs polymerize in effectively one dimension; hence a spherical punctum formed by a single DFD can only happen through noncanonical interactions or clustering of small filaments, both of which reduce order relative to long filaments.

      (2) Errors for the data shown in Figure 1B would have been very useful to determine whether the population differences between diffuse, punctate, and fibrillar for the continuous (low-to-high) transition are meaningful.

      We have now performed two statistical analyses to address this. First, using Fisher’s exact test, we observe a highly significant association between the DAmFRET and morphology classifications (p-value: 0.0001). Second, to specifically address whether the continuous (low to high) category has a preferred morphology, we applied an Exact Multinomial Test using the total frequencies of each morphology. This test revealed that all categories are significantly enriched for particular morphologies, as now indicated in the figure and legend.

      (3) A main concern in the data shown in Figure 1B and F is that the number of counts for discontinuous compared to continuous is small. Thus, the significance of the results is difficult to evaluate in the context of the broad function of DFDs as batteries, as stated at the beginning of the manuscript.

      Fig. 1B simply reports the numerical intersections between fluorescence distribution classifications and DAmFRET classifications. In Fig. 1F, our use of the chi-square test is justified by a sufficiently large sample size. Nevertheless, we obtain similar results with Fisher's exact test, which is suited to small sample sizes (odds ratio: 75.0, p-value: < 0.0001). See also our response to the related critique by Reviewer 1 regarding the small number of discontinuous DFDs.
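      For readers less familiar with the alternative to the chi-square test mentioned above, the following minimal sketch shows what a two-sided Fisher's exact test computes for a 2x2 table: the summed hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed one. The function names are hypothetical, the example table is a textbook case rather than the DFD classification counts (which are not reproduced here), and in practice one would typically call scipy.stats.fisher_exact.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum tables at most as probable as the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b*c == 0."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)

# Textbook example, not study data: two-sided p = 34/70, odds ratio 9.0
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4), odds_ratio(3, 1, 1, 3))
```

      Because the test conditions on the table margins and enumerates every possible table, it needs no large-sample approximation, which is why it is the usual check when some expected counts are small.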

      (4) The proteins or domains that are self-seeded (Figure 1F) should be listed such that the reader has a better understanding of whether domains or full-length proteins are considered, whether other domains have an effect on self-seeding (which is not discussed), and whether there is repetition.

      We define and consistently use “DFDs” to refer to domains, and “FL” or “DFD-containing protein” to refer to FL proteins. The Figure 1 title and corresponding section title both indicate the data refer to “DFDs”. The text callout for Figure 1F also directs readers to Table S1 where we believe the self-seeding results and details of constructs are clearly presented. There is no repetition. We have modified the legend to clarify that “Each DFD was co-expressed with an orthogonally fluorescent μNS-fused version of the same DFD.” We did not systematically evaluate seeding of FL proteins. We did however previously test self-seeding on seven representative FL proteins, and have now included those data in a new supplemental figure (S5). In short, only FL proteins with discontinuous distributions are self-seedable. These are limited to adaptors that had discontinuous seedable DFDs, revealing no adverse effect of FL protein context on seedability of adaptors (unlike receptors and effectors).

      (5) The authors indicate an anticorrelation between transcript abundance and Csat based on the data shown in Figure 2B; however, the data are scattered. It is not clear why an anticorrelation is inferred.

      An anticorrelation is indicated by the clearly placed negative R value at the top of the graph and the figure legend describing the statistical analysis.

      (6) It would be useful to indicate the expected range of degree centrality. The differences observed are very small. This is specifically the case for the BC values. The lack of context and the small differences cast doubts on their significance. It would be beneficial to describe these data in the context of the centrality values of other proteins.

      The possible range of centrality scores is 0 - 1, where 1 represents a protein interacting with every other protein in the network (degree centrality) or is on the shortest path between every other pair of proteins in the network (betweenness centrality). The expected range is difficult to address, as centrality values strongly depend on the size and function of the network. We considered that the SAM domain network could provide the most relevant comparison to the DFD network, as SAM domains resemble DFDs in size and structure, function heavily in signaling, are comparably numerous (76 in humans), and many of them form homopolymers (but importantly of a geometry that does not support nucleation barriers). We found that SAM domains have much lower betweenness centrality in their physical interaction network as compared to discontinuous DFDs (p = 0.0003), while their degree centrality is not significantly different (Figure S3F). Nevertheless, we stress that what matters for our conclusion is that the continuous and discontinuous values are significantly different among DFDs. Since there is a large overlap in the distributions of centrality scores between the two classes of DFDs, we performed a more robust permutation test based on the Mann-Whitney U statistic (n = 10,000). These tests reiterated that continuous and discontinuous DFDs have significantly different centrality scores (degree centrality p = 0.008; betweenness centrality p = 0.028) (Figure S3E).
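      The permutation test described above can be sketched as follows. This is a minimal illustration, assuming a two-sided test on the deviation of U from its null mean and the common +1 correction; the exact options used for Figure S3E are not stated here, the function names are hypothetical, and the centrality values in the example are invented rather than the DFD scores.

```python
import random

def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj count 1, ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

def permutation_test_u(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the Mann-Whitney U statistic.

    Group labels are shuffled n_permutations times; the p-value is the share
    of shuffles whose U deviates from its null mean, len(x)*len(y)/2, at
    least as much as the observed U does (with a +1 correction).
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    center = len(x) * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mann_whitney_u(pooled[:n_x], pooled[n_x:]) - center) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical centrality scores, for illustration only.
discontinuous = [0.30, 0.25, 0.40, 0.35, 0.28]
continuous = [0.05, 0.10, 0.08, 0.12, 0.02, 0.07]
print(mann_whitney_u(discontinuous, continuous))      # complete separation: U = 30.0
print(permutation_test_u(discontinuous, continuous))  # near the exact value 2/462; varies with seed
```

      Because the null distribution is built by reshuffling the observed values themselves, the test makes no assumption about the shape of the centrality distributions, which is what makes it attractive when the two classes overlap heavily.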

      (7) Page 3 section title: "Nucleation barriers are a characteristic feature of inflammatory signalosome adaptors." This title seems to contradict the results shown in Figure 2D, where full-length CARD9 and CARD11 are classified as sensors, but it has been reported that they are adaptor proteins with key roles in the inflammatory response. Please see the following references as examples: The adaptor protein CARD9 is essential for the activation of myeloid cells through ITAM-associated and Toll-like receptors. Nat Immunol 8, 619-629 (2007), and Mechanisms of Regulated and Dysregulated CARD11 Signaling in Adaptive Immunity and Disease. Front Immunol. 2018 Sep 19;9:2105. However, both CARD9 and CARD11 show discontinuous to continuous behavior for the individual DFDs versus full-length proteins, respectively, in contrast to the results obtained for ASC, FADD, etc.

      We rigorously counter the inconsistent usage of the term “adaptor” in the signalosome literature by quantifying the centrality of each protein in the physical interaction network of DFD proteins. Such analysis shows that BCL10, which is also described as an adaptor, is the more central member of the CARD9 and CARD11 (CBM signalosome) pathways, and is therefore more “adaptor-like”. We have now elaborated this view in the text.

      FADD plays a key role in apoptosis but shows the same behavior as BCL10 and ASC. However, the manuscript indicates that this behavior is characteristic of inflammatory signalosomes. What is the explanation for adaptor proteins behaving in different ways? This casts doubts about the possibility of deriving general conclusions on the significance of these observations, or the subtitles in the results section seem to be oversimplifications.

      We agree that our initial presentation of these results and brief description of each protein’s function were insufficient to fully justify our conclusions. We have now elaborated that while FADD was historically considered an adaptor of extrinsic apoptosis, it is now appreciated as a pleiotropic molecule with both anti- and pro-inflammatory signaling functions. FADD’s pro-inflammatory roles include inflammasome activation and NF-κB activation through the FADDosome. We have now revised our section headings to avoid oversimplification.

      (8) IFI16-PYD displays discontinuous behavior according to Figure S1H; however, it is not included in Figure 2D, but AIM 2 is.

      We only tested a subset of FL proteins spanning different functions within diverse signalosomes. IFI16 was not included. Hence it could not be meaningfully included in Fig. 2D.

      (9) To demonstrate that "Nucleation barriers facilitate signal amplification in human cells," constructs using APAF1 CARD, NLRC4 CARD, caspase-9 CARD, and a chimera of the latter are used to create what the authors refer to as apoptosomes. Even though puncta are observed, referring to these assemblies as apoptosomes seems somewhat misleading. In addition, it is not clear why the activity of caspase-9 was not measured directly, instead of that of caspase-3 and -7, which could be activated by other means.

      We agree that describing our chimeric assemblies as “apoptosomes” could be misleading, and have now refrained from doing so. We measured caspase-3/7 instead of caspase-9 for purely technical reasons: we were unable to find any reliable caspase-9 activity assays that were also compatible with our optogenetic and imaging wavelengths. In any case, our data with the widely used caspase-3/7 reporter dyes confirm comparably effective signal propagation from the CASP9 versions to their relevant endogenous substrate for apoptotic signaling (pro-caspase-3/7). The subsequent differences in cell death efficiency between the two versions of CASP9 (Fig. 3E) cannot be attributed to indirect effects of blue light stimulation, because both versions received the same treatment. Note that our stated justification for using these DFDs in the HEK293T background is that these cells lack NLRC4 and CASP1 proteins; the activity we measure is therefore due to direct optogenetic activation.

      The polymerization of caspase-1 CARD with NLRC4 CARD, leading to irreversible puncta, could just mean that the polymers are more stable. In fact, not all DFDs form equally stable or identical complexes, which does not necessarily imply that a nucleation barrier facilitates signal amplification. Could this conclusion be an overstatement?

      Figure 3C shows that the polymers do not simply persist following the transient stimulus; they continue to grow. That is, the soluble protein continues to join the polymers for a net increase even though there is no longer a stimulus directing it to do so. This means the drive to polymerize is independent of the stimulus, i.e. the protein is supersaturated. In the absence of supersaturation, a difference in stability would simply change the rates at which the polymers shrink. That we see continued growth instead of shrinkage therefore cannot be explained just by a difference in stability. Nevertheless, the reviewer’s critique caused us to realize that increased persistence of the CASP1<sup>CARD</sup> polymers could contribute to signal amplification independently of supersaturation if they act catalytically (i.e. where each polymerized CASP9 subunit sequentially activates multiple CASP3/7 molecules), and we had not adequately considered this. Unfortunately, the relevant experimentalist has since left the lab, leaving us unable to conduct the necessary experiments to resolve these two effects in a timely fashion. Consequently, we have now tempered our interpretation of these data.

      (10) To demonstrate that "Innate immune adaptors are endogenously supersaturated," it is stated on page 5 that ASC clusters continue to grow for the full duration of the time course and that AIM2-PYD stops growing after 5 min. The data shown in Figure 4F indicate that AIM2-PYD grows after 5 mins, although slowly, and ASC starts to slow down at ~ 13 min. Because ASC has two DFDs, assemblies can grow faster and become bigger. How is this related to supersaturation?

      That AIM2-PYD assemblies appear to grow somewhat (although not statistically significantly) would be consistent with AIM2-PYD’s sequestration into the growing ASC clusters. All that matters for our conclusion regarding ASC is that ASC assemblies grow following cessation of the stimulus, which we now describe quantitatively. Supersaturation is defined as the ratio of total concentration to saturating concentration, which is an equilibrium property. For a given protein concentration, the presence of two DFDs, each contributing their own interactions to overall stability of the assembly, will increase supersaturation relative to the individual DFDs. Importantly, growth will not occur if the protein concentration lies below its C<sub>sat</sub>, no matter how many DFDs it has.
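      The definition of supersaturation given in the response above can be made concrete with a short sketch. It assumes the standard ideal-solution approximation from nucleation theory, in which the driving force per molecule is k_B*T*ln(S); the function names are hypothetical, and the concentrations are invented to illustrate how a lower C_sat (for instance, from a second DFD contributing interactions) raises S at a fixed total concentration.

```python
import math

def supersaturation(c_total: float, c_sat: float) -> float:
    """Degree of supersaturation S = C_total / C_sat (dimensionless)."""
    if c_sat <= 0:
        raise ValueError("c_sat must be positive")
    return c_total / c_sat

def can_grow(c_total: float, c_sat: float) -> bool:
    """Net growth of an assembly is only possible when S > 1."""
    return supersaturation(c_total, c_sat) > 1.0

def driving_force_kT(c_total: float, c_sat: float) -> float:
    """Ideal-solution driving force per molecule, ln(S), in units of k_B*T."""
    return math.log(supersaturation(c_total, c_sat))

# Hypothetical values (µM), for illustration only.
c_total = 2.0
for label, c_sat in [("one DFD", 4.0), ("two DFDs", 1.0)]:
    s = supersaturation(c_total, c_sat)
    print(f"{label}: S = {s:.2f}, can grow: {can_grow(c_total, c_sat)}, "
          f"ln(S) = {driving_force_kT(c_total, c_sat):+.2f} kT")
```

      A negative ln(S) corresponds to the case described in the response where the protein lies below its C<sub>sat</sub>: no amount of stimulus-independent growth is thermodynamically possible there.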

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It isn't clear what is implied by the final sentence of the Abstract. Some of the conclusions have a speculative tone and would be better described in less certain terms. The final sentence of the abstract should be omitted.

      We have revised the abstract to add appropriate nuance but consider the final sentence to be both justified by our data and important to convey our findings to a broad audience.

      How does the size and nature of the seed influence the outcome of these DFD interactions? Although some non-seeded experiments are described, the majority of the results are derived from seeded experiments. Further details about the seeds should be included. How is the size of the nucleus controlled, and will seeds of smaller or larger size generate the same pattern of results?

      This is a very important question! The seeds comprised genetic fusions of each DFD to a condensate-forming domain, as described. While this system is insufficient to explore the size-dependence of nucleation, we are developing tools to do exactly that, for example our recently published multivalent nanobody against mEos3,[3] wherein we piloted its use to compare the size-dependence of ASC versus amyloid nucleation. Much further work will be needed to fully utilize this approach for the question of interest, and that is the subject of ongoing but open-ended work in the lab.

      What is the implication of the observation that only ~20% of the DFDs exhibited a discontinuous transition from no to high AmFRET signal? Further discussion of the DFDs that exhibit a continuous transition would enrich the manuscript.

      We consider the relationship to mortality important for understanding this observation. In the discussion we now explain that each supersaturated protein in a death-inducing pathway imposes a risk of unintentional death. We speculate that evolution therefore minimizes the number of supersaturated DFDs by restricting them to central nodes in the network. That way, a small number of supersaturable DFDs can be continuously “repurposed” with new receptor proteins for each D/PAMP. Additionally, as stated in our response to the related critique, we felt it was important to focus this manuscript on the novel concept of functional supersaturation necessarily at the expense of signaling regulation through better understood mechanisms.

      Were the initial experiments with DFDs unseeded (Figure S1, F-G)? Clarify this in the text. The morphologies of all the subcellular assemblies appear similar. It is not possible to distinguish between long filaments and spherical or amorphous puncta (Figure S1F-G). Higher magnification images that allow evaluation and comparison of morphology should be provided.

      The initial experiments were unseeded, as now clarified in the legend. We believe there was a misinterpretation resulting from both panels (S1F and G) showing fibrillar examples. To clarify, we have now added panel S1H showing representative DFDs classified as “punctate”, which we hope the reviewer agrees are clearly distinct from fibrillar.

      The ASC and CARD14 assemblies in Figure S1G show very distinct fibrillar structures emerging from the μNS-DFD seeds. Please provide further explanation of the nature of these. Do these resemble ASC and CARD assemblies generated as a result of native stimuli rather than μNS-DFD seeds?

      The μNS-DFD puncta contain numerous seeding-competent sites, which presumably cause multiple fibrils to initiate and emanate from them. This, together with potential bundling of these fibrils, produces the star-like shape. We have no reason to believe the internal structure of these fibers differs from native signalosome assemblies. For example, point mutations at native subunit interfaces that were previously shown to disrupt fibrilization and signaling likewise disrupt assembly in our DAmFRET experiments (Figure S2A). To our knowledge there exist no examples of high-resolution DFD fibril structures that were induced by native stimuli. However, recent work using super-resolution imaging confirmed that nigericin-triggered endogenous ASC specks comprise a network of filaments that superficially resembles our star-like assemblies.[4]

      Figure S2B is presented as evidence that assembly is mediated by native-like interfaces rather than amyloid-like misfolding. These SDD-AGE gels cannot be used to infer a native-like structure for the protein within the assemblies, only that the assemblies are (mostly) solubilised by incubation with sarkosyl. Many misfolding but non-amyloid-structure assemblies could be consistent with these results. Additionally, several of the samples appear to show insoluble aggregates within the wells, which could also be consistent with amyloid-type structures. What is the nature of these aggregates? Why is the NLRP3-PYD sample so much more intense than the others? Why was FL-ZBP1 included when it does not contain a DFD? Why were no sarkosyl-resistant assemblies observed with RIPK3-RHIM when this is known to be highly amyloidogenic?

      ZBP1 and RIPK3<sup>RHIM</sup> were among multiple proteins, not relevant to the manuscript, that were inadvertently included on the complete gel shown in the original figure; we have now spliced out these unnecessary lanes (indicated with dashed lines) to avoid confusion. We have found that the specific fragment of RIPK3<sup>RHIM</sup> used in this experiment (residues 446-464) does not allow for robust amyloid formation. We believe this is a steric artifact due to its small size (19 residues) relative to the fused mEos3, because a longer fragment (446-518) forms amyloid robustly. However, the latter construct was not available at the time this experiment was done. Nevertheless, another known amyloid protein, RIPK1<sup>RHIM</sup>, does show the expected smears on this gel and suffices as the positive control for amyloid. We do not understand why the NLRP3<sup>PYD</sup> sample is more intense than the others. However, this anomaly does not affect our conclusion that DFDs do not form the sarkosyl-resistant smears that would be indicative of amyloid.

      Expand on the concept of autoinhibited oligomerisation. Is this due to structural features? What might be the advantage of autoinhibited oligomerisation for these DFDs?

      We have elaborated on this section in the results.

      End of page 3, which "former set of adaptors" are referred to here? This is ambiguous.

      We have replaced “former” with “innate immune”.

      Page 5, the authors state that a kinetic barrier governs the activity of inflammatory signalosomes. While under the circumstances generated in this particular system, there is a kinetic barrier to the formation of large fibrillar complexes, can the same be said to be true in cells that respond to signals? They experience a specific triggering event. This should be redrafted to distinguish between the specific trigger in cells (downstream of a binding-driven event) and the kinetic barrier to self-association observed in this model system.

      Yes, our findings establish that a kinetic barrier governs signalosome activation. By engineering a triggering event that is more specific than natural triggering events (see Figure 3), we exclude the possibility that the cell first responds to the signal to create conditions that stabilize inflammasome formation. This means that regardless of what may happen with a natural trigger, the driving force for assembly clearly pre-exists and is therefore held in check by a kinetic barrier.

      On page 6, the statement "...lifespan may be limited by the thermodynamic drive for inflammatory signal amplification" is not clear. While this is strictly true following the initial triggering event, isn't lifespan limited by the stochastic activation? These very general statements stray beyond what can be substantiated on the basis of the data presented here.

      We believe the source of confusion here was our misuse of the term “lifespan”. We have now replaced it with “life expectancy”, which we believe is substantiated by our statements as written.

      Overall, the work presents a compelling, comprehensive analysis of the seeded self-assembly of DFDs. It identifies distinct properties for assembly of these domains that may underlie their particular physiological roles. However, some of the statements are quite general and not substantiated.

      Page 6. Is "end cell fate" the intended phrase?

      We have revised the phrase.

      The data regarding conservation of DFD-like modules and activity is interesting and probably deserves inclusion. However, without substantial evidence of expression levels (i.e., results) and a more complete understanding of these other systems, the statement "These results suggest that the function of DFDs as energy reservoirs preceded the evolution of animals" appears as an over-reach.

      We demonstrated that sequence-encoded nucleation barriers of DFDs are shared across animal signalosomes (human, zebrafish, sponge). This is not trivial, as such nucleation barriers are uncommon even among targeted screens of prion-like proteins.[5] Therefore, they appear to have existed in the basal animal. We have now omitted the data concerning bacterial DFDs, as these systems are indeed much less understood and the pathways concerned lack the tripartite architecture of animal signalosomes. We therefore revised the sentence in question by replacing “evolution” with “radiation”.

      Only a small number of DFDs exhibit this behaviour, so why is the conclusion drawn that energy storage for on-demand signalling may be the principal ancestral function of DFDs?

      The totality of the data supports this conclusion. Briefly (but elaborated in the text): 1) intrinsic nucleation barriers are unusual even among self-associating proteins, the vast majority of which (e.g. condensates) would suffice for the only other major function ascribed to DFDs, namely bringing effectors close enough for proximity-dependent activation (which has been repeatedly demonstrated in DFD-replacement experiments); 2) nucleation barriers are nevertheless conserved in innate immune signaling pathways; and 3) their restriction to approximately one DFD in each pathway is consistent with evolutionary selection to minimize accidental death.

      Are there any other adapters like MyD88 that are inconsistent with this hypothesis? Are any others known to be controlled by oligomer formation? How strong is the evidence for hexameric oligomers? If there is a threshold size for oligomers, how does this differ from a stable seed/nucleus that triggers assembly, as in the discontinuous transition?

      These are all good questions related to critiques that we have now addressed.

      The use of the term "privatisation" is likely not consistently understood across the community and should be explained. Is it simply meant to imply independent operation? How is it actually different from other forms of deployment of DFDs that exhibit continuous assembly? Are they not also independent? What is implied by the opposite of privatisation here? The term may introduce ambiguity in this context.

      We have now omitted this term.

      Is there strong evidence that well-validated physiologically relevant LLPS systems exhibit supersaturation at concentrations that are very different from those of the DFDs examined in this study?

      No, and this is a major point. As discussed in the text (with references), LLPS is incompatible with cell-wide supersaturation to a comparable magnitude as crystalline transitions, which precludes them from driving signal amplification. This helps to explain why the active state of DFD assemblies is ordered, when it has been repeatedly demonstrated that signal propagation itself does not require ordering.

      The paragraph discussing TIR domains and functional amyloids would be enhanced with a comparison of amyloid systems where seeded nucleation results in assembly of a polymer with significant conformational change in the constituent monomers.

      We do not yet understand how DFDs (and TIR domains) in some cases exhibit amyloid-like nucleation barriers without overt conformational differences between monomers and polymers. Work is underway in the lab to test specific hypotheses, but such discussion would be too speculative for the present paper.

      The statement "High specificity also insulates pathways from each other" should be elaborated to discuss the issue of highly similar monomers that apparently assemble into filamentous forms with minimal structural rearrangement. How is the specificity generated?

      We have elaborated the paragraph.

      The final paragraph is speculative and utilises language that detracts from the quality and rigour of the study. While important principles have been revealed, more discussion of the limitations of the work would allow readers to evaluate the significance of the study and could be used to effectively stimulate further efforts to study the multiple different mechanisms that underpin critical signalling pathways in innate immunity and control cell fate.

      We have now revised the final paragraph and included an extensive discussion of the limitations of the work.

      Reviewer #2 (Recommendations for the authors):

      (1) For clarity, it would be useful to include the names of the proteins in the bottom table of STable1, and such information at the top and bottom tables can be connected.

      We are unable to determine what is meant by this suggestion. Table S1 does not have a “top” and “bottom table”. Every entry in Table S1 and S2 contains the protein name, its most frequently used alias in the literature (when not the official name), and the corresponding Uniprot protein ID.

      (2) The language used in the abstract makes analogies between scientific and mundane terms, which compromises clarity. For example, what is meant by the terms shown below?

      (a) "......specifically templated by other DFDs....."

      We have revised this phrase.

      (b) "...function like batteries, storing and converting energy for life-or-death decisions."

      Batteries convert chemical energy into electrical energy or thermal energy. What is the electrical energy produced by DFDs? Is there any evidence that DFDs change the temperature of the cells or transfer heat?

      We have now included a familiar example of a thermal battery that operates analogously to the manner we show for DFDs. As now elaborated extensively, such batteries operate via a physical rather than chemical process -- a change in the state of matter (solute to crystalline) of a supersaturated “phase change material” (this is an established term). This is exactly what we show is happening for DFDs. While it would be illustrative to measure the heat released upon DFD polymerization in cells, the much faster rate of heat transfer relative to molecular diffusion makes that impossible with present methods. Nevertheless, such measurements are unnecessary because disorder-to-order phase transitions are fundamentally exothermic.

      (c) "....privatizing..."

      We now avoid this term.

      Using appropriate scientific terms to explain the scientific results presented in this manuscript will increase clarity. Analogously, it is difficult to understand what the title of the manuscript means, "Protein phase change batteries..."

      We appreciate this critique and have removed “batteries” from the title to make the work more accessible to biologists. However, we reject the implication that such terminology is inappropriate. We presume the reviewer meant “unfamiliar” instead of “inappropriate”. The well-reasoned application of terms from other fields is standard practice and arguably essential to convey new concepts in biology. The modern biology lexicon is built on this. For example, Robert Hooke co-opted “cell” from the architecture of monasteries. More recently cell biologists appropriated “condensates” from soft matter physics. In both cases, the term while initially foreign to biologists usefully introduced a concept that lacked recognized precedent in biology. Similarly, “phase change battery” provides an accurate analogy for the central finding of our work, and we have now elaborated this analogy in the text.

      Bibliography

      (1) Garcia-Seisdedos, H., Empereur-Mot, C., Elad, N. & Levy, E. D. Proteins evolve on the edge of supramolecular self-assembly. Nature 548, 244–247 (2017).

      (2) Alberti, S., Halfmann, R., King, O., Kapila, A. & Lindquist, S. A systematic survey identifies prions and illuminates sequence features of prionogenic proteins. Cell 137, 146–158 (2009).

      (3) Kimbrough, H. et al. A tool to dissect heterotypic determinants of homotypic protein phase behavior. Protein Sci. 34, e70194 (2025).

      (4) Glück, I. M. et al. Nanoscale organization of the endogenous ASC speck. iScience 26, 108382 (2023).

      (5) Posey, A. E. et al. Mechanistic inferences from analysis of measurements of protein phase transitions in live cells. J. Mol. Biol. 433, 166848 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here the authors attempted to test whether the function of Mettl5 in sleep regulation was conserved in drosophila, and if so, by which molecular mechanisms. To do so they performed sleep analysis, as well as RNA-seq and ribo-seq in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes the authors propose that Mettl5 modulate sleep through regulation of the clock genes, both at the level of their production and degradation.

      Strengths:

      The phenotypical consequence of the loss of one copy of Mettl5 on sleep function is clear and well-documented.

      Weaknesses:

      The imaging and molecular parts are less convincing.

      - The colocalization of Mettl5 with glial and neuronal cells is not very clear

      We truly appreciate your suggestion. We repeated the staining experiments. To obtain better results, we used a different anti-ELAV antibody (mouse) and optimized the experimental conditions. This result has been included in Figure S1 of the revised version.

      - The section on gene ontology analysis is long and confusing

      This section has been revised for clarity. To improve the logical flow, we deleted the paragraph describing the details of Figure S6.

      - Among all the pathways affected the focus on proteosome sounds like cherry picking. And there is no experiment demonstrating its impact in the Mettl5 phenotype

      Thank you for the comments. The opposing changes in Period at the transcriptional versus translational levels puzzled us for some time until we identified changes in ubiquitin pathway components. The regulation of Period protein degradation by the ubiquitin-proteasome pathway is well documented (Grima et al., 2002; Ko et al., 2002; Chiu et al., 2008). In addition, previous reports indicated that N6-methyladenosine (m6A) regulates the ubiquitin-proteasome pathway in skeletal muscle physiology (Sun et al., 2023). This information has been included in the revised manuscript in the last paragraph under the title: Mettl5 regulates the clock gene regulatory loop.

      Indeed, we have not found a suitable way to manipulate proteasome levels in genetic tests. The proteasome is a large protein complex composed of many subunits, so enhancing its activity by overexpressing its components is not feasible. Moreover, the proteasome has important functions in many biological processes. Disrupting its function by MG132 treatment, which we tried, results in many side effects.

      In this study, we also noticed codon usage alterations caused by the mettl5 mutation. Please refer to our answer to the following question for details. Previous reports have also described regulation of translation by Mettl5 in other systems (Rong et al., 2020; Peng et al., 2022). Based on these analyses, it is possible that both translational regulation and protein degradation contribute to the upregulation of Period protein in the mettl5 mutant. This idea has been included in the Discussion section of the revised manuscript.

      References

      Sun J, Zhou H, Chen Z, et al. Altered m6A RNA methylation governs denervation-induced muscle atrophy by regulating ubiquitin proteasome pathway. J Transl Med. 2023;21(1):845. Published 2023 Nov 23. doi:10.1186/s12967-023-04694-3

      Grima, B. et al. The F-box protein slimb controls the levels of clock proteins period and timeless. Nature 420, 178–182 (2002).

      Ko, H. W., Jiang, J. & Edery, I. Role for Slimb in the degradation of Drosophila period protein phosphorylated by doubletime. Nature 420, 673–678 (2002).

      Chiu, J. C., Vanselow, J. T., Kramer, A. & Edery, I. The phosphooccupancy of an atypical SLIMB-binding site on PERIOD that is phosphorylated by DOUBLETIME controls the pace of the clock. Genes Dev. 22, 1758–1772 (2008).

      - The Ribo-seq shows some changes at the level of translation efficiency, but there is no connection with the Mettl5 phenotypes. In other words, how does the increased usage of some codons impact clock signalling? Are the genes enriched for these codons?

      Thank you for raising this point. In our analysis, we observed increased usage of the codons for Asp in the Mettl5 mutant. Prior work has reported a possible connection between codon usage and PER protein activity: in that report, a codon-optimized version of per, unlike the wild-type version, could not rescue circadian rhythmicity in per mutants (Fu et al., 2016). Further study indicated that dPER protein levels were also elevated in these flies, suggesting a role for codon optimization in enhancing dPER expression (Figure 2B in Fu et al., 2016). We therefore analyzed the codon-optimized regions described by Fu et al. (2016). GAC has a relatively high usage rate in these regions (indicated by the red arrows in the two Author response image charts below), suggesting that the Mettl5 mutation may influence PER protein accumulation through altered GAC usage. Further experiments are needed to confirm this possibility. We have included these details in the second-to-last paragraph of the Discussion section.

      Author response image 1. Codon-optimized region of PER, residues 15-21 (SDSAYSN).

      Author response image 2. Codon-optimized region of PER, residues 43-316: SSGSSGYGGKPSTQASSSDMIIKRNKEKSRKKKKPKCIALATATTVSLEGTEESPLPANGGCEKVLQELQDTQQLGEPLVVTETQLSEQLLETEQNEDQNKSEQLAQFPLPTPIVTTLSPGIGPGHDCVGGASGGAVAGGCSVVGAGTDKTSELIPGKLESAGTKPSQERPKEESFCCVISMHDGIVLYTTPSISDVLGFPRDMWLGRSFIDFVHHKDRATFASQITTGIPIAESRGCMPKDARSTFCVMLRRYRGLNSGGFGVIGRAVNYEPF

      Fu J, Murphy KA, Zhou M, Li YH, Lam VH, Tabuloc CA, Chiu JC, Liu Y. Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD. Genes Dev. 2016 Aug 1;30(15):1761-75.
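      The GAC-usage comparison described above can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline, which is not described in this response; the function name asp_codon_usage and the toy sequence below are hypothetical. The sketch assumes an in-frame coding sequence, splits it into codons, and reports what fraction of the aspartate codons are GAC versus GAT.

```python
def asp_codon_usage(cds: str) -> dict:
    """Hypothetical helper: share of Asp codons (GAC vs GAT) in an in-frame CDS.

    Assumes `cds` starts at the first codon; trailing bases that do not
    complete a codon are ignored. RNA input (U) is converted to DNA (T).
    """
    cds = cds.upper().replace("U", "T")
    # Split into non-overlapping codons in reading frame 0.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = {"GAC": codons.count("GAC"), "GAT": codons.count("GAT")}
    total = sum(counts.values())
    if total == 0:
        # No Asp codons in this region: report zero usage for both.
        return {codon: 0.0 for codon in counts}
    return {codon: n / total for codon, n in counts.items()}
```

      For example, the toy sequence "ATGGACGATGACTAA" contains two GAC and one GAT codon, so asp_codon_usage returns roughly 0.67 for GAC and 0.33 for GAT. Comparing such fractions between a region of interest and the rest of the transcript is one simple way to express "relatively high usage".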

      - A few papers already demonstrated the role of Mettl5 in translation, even at the structural level (Rong et al, Cell reports 2020) and this was not commented by the authors. In Peng et al, 2022 the authors show that the m6A bridges the 18S rRNA with RPL24. Is this conserved in Drosophila?

      Thanks for the reminder. We discussed and cited these papers in the revised version.

      Rong B, Zhang Q, Wan J, et al. Ribosome 18S m<sup>6</sup>A Methyltransferase METTL5 Promotes Translation Initiation and Breast Cancer Cell Growth. Cell Rep. 2020;33(12):108544. doi:10.1016/j.celrep.2020.108544

      Peng H, Chen B, Wei W, et al. N<sup>6</sup>-methyladenosine (m<sup>6</sup>A) in 18S rRNA promotes fatty acid metabolism and oncogenic transformation. Nat Metab. 2022;4(8):1041-1054. doi:10.1038/s42255-022-00622-9

      - The text will require strong editing and the authors should check and review extensively for improvements to the use of English.

      Thanks. The text of the paper has been thoroughly revised.

      Conclusion

      Despite the effort to identify the underlying molecular defects following the loss of Mettl5, the authors fell short in doing so. Some of the results are over-interpreted, and more experiments will be needed to understand how Mettl5 controls the translation of its targets. References to previous works were poorly discussed.

      Thanks for your suggestion. We have incorporated the references mentioned above. However, our efforts have thus far fallen short of elucidating a precise picture of METTL5's functional mechanism. To address this, the limitations of the current study have been discussed more thoroughly in the revised main text.

      Reviewer #2 (Public review):

      Summary:

      The authors define the m6A methyltransferase Mettl5 as a novel sleep-regulatory gene that contributes to specific aspects of Drosophila sleep behaviors (i.e., sleep drive and arousal at early night; sleep homeostasis) and propose the possible implication of Mettl5-dependent clocks in this process. The model was primarily based on the assessment of sleep changes upon genetic/transgenic manipulations of Mettl5 expression (including CRISPR-deletion allele); differentially expressed genes between wild-type vs. Mettl5 mutant; and interaction effects of Mettl5 and clock genes on sleep. These findings exemplify how a subclass of m6A modifications (i.e., Mettl5-dependent m6A) and possible epi-transcriptomic control of gene expression could impact animal behaviors.

      Strengths:

      Comprehensive DEG analyses between control and Mettl5 mutant flies reveal the landscape of Mettl5-dependent gene regulation at both transcriptome and translatome levels. The molecular/genetic features underlying Mettl5-dependent gene expression may provide important clues to molecular substrates for circadian clocks, sleep, and other physiology relevant to Mettl5 function in Drosophila.

      Weaknesses:

      While these findings indicate the potential implication of Mettl5-dependent gene regulation in circadian clocks and sleep, several key data require substantial improvement and rigor of experimental design and data interpretation for fair conclusions. Weaknesses of this study and possible complications in the original observations include but are not limited to:

      (1) Genetic backgrounds in Mettl5 mutants: the heterozygosity of Mettl5 deletion causes sleep suppression at early night and long-period rhythms in circadian behaviors. The transgenic rescue using Gal4/UAS may support the specificity of the Mettl5 effects on sleep. However, it does not necessarily exclude the possibility that the Mettl5 deletion stocks somehow acquired long-period mutation allelic to other clock genes. Additional genetic/transgenic models of Mettl5 (e.g., homozygous or trans-heterozygous mutants of independent Mettl5 alleles; Mettl5 RNAi etc.) can address the background issue and determine 1) whether sleep suppression tightly correlates with long-period rhythms in Mettl5 mutants; and 2) whether Mettl5 effects are actually mapped to circadian pacemaker neurons (e.g., PDF- or tim-positive neurons) to affect circadian behaviors, clock gene expression, and synaptic plasticity in a cell-autonomous manner and thereby regulate sleep. Unfortunately, most experiments in the current study rely on a single genetic model (i.e., Mettl5 heterozygous mutant).

      We believe that the multiple rescue experiments presented in Figure 1H-L and Figure 2H-L have effectively addressed the background concern. To confirm this further, we repeated the sleep and circadian rhythm assays using RNAi lines, aiming to eliminate any remaining concerns in this regard. RNAi knockdown appears to replicate the reduced sleep phenotype seen at night. This result has been included in Figure S1. It is true that we have not specifically addressed whether the effects of Mettl5 map to circadian pacemaker neurons in this study. We acknowledge this as a limitation and appreciate the importance of this question. Further investigations focusing on circadian pacemaker neurons, such as PDF- or tim-positive neurons, would be necessary to clarify the precise role of Mettl5 in regulating circadian behaviors and the related molecular mechanisms.

      (2) Gene expression and synaptic plasticity: gene expression profiles and the synaptic plasticity should be assessed by multiple time-point analyses since 1) they display high-amplitude oscillations over the 24-h window and 2) any phase-delaying mutation (e.g., Mettl5 deletion) could significantly affect their circadian changes. The current study performed a single time-point assessment of circadian clock/synaptic gene expression, misleading the conclusion for Mettl5 effects. Considering long-period rhythms in Mettl5 mutant clocks, transcriptome/translatome profiles in Mettl5 cannot distinguish between direct vs. indirect targets of Mettl5 (i.e., gene regulation by the loss of Mettl5-dependent m6A vs. by the delayed circadian phase in Mettl5 mutants).

      In the revised version, we provide data collected at multiple time points. Specifically, we re-examined per expression at both the transcriptional and translational levels at different time points. The corresponding results have been incorporated in Figure 4D-F. We also dissected fly brains from UAS-DenMark, UAS-syt.eGFP/+; pdf-GAL4/+ and UAS-DenMark, UAS-syt.eGFP/+; pdf-GAL4/Mettl5<sup>1bp</sup> flies at these four time points to quantify the synaptic structures of PDF neurons. The result has been included in revised Figure 6.

      (3) The text description for gene expression profiling and Mettl5-dependent gene regulation was very detailed, yet there is a huge gap between gene expression profiling and sleep/behavioral analyses. The model in Figure 5 should be better addressed and validated.

      Thank you for your suggestion. We added data to better confirm the changes in PER protein expression at different time points. Indeed, the point you raise is the weakness of this paper, and we performed a more thorough analysis during the revision process.

      The opposing changes in Period at the transcriptional versus translational levels puzzled us for some time until we identified alterations in the ubiquitin pathway components. The regulation of Period protein degradation by the ubiquitin-proteasome pathway is well-documented (Grima et al., 2002; Ko et al., 2002; Chiu et al., 2008). Additionally, previous studies have shown that N6-methyladenosine (m6A) modulates the ubiquitin-proteasome pathway in skeletal muscle physiology (Sun et al., 2023). We have incorporated this information into the revised manuscript in the last paragraph under the section titled: Clock gene regulatory loop regulating circadian rhythm was affected by Mettl5<sup>1bp</sup>

      Indeed, we have not yet identified an effective method to manipulate proteasome levels in genetic tests. The proteasome is a large protein complex composed of numerous subunits, making it impractical to enhance its activity simply by overexpressing individual components. Furthermore, the proteasome plays a critical role in many biological processes. Disrupting its function—such as through MG132 treatment, which we attempted—leads to significant off-target effects.

      Sun J, Zhou H, Chen Z, et al. Altered m6A RNA methylation governs denervation-induced muscle atrophy by regulating ubiquitin proteasome pathway. J Transl Med. 2023;21(1):845. Published 2023 Nov 23. doi:10.1186/s12967-023-04694-3

      Grima, B. et al. The F-box protein slimb controls the levels of clock proteins period and timeless. Nature 420, 178–182 (2002).

      Ko, H. W., Jiang, J. & Edery, I. Role for Slimb in the degradation of Drosophila period protein phosphorylated by doubletime. Nature 420, 673–678 (2002).

      Chiu, J. C., Vanselow, J. T., Kramer, A. & Edery, I. The phosphooccupancy of an atypical SLIMB-binding site on PERIOD that is phosphorylated by DOUBLETIME controls the pace of the clock. Genes Dev. 22, 1758–1772 (2008).

      Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined the potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR-generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared them to control animals collected at the same time point. While these data were subjected to a thorough analysis, it was difficult to understand the relative direction of differential expression between the two genotypes. In any case, a major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. A strength of this work is its relevance to a human developmental disorder, as well as the transcriptomic and ribosomal profiling of the mutant. However, there are numerous weaknesses in the manuscript, most of which stem from misinterpretation of the findings, some methodological approaches, and a lack of methodological detail. The authors seem to have missed a major phenotype associated with the mettl5 mutant, which is that it caused a significant increase in period length, apparent even in a light:dark cycle. Thus the effect of the mutant on clock gene expression more likely contributed to this phenotype than to any changes in sleep behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some of the questions that the authors should address are the following ones:

      How does Mettl5 control the translation of the clock genes? Why are the levels of some genes specifically increased or decreased? How does this relate to the effects on uORFs and dORFs, both overlapping and non-overlapping ones? The observation of these defects is interesting, but how they occur and how they impact clock signaling is missing.

      Thank you for your suggestion. This is a weak point of the paper, and we analyzed it thoroughly during the revision process.

      The opposing changes in Period at the transcriptional versus translational levels puzzled us for some time until we identified alterations in the ubiquitin pathway components. The regulation of Period protein degradation by the ubiquitin-proteasome pathway is well-documented (Grima et al., 2002; Ko et al., 2002; Chiu et al., 2008). Additionally, previous studies have shown that N6-methyladenosine (m6A) modulates the ubiquitin-proteasome pathway in skeletal muscle physiology (Sun et al., 2023). We have incorporated this information into the revised manuscript in the last paragraph under the section titled: Clock gene regulatory loop regulating circadian rhythm was affected by Mettl5<sup>1bp</sup>.

      However, we have not yet identified an effective method to manipulate proteasome levels in genetic tests. The proteasome is a large protein complex composed of numerous subunits, making it impractical to enhance its activity simply by overexpressing individual components. Furthermore, the proteasome plays a critical role in many biological processes, so disrupting its function (for example, with MG132 treatment, which we attempted) leads to significant off-target effects.

      In this study, we also observed codon usage alterations caused by the mettl5 mutant. For details, please refer to our response to the fourth question in the Weaknesses section above. Previous studies have reported mettl5's role in translational regulation in other systems (Rong et al., 2020; Peng et al., 2022). Based on these findings, we propose that both translational regulation and protein degradation may contribute to the upregulation of Period protein in the mettl5 mutant. This hypothesis has been included in the Discussion section of the revised manuscript.

      “The mechanism by which METTL5 regulates translation warrants further investigation. Previous studies have demonstrated that METTL5 influences translation (Rong et al., 2020; Peng et al., 2022), but whether the mechanisms identified here are conserved across other systems remains an intriguing question. In our analysis, we observed increased usage of aspartate (Asp) codons in Mettl5 mutants. Notably, prior work has linked codon usage to PER protein function—specifically, a codon-optimized version of PER failed to rescue circadian rhythmicity in per mutant flies, unlike the wild-type version (Fu et al., 2016). Further analysis revealed that PER protein levels were elevated in these mutants, suggesting that codon optimization enhances PER expression (Figure 2B in Fu et al., 2016). Strikingly, when we examined the codon-optimized region from Fu et al. (2016), we found that GAC (Asp) was highly enriched, raising the possibility that Mettl5 mutation affects PER protein accumulation by altering GAC codon usage. Additional experiments will be needed to validate this hypothesis. Furthermore, we detected changes in upstream open reading frames (uORFs) in Mettl5 mutants, but their relationship to translational regulation requires further exploration.”

      References

      Sun J, Zhou H, Chen Z, et al. Altered m6A RNA methylation governs denervation-induced muscle atrophy by regulating ubiquitin proteasome pathway. J Transl Med. 2023;21(1):845. Published 2023 Nov 23. doi:10.1186/s12967-023-04694-3

      Grima, B. et al. The F-box protein slimb controls the levels of clock proteins period and timeless. Nature 420, 178–182 (2002).

      Ko, H. W., Jiang, J. & Edery, I. Role for Slimb in the degradation of Drosophila period protein phosphorylated by doubletime. Nature 420, 673–678 (2002).

      Chiu, J. C., Vanselow, J. T., Kramer, A. & Edery, I. The phosphooccupancy of an atypical SLIMB-binding site on PERIOD that is phosphorylated by DOUBLETIME controls the pace of the clock. Genes Dev. 22, 1758–1772 (2008).

      Rong B, Zhang Q, Wan J, et al. Ribosome 18S m<sup>6</sup>A Methyltransferase METTL5 Promotes Translation Initiation and Breast Cancer Cell Growth. Cell Rep. 2020;33(12):108544. doi:10.1016/j.celrep.2020.108544

      Peng H, Chen B, Wei W, et al. N<sup>6</sup>-methyladenosine (m<sup>6</sup>A) in 18S rRNA promotes fatty acid metabolism and oncogenic transformation. Nat Metab. 2022;4(8):1041-1054. doi:10.1038/s42255-022-00622-9

      Fu J, Murphy KA, Zhou M, Li YH, Lam VH, Tabuloc CA, Chiu JC, Liu Y. Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD. Genes Dev. 2016 Aug 1;30(15):1761-75.

      Reviewer #2 (Recommendations for the authors):

      Please find my comments to improve the quality of your manuscript.

      Major comments

      (1) The quality of text writing in English needs to be at publishable levels. It is not a trivial problem, but it literally impairs the readability of your work. So please have professionals edit your manuscript text appropriately.

      We have carefully revised the language throughout the manuscript during the revision process.

      (2) Fig 1O: please include the total sleep profile and other analyses for rebound sleep phenotypes in control vs. Mettl5 to better validate that both genotypes were comparably sleep-deprived, but the latter shows less sleep rebound.

      Thank you for your suggestion. Another reviewer also suggested reanalyzing the sleep rebound data, and we performed the analysis according to the reference below. We have included the sleep profiles of both genotypes from the original Fig 1O, together with the total sleep profile and other analyses of rebound sleep, in the revised panel. As shown in this revised panel (now Figure 1K, L), both genotypes were comparably sleep-deprived.

      Cirelli C, Bushey D, Hill S, Huber R, Kreber R, Ganetzky B, Tononi G. 2005. Reduced sleep in Drosophila Shaker mutants. Nature 434:1087-92.

      (3) Line 90: the authors did not actually address this critical question. Additional Gal4 mapping (e.g., Mettl5 rescue or Mettl5 RNAi) will determine which cells/neural circuits are important for Mettl5-dependent sleep.

      This sentence has been revised into “The observed expression pattern of Mettl5 further supports its sleep regulatory function.”

      (4) Fig 1H-L; Fig 2H-L: the authors should check if overexpression of wild-type or mutant Mettl5 in control backgrounds could affect nighttime sleep to better define the transgenic effects among overexpression, rescue, and dominant-negative.

      Thank you for the comment. We have added the overexpression phenotypes to the revised version.

      (5) Lines 225-226. Fig S11: The neural projections from PDF-expressing neurons should be better imaged and quantified. Current images can visualize PDF projections onto the optic lobe but not others (e.g., dorsal, POT), so the conclusion is not validated.

      Thank you for the suggestion. We acknowledge the limitation in the current images of PDF-expressing neuronal projections. We included new, higher-resolution images to better visualize and quantify the neural projections, including the dorsal and POT regions, to ensure the conclusion is well-supported.

      (6) Lines 230-232: per RNA/PER protein expression oscillates daily, so the authors should perform time-point experiments to conclude Mettl5 effects on clock gene expression, including per.

      Thank you for the insightful comment. We performed experiments in the Mettl5 mutant background at four time points, analyzing per mRNA by RT-PCR and PER protein by Western blot (anti-PER). The updated results have been included in Figure 4D-F.

      (7) Lines 235-238: the authors should note that Mettl5 effects on sleep in Clk or per mutant backgrounds are actually opposite to those in w1118/control one. Mettl5 deletion promotes daytime or nighttime sleep in Clk or per mutants, respectively. Any explanation? 

      We used epistasis analysis to determine which gene acts upstream. Epistasis in genetics refers to an interaction between genes in which the expression of one gene (the epistatic gene) masks or modifies the expression of another (the hypostatic gene). The epistatic (masking) gene usually functions downstream in the pathway, because its effect overrides the output of the hypostatic gene. The double mutants showed phenotypes similar to those of the Clk or per single mutants, indicating that Clk and per function downstream of Mettl5.

      (8) Fig 6: The dorsal PDF projections actually show time-dependent plasticity. Results from the single time-point are not conclusive.

      Thank you for the insightful comment. We further dissected fly brains from UAS-DenMark, UAS-syt.eGFP/+; pdf-GAL4/+ and UAS-DenMark, UAS-syt.eGFP/+; pdf-GAL4/Mettl5<sup>1bp</sup> at the same four time points to analyze the morphology of PDF neurons. The results have been included in Figure 6.

      Minor comments

      (1) Please avoid simple bar graphs in the data presentation; include individual data points or use a different graph showing the distribution of raw data (e.g., violin plot, box plot, etc.).

      Thank you for the suggestion. In the revised version of the manuscript, we have included individual data points, violin plots, and box plots to present the data, effectively showing both the distribution and differences in the raw data.

      (2) Line 19: "Clock" indicates the gene name or general terminology such as "circadian clock". Please clarify it and revise the font accordingly.

      This has been revised to “clock”.

      (3) The overall flow in the Abstract/Summary is somewhat challenging for a general audience to follow.

      We have revised the text, especially the overall flow in the Abstract/Summary.

      (4) Fonts for the names of genes and gene products (i.e., mRNA, protein) should be appropriately corrected throughout the manuscript.

      We have checked the text and made changes where necessary.

      (5) Methods: the authors should provide detailed information on the methods. For instance, there is little description of how they generate Mettl5 deletions (e.g., sgRNA/target sequence). Also, they should clarify whether they test heterozygous vs. homozygous mutants of Mettl5 deletions in each experiment since the genotype description in the figure appears mixed-up (e.g., Fig 1B vs. Fig 1I-L).

      Thank you for pointing this out. In the updated version, we provided detailed information about the strains used, including the sgRNA/target sequences for generating Mettl5 deletions. Regarding the genotypes, Figure 1B represents homozygous mutants, while Figures 1I-L represent heterozygous mutants. This distinction has been clarified in the figure legends, and the genotype notation for Figures 1I-L will be revised for consistency and clarity.

      (6) Fig 1: the figure panels should be re-arranged based on the order of their text description (i.e., Fig 1H-L should go after Fig 1M-O).

      Thank you for the suggestion. In the revised version, we rearranged the figure panels so that Figures 1H-L appear after Figures 1M-O, following the order of their description in the text.

      (7) Sleep reduction in Trmt112 RNAi flies looks different from that in heterozygous Mettl5 mutants. Any explanation?

      The functional divergence between Trmt112 and Mettl5 may also contribute to the observed sleep phenotype. While Trmt112 and Mettl5 share some downstream targets, they each regulate many unique genes, some of which could influence sleep. Sleep is a highly sensitive trait that can be modulated by numerous genetic factors. Previous studies have also suggested that sleep behaves more like a quantitative trait, reflecting the combined effects of multiple genes (Mackay and Huang, 2018).

      Mackay TFC, Huang W. Charting the genotype-phenotype map: lessons from the Drosophila melanogaster Genetic Reference Panel. Wiley Interdiscip Rev Dev Biol. 2018;7(1):10.1002/wdev.289. doi:10.1002/wdev.289

      Reviewer #3 (Recommendations for the authors):

      A detailed critique is provided below. Generally, the authors can greatly improve this manuscript if they focus more rigorously on the circadian phenotype associated with the Mettl5 mutant, which could be the basis for the apparent sleep phenotype.

      (1) Please provide more information as to how each of the mettl5 mutants were generated. This information should include, specifically, the gRNA sequences, plasmids generated for the 5' and 3' arms, and anything related to the CRISPR approach for generating the mutants. Was any sequencing done to verify the CRISPR alleles, or was this limited to the analysis of mettl5 expression and behavior? Please indicate where the qPCR primers (used in Fig 1B) are located relative to the mutant loci. The figure legend is also incomplete in that there is no reference to the boxed area in Fig 1A.

      In the updated version, we have provided detailed information about how each of the mettl5 mutants was generated. The mutant sequences were verified by PCR amplification followed by sequencing. The following reference for the boxed area has been added in the revised version.

      Reference

      Iyer LM, Zhang D, Aravind L. Adenine methylation in eukaryotes: Apprehending the complex evolutionary history and functional potential of an epigenetic modification. Bioessays. 2016 Jan;38(1):27-40. doi: 10.1002/bies.201500104.

      (2) As noted, I am not in agreement with the interpretation of findings for the sleep defect reported in the mettl5[1b]/+ mutants. There is a clear increase in morning sleep in the mutants that may not have reached significance by lumping the data in 12h increments (Fig1C-E). Were the overall 24h sleep values between the mutants and controls the same? The sleep profile appears to be shifted, such that nighttime sleep onset in the mutants occurs much later than wild type, and daytime waking is also much later, all pointing to a long period phenotype, which is very strongly supported by the data in Table 1, as well as the RNA- and ribo-seq data. The implications for this leading to sleep disturbances in humans is very exciting. An additional suggestion to the authors here is to report the nighttime sleep latency values (time to onset of the first sleep bout after lights off).

      We appreciate your insightful observation. As shown in Table 1, the Mettl5<sup>1bp</sup>/+ mutant exhibits a robust long-period phenotype, with the circadian period significantly extended to 28.3 ± 0.4 hours compared with 23.9 ± 0.05 hours in the wild type. This prolonged period aligns closely with the observed behavioral phenotypes, including delayed nighttime sleep onset, later daytime waking, and the overall shift in the sleep profile, and is quite similar to the previously reported PERIOD3 variant (Zhang et al., 2016). We agree that the prolonged circadian period contributes to the observed sleep phenotype. However, since total sleep time was significantly reduced in the mutant, we cannot attribute the phenotype solely to period lengthening. Furthermore, our 24-hour PER expression analysis in mettl5 mutants revealed elevated PER protein levels at ZT1 and ZT18, no significant changes at ZT6 and ZT12, and no apparent phase shift. Together, these findings suggest that the phenotype primarily results from PER protein stabilization and accumulation.

      Importantly, genetic rescue experiments restoring wild-type Mettl5 function (UAS-Mettl5/Mettl5-Gal4; Figure 1 and Table 1) completely normalized the circadian period to 24 ± 0.02 hours, providing compelling evidence that these phenotypes specifically result from loss of Mettl5 function. Together with the sleep architecture data, these findings establish Mettl5 as a crucial regulator of circadian rhythms, with important implications for understanding human sleep disorders. To further substantiate these observations, we have now included quantitative nighttime sleep latency measurements in the revised manuscript to better document the delayed sleep onset in mutants (Figure S1G).
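      For illustration, the nighttime sleep latency described above (time from lights-off to the onset of the first sleep bout) could be computed roughly as follows. This is a minimal sketch, not the analysis code used for Figure S1G: it assumes per-minute activity counts for each fly beginning at lights-off, and the common Drosophila convention that a sleep bout is at least 5 consecutive minutes without activity. The function name `sleep_latency` is hypothetical.

```python
from typing import Optional, Sequence

# Assumption: a sleep bout is >= 5 consecutive minutes with zero activity counts.
MIN_BOUT_MINUTES = 5


def sleep_latency(counts_from_lights_off: Sequence[int],
                  min_bout: int = MIN_BOUT_MINUTES) -> Optional[int]:
    """Minutes from lights-off to the onset of the first sleep bout.

    counts_from_lights_off: per-minute activity counts for one fly,
    starting at lights-off (e.g. ZT12). Returns None if no bout occurs.
    """
    run = 0  # length of the current stretch of inactive minutes
    for minute, count in enumerate(counts_from_lights_off):
        if count == 0:
            run += 1
            if run == min_bout:
                # The bout began (min_bout - 1) minutes before this minute.
                return minute - min_bout + 1
        else:
            run = 0  # any activity breaks the inactive stretch
    return None
```

      For example, counts of `[3, 1, 0, 0, 2, 0, 0, 0, 0, 0, 4]` give a latency of 5 minutes: the first two-minute pause is too short to count as sleep, so latency is measured to the start of the later five-minute stretch.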

      We have discussed this in the third paragraph of the Discussion section and included the reference in the revised manuscript.

      Zhang L, Hirano A, Hsu PK, et al. A PERIOD3 variant causes a circadian phenotype and is associated with a seasonal mood trait. Proc Natl Acad Sci U S A. 2016;113(11):E1536-E1544. doi:10.1073/pnas.1600039113.

      (3) The description for how circadian behavior was measured and analyzed (Table 1) is missing from the methods section.

      We have included a detailed description of the methods used to measure and analyze circadian behavior, as presented in Table 1, in the revised methods “Sleep behavior assays” section.

      (4) Please explain what the "awake %" values reported in Figs 1G, 1L, Fig 2G, and 2L, Fig 4G and 4M are. Is this simply the number of flies that are awake at a given time point? This does not provide useful information beyond what is already reported for the sleep profiling in other parts of these figures. If it is an arousal threshold assay, as shown in supplementary Fig 1H, please indicate this. The description for "sleep arousal" in the methods (lines 368-371) is also concerning. If most of the mutant flies are already awake at ZT 14, then I would expect that this assay would not work at this time of day. A more suitable time point would be ZT 19, or later, when the mutants are falling asleep. Moreover, calculating the number of flies awakened as long as 5 minutes after a stimulus pulse cannot be distinguished from a spontaneous awakening, and so is not really a metric of arousal threshold. The number of sleeping flies awakened by the stimulus should be calculated within, at most, one minute afterward.

      Thank you for your suggestion. The 'awake %' metric is the percentage of flies in the population that are awake at a specific time point (e.g., ZT14). In the revised version, we have further clarified the definition and significance of 'awake %'. Additionally, we have reevaluated the time points for the arousal threshold assay and selected a more appropriate time (e.g., ZT19) to better reflect the sleep state of the mutants. Following your suggestion, we now count the number of flies awakened within one minute after the stimulus to obtain a more accurate measure of arousal threshold. These results are included in the revised Figure 1M.
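      As a sketch of how the two measures discussed here differ, the code below computes 'awake %' at a single time point and the fraction of sleeping flies that respond within one minute of a stimulus. It assumes per-minute activity counts per fly, the common 5-minute inactivity criterion for sleep, and that the stimulus is delivered at the start of minute `stim_minute`; all names are hypothetical and this is not the authors' pipeline.

```python
from typing import Sequence

MIN_BOUT_MINUTES = 5  # assumed sleep criterion: >= 5 min without activity
RESPONSE_WINDOW = 1   # minutes after the stimulus in which a response counts


def is_asleep(counts: Sequence[int], minute: int,
              min_bout: int = MIN_BOUT_MINUTES) -> bool:
    """True if the fly was inactive for the min_bout minutes before `minute`."""
    if minute < min_bout:
        return False
    return all(c == 0 for c in counts[minute - min_bout:minute])


def awake_percent(flies: Sequence[Sequence[int]], minute: int) -> float:
    """Percentage of flies that do not meet the sleep criterion at `minute`."""
    awake = sum(not is_asleep(f, minute) for f in flies)
    return 100.0 * awake / len(flies)


def arousal_fraction(flies: Sequence[Sequence[int]], stim_minute: int,
                     window: int = RESPONSE_WINDOW) -> float:
    """Fraction of flies asleep at the stimulus that move within `window` minutes."""
    sleepers = [f for f in flies if is_asleep(f, stim_minute)]
    if not sleepers:
        return float("nan")  # no fly asleep: arousal is undefined
    woke = sum(any(c > 0 for c in f[stim_minute:stim_minute + window])
               for f in sleepers)
    return woke / len(sleepers)
```

      Restricting the denominator to flies that were already asleep, and the response window to one minute, is what separates an arousal measure from the population 'awake %' and from spontaneous waking.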

      (5) Fig1M-O is problematic. First, is it possible that expression of Mettl5 mRNA fluctuates with time-of-day and is not affected by sleep loss? There are no undisturbed controls collected at equivalent time points. The method used for quantifying sleep rebound in Fig 1O (lines 365-367) does not make sense, as negative values would be expected. Moreover, since the Mettl5 mutants show high sleep amounts in the morning and very low sleep amounts from ZT 12-18, this analysis would be severely confounded. Also, the sleep deprivation applied would not produce equivalent amounts of sleep loss as compared to wild type controls, so this also needs to be corrected. The authors should consider consulting Cirelli et al (2005, DOI: 10.1038/nature03486 ) as an approach for quantifying sleep homeostasis in a short-sleeping mutant. Please also show the sleep profiling in the mutants for these experiments.

      Thank you for your valuable suggestions. Regarding the possibility that Mettl5 mRNA expression fluctuates with circadian rhythms rather than being affected by sleep deprivation, we acknowledge that collecting undisturbed control samples at equivalent time points would provide critical insights. In the revised version, we included undisturbed controls to distinguish between circadian-driven fluctuations and the effects of sleep deprivation on Mettl5 expression.

      For the quantification of sleep rebound in Figure 1O, we agree that the current method may not fully capture the dynamics of sleep recovery, especially in Mettl5 mutants, whose sleep patterns differ significantly from wild type. We have followed the method proposed by Cirelli et al. (2005) for quantifying sleep homeostasis in short-sleeping mutants to obtain a more accurate evaluation of sleep rebound. The results have been included in Figure 1K-L of the revised version.
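      One simple way to express rebound so that genotypes with different baseline sleep can be compared is to normalize the sleep gained during recovery by the sleep actually lost during deprivation, each measured against the same flies' undisturbed baseline for the same clock hours. The sketch below illustrates that idea under those assumptions; it is not claimed to reproduce the exact procedure of Cirelli et al. (2005), and the function name is hypothetical.

```python
def percent_sleep_recovered(baseline_dep_min: float, deprived_min: float,
                            baseline_rec_min: float, recovery_min: float) -> float:
    """Sleep gained during recovery as a percentage of sleep lost.

    baseline_dep_min: undisturbed sleep (minutes) over the deprivation hours
    deprived_min:     sleep (minutes) actually obtained during deprivation
    baseline_rec_min: undisturbed sleep (minutes) over the recovery hours
    recovery_min:     sleep (minutes) during the same hours after deprivation
    """
    lost = baseline_dep_min - deprived_min      # sleep removed by deprivation
    gained = recovery_min - baseline_rec_min    # extra sleep above baseline
    if lost <= 0:
        raise ValueError("no sleep was lost; rebound is undefined")
    return 100.0 * gained / lost
```

      For example, losing 540 minutes of night sleep and gaining 135 minutes above baseline the next morning corresponds to 25% recovery. Reporting the minutes lost alongside the percentage also shows whether both genotypes were comparably deprived.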

      (6) Fig 3B and C (minor) - while the volcano plots are clear, it is not clear whether "down" or "up" means for the mutant relative to wild type or the other way around? Please clarify. In Fig 3P, the legend indicates a depiction of the "top 5 pathway associated genes", but it seems there are 10 pathways depicted. Which of these are the "top 5"?

      In the volcano plots (Fig. 3B and 3C), “up” and “down” refer to genes that are upregulated or downregulated, respectively, in the mutant relative to the wild-type strain. In Fig. 3P, the legend was mislabeled as “top 5” pathway-associated genes; in fact, we displayed the top 10 pathway-associated genes. We apologize for the confusion and will correct both the figure legend and the corresponding text in our revised manuscript.

      (7) Fig 4 D-E, and F,G do not have sufficient information to draw the conclusion that Per mRNA/protein expression is increased in the Mettl5 mutant. Since both mRNA protein of this gene oscillates significantly throughout the day, it is still possible that the single time point shown in this figure might indicate a disruption in cycling rather than overall expression level. Please first indicate what time of day the tissue was collected, second, consider adding more time points to both assays. For the first part of this figure, A and B, per and Clock gene expression are expected to be in different phases, and so this aspect is not unexpected. However, it is notable that it is reversed in the mutant vs wild type. Again, an alternate interpretation of this finding that the authors have not considered is a change in period duration of gene cycling.

      Thank you for your suggestion. For the PER WB experiments, we have included multiple time points in the revised version to more comprehensively evaluate PER expression in the Mettl5 mutant and better understand its circadian rhythm changes. We appreciate your observation regarding potential changes in the period duration of gene cycling. This has been discussed in the 3<sup>rd</sup> paragraph of the Discussion section of the revised version.

      (8) The data shown in Figs 4H-M does not support the conclusion that "Clock and Per genes were downstream of Mettl5" (line 236-237). The daytime sleep phenotype, in particular, appears additive between both circadian genes and mutant because the morning sleep of the double mutant is much higher than either mutant by itself. Statistical comparisons between the double mutant and each clock mutant are also noticeably missing. These data are difficult to interpret. One potential explanation is that Mettl5 alters gene expression of non-circadian genes, and that the phenotypes become additive when both clock and Mettl5 genes are missing. A full molecular analysis of clock gene cycling in the Mettl5 mutant may help improve understanding of the relationship between the circadian clock Mettl5 gene expression. It may also be worthwhile checking whether Mettl5 gene expression itself shows a daily oscillation.

      Thank you for your suggestion. In the revised version, we have included four additional time points to analyze the oscillatory expression of Per and Clock in the Mettl5 mutant, providing a more comprehensive understanding of changes in their circadian rhythms. In Figs 4H-M, we used epistasis analysis to determine which gene acts upstream. Epistasis in genetics refers to an interaction between genes in which the expression of one gene (the epistatic gene) masks or modifies the expression of another (the hypostatic gene). The epistatic (masking) gene usually functions downstream in the pathway, because its effect overrides the output of the hypostatic gene. The double mutants showed phenotypes similar to those of the Clk or per single mutants, indicating that Clk and per function downstream of Mettl5. Statistical comparisons between the double mutant and each clock mutant have been added.

      (9) In Fig 6, what time of day were the flies collected? PDF terminal morphology is known to change throughout the day; this is another piece of data that could indicate a defect in circadian function rather than a chronic change in synaptic morphology.

      The flies were collected around ZT14. In the revised version, we have included additional dissection time points. Differences between the control and Mettl5 mutants are observed consistently across multiple time points, suggesting that Mettl5 affects synaptic plasticity.

      Minor:

      There are letter indicators, presumably for statistical comparisons, depicted in Figs 1 and 2 (panels I-L), but no explanation as to what these mean in the figure legends.

      We have added notes in the revised version.

      What is the purpose of the boxed regions shown in Fig S1A-F? There is no explanation of these in the figure legend nor in the text.

      The boxed regions highlight the significant co-localization of two proteins. We have included this explanation in the figure legend in the revised version.

      The statement (lines 310-311) that per and clock genes "exhibit more pronounced sleep rebound after sleep deprivation" is inaccurate. The article cited for this (Shaw et al 2002) showed that it was female mutants of the cycle gene which showed prolonged sleep rebound; other clock mutants were normal.

      Thank you for pointing this out. We have revised the statement accordingly.

      Overall, the manuscript may benefit from editing or writing assistance to improve the language. There were many incomplete sentences, grammatical errors, etc.

      We have carefully refined the language throughout the manuscript during the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report intracranial EEG findings from 12 epilepsy patients performing an associative recognition memory task under the influence of scopolamine. They show that scopolamine administered before encoding disrupts hippocampal theta phenomena and reduces memory performance, and that scopolamine administered after encoding but before retrieval impairs hippocampal theta phenomena (theta power, theta phase reset) and neural reinstatement but does not impair memory performance. This is an important study with exciting, novel results and translational implications. The manuscript is well-written, the analyses are thorough and comprehensive, and the results seem robust.

      Strengths:

      (1) Very rare experimental design (intracranial neural recordings in humans coupled with pharmacological intervention).

      (2) Extensive analysis of different theta phenomena.

      (3) Well-established task with different conditions for familiarity versus recollection.

      (4) Clear presentation of findings and excellent figures.

      (5) Translational implications for diseases with cholinergic dysfunction (e.g., AD).

      (6) Findings challenge existing memory models, and the discussion presents interesting novel ideas.

      Weaknesses:

      (1) One of the most important results is the lack of memory impairment when scopolamine is administered after encoding but before retrieval (scopolamine block 2). The effect goes in the same direction as for scopolamine during encoding (p = 0.15). Could it be that this null effect is simply due to reduced statistical power (12 subjects with only one block per subject, while there are two blocks per subject for the condition with scopolamine during encoding), which may become significant with more patients? Is there actually an interaction effect indicating that memory impairment is significantly stronger when scopolamine is applied before encoding (Figure 1d)? Similar questions apply to familiarity versus recollection (lines 78-80). This is a very critical point that could alter major conclusions from this study, so more discussion/analysis of these aspects is needed. If there are no interaction effects, then the statements in lines 84-86 (and elsewhere) should be toned down.

      The reviewer highlights important concerns regarding the statistical power of the behavioral effects. We address these concerns in the revised manuscript in two ways: (1) we provide a supplemental analysis using a matched number of blocks between the placebo and scopolamine conditions to avoid statistical bias related to differing trial counts, and (2) we include a supplemental figure illustrating paired comparisons between blocks.

      (2) Further, could it simply be that scopolamine hadn't reached its major impact during retrieval after administration in block 2? Figure 2e speaks in favor of this possibility. I believe this is a critical limitation of the experimental design that should be discussed.

      The reviewer raises an important methodological concern regarding the time required for scopolamine's effect to manifest and the subsequent impact on the study outcomes. Previous studies report that the average time to maximum serum concentration after intravenous (IV) scopolamine administration is approximately 5 minutes (Renner et al., 2005), with the corresponding clinical onset estimated at 10 minutes. In our study, the retrieval period in Block 2 commenced at 15 ± 0.2 minutes post-injection across all subjects. Given this timing, there is sufficient reason to conclude that scopolamine had reached its major impact during the Block 2 retrieval phase. Furthermore, the observation of significant disruptions to theta oscillations during this same retrieval phase provides strong evidence that the drug was in full effect at that time.

      (3) It is not totally clear to me why slow theta was excluded from the reinstatement analysis. For example, despite an overall reduction in theta power, relative patterns may have been retained between encoding and recall. What are the results when using 1-128 Hz as input frequencies?

      Slow theta (2–4 Hz) was excluded from the reinstatement analysis to avoid potential confounding effects. Given the observed disruption to slow theta power following scopolamine administration, any subsequent changes in slow theta reinstatement would be causally ambiguous, potentially arising directly from the power effects. Therefore, we would be unable to determine whether changes in slow theta reinstatement were genuinely independent of changes in power.

      (4) In what way are the results affected by epileptic artifacts occurring during the task (in particular, IEDs)?

      To exclude abnormal events and interictal activity, a kurtosis threshold of 4 was applied to each trial, effectively filtering out segments exhibiting significant epileptic artifacts.
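      A minimal sketch of kurtosis-based trial rejection is shown below. The threshold of 4 comes from the response above; which kurtosis definition it applies to is not stated, so this sketch assumes the non-excess (Pearson) form, for which Gaussian noise gives about 3. Trials are assumed to be stored as an `n_trials x n_samples` NumPy array for one channel; the function names are hypothetical.

```python
import numpy as np

# Threshold from the text; the Pearson (non-excess) definition is an assumption.
KURTOSIS_THRESHOLD = 4.0


def pearson_kurtosis(x: np.ndarray) -> float:
    """Fourth central moment divided by the squared variance (about 3 for Gaussian data)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return float(m4 / m2 ** 2)


def keep_trials(trials: np.ndarray, threshold: float = KURTOSIS_THRESHOLD) -> np.ndarray:
    """Boolean mask over trials (rows): True where kurtosis does not exceed the threshold."""
    return np.array([pearson_kurtosis(t) <= threshold for t in np.asarray(trials)])
```

      Sharp, high-amplitude transients such as interictal discharges give a trial's amplitude distribution heavy tails, which is what pushes its kurtosis above the threshold while ongoing background activity stays near the Gaussian value.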

      Reviewer #2 (Public review):

      Summary:

      In this study, performed in human patients, the authors aimed at dissecting out the role of cholinergic modulation in different types of memory (recollection-based vs familiarity and novelty-based) and during different memory phases (encoding and retrieval). Moreover, their goal was to obtain the electrophysiological signature of cholinergic modulation on network activity of the hippocampus and the entorhinal cortex.

      Strengths:

      The authors combined cognitive tasks and intracranial EEG recordings in neurosurgical epilepsy patients. The study confirms previous evidence regarding the deleterious effects of scopolamine, a muscarinic acetylcholine receptor antagonist, on memory performance when administered prior to the encoding phase of the task. During both encoding and retrieval phases, scopolamine disrupts the power of theta oscillations in terms of amplitude and phase synchronization. These results raise the question of the role of theta oscillations during retrieval and the meaning of scopolamine's effect on retrieval-associated theta rhythm without cognitive changes. The authors clearly addressed this issue in the Discussion section. A major point is the finding that the scopolamine-mediated effect is selective for recollection-based memory and not for familiarity- and novelty-based memory.

      The methodology used is powerful, and the data underwent a detailed and rigorous analysis.

      Weaknesses:

      A limited cohort of patients; the age of the patients is not specified in the table.

      To comply with human subject privacy protection policies, age was not reported; however, we did not find any significant effects of age on the behavioral or neural measures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Regarding dosage, did you take the patients' body weight into account? Do the effects hold when controlling for it?

      We controlled for participant weight, yet the observed effects were more strongly correlated with the absolute scopolamine dosage, irrespective of weight. This outcome indicates that scopolamine likely rapidly crosses the blood-brain barrier, producing swift effects that are not initially influenced by metabolic variability.

      (2) Line 96: Corrected for what kind of multiple comparisons?

      We apologize for this confusion. The statistical analysis presented in this line does not require multiple-comparison correction, and we will therefore remove the annotation.

      (3) Line 165: These are very interesting results. How do they relate to Rizzuto et al., NeuroImage, 2006?

      Our findings show that successful retrieval is tied to an encoding-retrieval phase match, which is a refinement and application of the Rizzuto et al. (2006) work. Rizzuto et al. showed that memory events are phase-locked; we show that maintaining a specific, matched phase relationship between encoding and retrieval events is critical for memory success, and that this process is dependent on the cholinergic system.

      Reviewer #2 (Recommendations for the authors):

      Figure 1b: It would be useful for clarity to have the cartoon of the treatment paradigm for the encoding phase (blocks 3 and 4).

      The treatment paradigm only involved a single intravenous (IV) injection of scopolamine (or saline, for the placebo condition). The injections were administered by the participant's attending nurse, with a board-certified anesthesiologist present at the time of injection and available throughout the experiment. These details are fully documented in the Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      An interesting manuscript from the Carrington lab is presented investigating the behavior of single vs double GPI-anchored nutrient receptors in bloodstream form (BSF) T. brucei. These include the transferrin receptor (TfR), the HpHb receptor (HpHbR), and the factor H receptor (FHR). The central question is why these critical proteins are not targeted by host-acquired immunity. It has generally been thought that they are sequestered in the flagellar pocket (FP), where they are subject to rapid endocytosis - any Ab:receptor complexes would be rapidly removed from the cell surface. This manuscript challenges that assumption by showing that these receptors can be found all over the outer cell body and flagella surfaces, if one looks in an appropriate manner (rapid direct fixation in culture media).

      The main part of the manuscript focuses on TfR, typically a GPI1 heterodimer of very similar E6 (GPI anchored) and E7 (truncated, no GPI) subunits. These are expressed coordinately from 15 telomeric expression sites (BES), of which only one can be transcribed at a time. The authors identify a native E6:E7 pair in BES7 in which E7 is not truncated and therefore forms a GPI2 heterodimer. By in situ genetic manipulation, they generate two different sets of GPI1:GPI2 TfR combinations expressed from two different BESs (BES1 and BES7). Comparative analyses of these receptors form the bulk of the data.

      The main findings are:

      (1) Both GPI1 and GPI2 TfR can be found on the cell body/flagellar surface.

      (2) Both are functional for Tf binding and uptake.

      (3) GPI2 TfR is expressed at ~1.5x relative to GPI1 TfR

      (4) Ultimate TfR expression level (protein) is dependent on the BES from which it is expressed.

      Most of these results are quite reasonably explained in light of the hydrodynamic flow model of the Engstler lab and the GPI valence model of the Bangs lab. Additional experiments, again by rapid fixation, with HpHbR and FHR, show that these GPI1 receptors can also be seen on the cell surface, in contrast to published localizations.

      It is quite interesting that the authors have identified a native GPI2 TfR. However, essentially all of the data with GPI2 TfR are confirmatory for the prior, more detailed studies of Tiengwe et al. (2017). That said, the suggestion that GPI2 was the ancestral state makes good evolutionary sense, and begs the question of why trypanosomes prefer GPI1 TfR in 14 of 15 ESs (i.e., what is the selection pressure?)

      Strengths and weaknesses:

      (1) BES7 TfR subunit genes (BES7_Tb427v10): There are actually three (in order, 5' to 3'): E7gpi, E6.1 and E6.2. E6.1 and E6.2 have a single nucleotide difference. This raises the issue of coordinate expression. If overall levels of E6 (2 genes) are not down-regulated to match E7 (1 gene), this will result in a 2x excess of E6 subunits. The most likely fate of these is the formation of non-functional GPI2 homodimers on the cell surface, as shown in Tiengwe et al. (2017), which will contribute to the elevated TfR expression seen in BES7.

      We would like to thank the reviewer for pointing out that there are two ESAG6 genes in BES7; we had relied on the publicly available annotation and should have known better.

      For transferrin receptor expression levels, see the response to Reviewer 1, point 3, below.

      (2) Surface binding studies: This is the most puzzling aspect of the entire manuscript. That surface GPI2 TfR should be functional for Tf binding and uptake is not surprising, as this has already been shown by Tiengwe et al. (2017), but the methodology for this assay raises important questions. First, labeled Tf is added at 500 nM to live cells in complete media containing 2.5 uM unlabeled Tf - a 5x excess. It is difficult to see how significant binding of labeled Tf could occur in as little as 15 seconds under these conditions.

      The k_on for transferrin is very rapid (4.5 × 10⁵ M⁻¹ s⁻¹ for BES1 TfR and bovine transferrin at pH 7.4; Trevor et al., 2019), and binding to unoccupied receptors would occur within 15 s. The k_off is also fast (3.6 × 10⁻² s⁻¹ under the same conditions; Trevor et al., 2019), so transferrin would be exchanged within the time taken for endocytosis. These values were measured in vitro with purified proteins; the in vivo values may be affected by the VSG coat.
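      To make the kinetic argument concrete, here is a minimal Python sketch that plugs the quoted rate constants into a simple binding model. It assumes 1:1 binding, free transferrin in large excess over receptor (pseudo-first-order kinetics), identical kinetics for labelled and unlabelled transferrin, and a total free transferrin concentration of 3 µM (0.5 µM labelled plus the 2.5 µM unlabelled cited by the reviewer). The constant and function names are illustrative only.

```python
import math

# In vitro constants quoted in the response (BES1 TfR / bovine Tf, pH 7.4;
# Trevor et al., 2019). In vivo values may differ because of the VSG coat.
K_ON = 4.5e5    # association rate constant, M^-1 s^-1
K_OFF = 3.6e-2  # dissociation rate constant, s^-1

# Assumed free transferrin: 0.5 uM labelled + 2.5 uM unlabelled, in molar.
TF_TOTAL = 3.0e-6


def fraction_bound(t: float, ligand: float = TF_TOTAL) -> float:
    """Fraction of initially empty receptors occupied after t seconds.

    Pseudo-first-order solution for 1:1 binding with constant free ligand:
    f(t) = (k_on*L / k_obs) * (1 - exp(-k_obs * t)), k_obs = k_on*L + k_off.
    """
    k_obs = K_ON * ligand + K_OFF
    return (K_ON * ligand / k_obs) * (1.0 - math.exp(-k_obs * t))


k_obs = K_ON * TF_TOTAL + K_OFF
print(f"association half-time:  {math.log(2) / k_obs:.2f} s")
print(f"dissociation half-life: {math.log(2) / K_OFF:.1f} s")
print(f"fraction bound at 15 s: {fraction_bound(15.0):.3f}")
```

      Under these assumptions the association half-time is about 0.5 s and the dissociation half-life about 19 s, so an unoccupied receptor would be expected to reach near-equilibrium occupancy well within 15 s, while bound transferrin continues to exchange on a timescale of tens of seconds.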

      The failure to bind canine transferrin (Supp. Figure 4B) acts as a control for specificity of the interaction.

      We have now performed a competition experiment as an additional control: cells in culture were supplemented with (A) 0.5 µM labelled transferrin; (B) 0.5 µM labelled and 2.5 µM unlabelled transferrin; or (C) 0.5 µM labelled and 5 µM unlabelled transferrin, then fixed after 60 s and visualised by fluorescence microscopy (Figure S4C). Competition was effective, with greatly reduced binding of labelled transferrin in the presence of a 10-fold excess of unlabelled transferrin. We would like to thank the reviewer for suggesting this experiment.

      Second, Tiengwe et al. (2017) found that trypanosomes taken directly from culture could not bind labeled Tf in direct surface labelling experiments. To achieve binding, it was necessary to first culture cells in serum-free media for a sufficient time to allow new unligated TfR to be synthesized and transported to the surface. This result suggests that essentially all surface TfR is normally ligated and unavailable to the added probe.

      As part of the preliminary experiments for this paper, we found that centrifugation followed by resuspension in either complete or serum-free (but 1% BSA) medium resulted in a reduction in total cellular TfR, as determined by western blotting. We have now included this experiment (Figure S4D). The inference from this experiment is that centrifugation and subsequent incubation will affect receptor detection and endocytosis rates for a discrete time period.

      The amount of binding of labelled transferrin to cells in culture will depend on the specific activity of the labelled transferrin. This reasoning was behind the use of 0.5 µM labelled transferrin: roughly 1 in 6 transferrin molecules in the culture medium are then labelled, and the overall transferrin concentration changes only slightly.
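      The "1 in 6" figure and the claim of only a small change in overall transferrin can be checked with a short equilibrium calculation. This Python sketch is an illustration, not the authors' analysis: it assumes simple 1:1 equilibrium binding, a K_d derived from the in vitro rate constants quoted in the response (Trevor et al., 2019), 2.5 µM unlabelled transferrin in complete medium (the reviewer's figure), and identical binding of labelled and unlabelled transferrin. All names are hypothetical.

```python
# Kd from the quoted in vitro rate constants (Trevor et al., 2019):
# Kd = k_off / k_on = 3.6e-2 / 4.5e5 = 8e-8 M (80 nM).
K_ON = 4.5e5    # M^-1 s^-1
K_OFF = 3.6e-2  # s^-1
KD = K_OFF / K_ON


def occupancy(tf_total: float) -> float:
    """Equilibrium fraction of receptors occupied (simple 1:1 binding)."""
    return tf_total / (tf_total + KD)


def labelled_fraction(labelled: float, unlabelled: float) -> float:
    """Share of transferrin molecules that carry the label, assuming
    labelled and unlabelled transferrin bind TfR identically."""
    return labelled / (labelled + unlabelled)


medium_tf = 2.5e-6  # assumed unlabelled Tf in complete medium, M
added_tf = 0.5e-6   # labelled Tf added, M

f_label = labelled_fraction(added_tf, medium_tf)
occ_before = occupancy(medium_tf)
occ_after = occupancy(medium_tf + added_tf)
print(f"labelled fraction: {f_label:.3f}")  # 0.167, i.e. 1 in 6
print(f"occupancy before/after adding label: {occ_before:.3f} / {occ_after:.3f}")
```

      Because both concentrations sit far above an 80 nM K_d, receptors are already close to saturation, so raising total transferrin by 20% shifts equilibrium occupancy by well under one percentage point; the added label mainly sets the probability (about 1 in 6) that a newly bound transferrin is labelled.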

      Third, the authors have themselves argued previously, based on binding affinities, that all surface-exposed TfR is likely ligated in a natural setting (DOI:10.1002/bies.202400053). Could the observed binding actually be non-specific due to the high levels of fixative used?

      The absence of binding/uptake of canine transferrin argues against a non-specific interaction. In our previous publication, we did not pay enough attention to the on and off rates which allow for a degree of exchange and, here, TfR newly appearing on the cell surface has a 1 in 6 chance of binding a labelled transferrin.

      (3) Variable TfR expression in different BESs: It appears that native TfR is expressed at higher levels from BES7 compared to BES1, and even more so when compared to BES3. This raises the possibility that the anti-TfR used in these experiments has differential reactivity with the three sets of TfRs. The authors discount this possibility due to the overall high sequence similarities of E6s and E7s from the various ESs. However, their own analyses show that the BES1, BES3, and BES7 TfRs are relatively distal to each other in the phylogenetic trees, and this Reviewer strongly suspects that the apparent difference in expression is due to differential reactivity with the anti-TfR used in this work. In the grand scheme, this is a minor issue that does not impact the other major conclusions concerning TfR localization and function, nor the behavior of HpHbR and FHR. However, the authors make very strong conclusions about the role of BESs in TfR expression levels, even claiming that it is the 'dominant determinant' (line 189).

      This point is valid but exceptionally difficult to address at the protein level. As an orthogonal approach, we performed RNAseq analysis of the ‘wild type’ BES1, BES3, and BES7 cell lines to determine whether differences in receptor mRNA levels were consistent with the proposed difference in protein levels (Table S1). The analysis showed total ESAG6/7 mRNA levels to vary in a similar manner to the protein estimates with BES3 < BES1 < BES7 providing support for the differences in protein levels.

      The strongest evidence for the expression site determining the TfR level is the comparison of the cell lines in which the VSG were exchanged. This had no effect on TfR levels and so there is no evidence that the identity of the VSG alters TfR expression.

      (4) Surface immuno-localization of receptors: These experiments are compelling and useful to the field. To explain the difference with essentially all prior studies, the authors suggest that typical fixation procedures allow for clearance of receptor:ligand complexes by hydrodynamic flow due to extended manipulation prior to fixation (washing steps). Despite the fact that these protocols typically involve ice-cold physiological buffers that minimize membrane mobility, this is a reasonable possibility. Have the authors challenged their hypothesis by testing more typical protocols themselves? Other contributing factors that could play a role are the use of deconvolution, which tends to minimize weak signals, and also the fact that investigators tend to discount weak surface signals as background relative to stronger internal signals.

      We have added preliminary experiments that compared fixation protocols, in two parts: first, the effect of washing and resuspending cells on TfR levels, discussed above (Figure S4D); and second, how different fixation protocols alter apparent TfR immunolocalisation (Supp Figure S5A-B). The comparison shows that both the absence of glutaraldehyde and the use of washing alter the outcome.

      (5) Shedding: A central aspect of the GPI valence model (Schwartz et al., 2005, Tiengwe et al., 2017) is that GPI1 reporters that reach the cell body surface are shed into the media because a single dimyristoylglycerol-containing GPI anchor does not stably associate with biological membranes. As the authors point out, this is a major factor contributing to higher steady-state levels of cell-associated GPI2 TfR relative to GPI1 TfR. Those studies also found that the size/complexity of the attached protein correlated inversely with shedding, suggesting exit from the flagellar pocket as a restricting factor in cell body surface localization. The amount of newly synthesized TfR shed into the media was ~5%, indicating that very little actually exits the FP to the outer surface. In this regard, is it possible to know the overall ratio of cell surface:FP:endosomal localized receptors? Could these data not be 'harvested' from the 3D structural illumination imaging?

      A ratio could be determined, but we did not do this because it would only be valid if the antibody had equal access to the internal TfR in a diluted VSG environment and to the external TfR embedded in a densely packed and cross-linked VSG layer. As such, we would have no confidence in the accuracy of any estimate.

      Reviewer #2 (Public review):

      The work has significant implications for understanding immune evasion and nutrient uptake mechanisms in trypanosomes.

      While the experimental rigor is commendable, revisions are needed to clarify methodological limitations and to broaden the discussion of functional consequences.

      The authors argue that prior studies missed surface-localized TfR due to harsh washing/fixation (e.g., methanol). While this is plausible, additional evidence would strengthen the claim.

      Preliminary experiments that compared fixation protocols are now included to show that method affects outcome.

      It remains unclear how centrifugation steps of various lengths (as in previous publications) can equally and quantitatively redistribute TfR into the flagellar pocket. If this were the case, it should be straightforward for the authors to test this experimentally.

      We are not aware of previous studies that demonstrate equal and quantitative redistribution to the flagellar pocket. Previous reports show variation in cell surface versus flagellar pocket localisation depending on expression level (Mussmann et al., 2003; Mussmann et al., 2004); notably, the increase in TfR expression in these papers is similar to the difference between the cell lines used here. In addition, most report the presence of TfR in endosomal compartments. In the experiments here, there are cells in which most of the signal from labelled transferrin is in the flagellar pocket, and we argue that this is one stage of a continuous process in which the receptor picks up transferrin on the cell surface and is swept towards the pocket.

      If TfR is distributed over the cell surface, live-cell imaging with fluorescent transferrin should be performed as a control. Modern detection limits now reach the single-molecule level, and transient immobilization of live trypanosomes has been established, which would exclude hydrodynamic surface clearance as a confounding factor.

      This is non-trivial and is a longer-term aim. The immobilisation involves significant manipulation of the cells prior to restraining.

      In most images, TfR is not evenly distributed on the surface but rather appears punctate. Could this reflect localization to membrane domains? Immuno-EM with high-pressure frozen parasites could resolve this question and is relatively straightforward.

      There is a non-uniform appearance in the super-resolution images for both TfR and FHR. We cannot distinguish whether this represents random variation in receptor density over the cell surface or results from a biological phenomenon. Whatever the cause, the experiments showed unambiguous cell surface localisation.

      The authors might consider discussing whether differences in parasite life cycle stages (procyclic versus bloodstream forms) or culture conditions (e.g., cell density) affect localization. The developmentally regulated retention of GPI-anchored procyclin in the flagellar pocket might be worth mentioning.

      The aim of this paper was to determine the localisation of receptors in proliferating bloodstream form trypanosomes in culture. TfR and HpHbR are not expressed in insect stages in culture. FHR is expressed in insect stages and is present all over the cell surface (Macleod et al., 2020). A procyclin-based reporter was distributed over the whole cell surface in one report (Schwartz et al. 2005). In other reports, the retention of procyclin in the flagellar pocket of proliferating bloodstream forms is probably dependent on structure/sequence as other single GPI-anchored proteins, such as FHR (Macleod et al., 2020) and GPI-anchored sfGFP (Martos-Esteban et al., 2022) can access the surface.

      References:

      MacGregor, P., Gonzalez-Munoz, A. L., Jobe, F., Taylor, M. C., Rust, S., Sandercock, A. M., Macleod, O. J. S., Van Bocxlaer, K., Francisco, A. F., D’Hooge, F., Tiberghien, A., Barry, C. S., Howard, P., Higgins, M. K., Vaughan, T. J., Minter, R., & Carrington, M. (2019). A single dose of antibody-drug conjugate cures a stage 1 model of African trypanosomiasis. PLoS Neglected Tropical Diseases, 13(5), e0007373. https://doi.org/10.1371/journal.pntd.0007373

      Macleod, O. J. S., Bart, J.-M., MacGregor, P., Peacock, L., Savill, N. J., Hester, S., Ravel, S., Sunter, J. D., Trevor, C., Rust, S., Vaughan, T. J., Minter, R., Mohammed, S., Gibson, W., Taylor, M. C., Higgins, M. K., & Carrington, M. (2020). A receptor for the complement regulator factor H increases transmission of trypanosomes to tsetse flies. Nature Communications, 11(1), 1326. https://doi.org/10.1038/s41467-020-15125-y

      Martos-Esteban, A., Macleod, O. J. S., Maudlin, I., Kalogeropoulos, K., Jürgensen, J. A., Carrington, M., & Laustsen, A. H. (2022). Black-necked spitting cobra (Naja nigricollis) phospholipases A2 may cause Trypanosoma brucei death by blocking endocytosis through the flagellar pocket. Scientific Reports, 12(1), 6394. https://doi.org/10.1038/s41598-022-10091-5

      Mussmann, R., Engstler, M., Gerrits, H., Kieft, R., Toaldo, C. B., Onderwater, J., Koerten, H., van Luenen, H. G. A. M., & Borst, P. (2004). Factors affecting the level and localization of the transferrin receptor in Trypanosoma brucei. The Journal of Biological Chemistry, 279(39), 40690–40698. https://doi.org/10.1074/jbc.M404697200

      Mussmann, R., Janssen, H., Calafat, J., Engstler, M., Ansorge, I., Clayton, C., & Borst, P. (2003). The expression level determines the surface distribution of the transferrin receptor in Trypanosoma brucei. Molecular Microbiology, 47(1), 23–35. https://doi.org/10.1046/j.1365-2958.2003.03245.x

      Schwartz, K. J., Peck, R. F., Tazeh, N. N., & Bangs, J. D. (2005). GPI valence and the fate of secretory membrane proteins in African trypanosomes. Journal of Cell Science, 118(Pt 23), 5499–5511. https://doi.org/10.1242/jcs.02667

      Trevor, C. E., Gonzalez-Munoz, A. L., Macleod, O. J. S., Woodcock, P. G., Rust, S., Vaughan, T. J., Garman, E. F., Minter, R., Carrington, M., & Higgins, M. K. (2019). Structure of the trypanosome transferrin receptor reveals mechanisms of ligand recognition and immune evasion. Nature Microbiology, 4(12), 2074–2081. https://doi.org/10.1038/s41564-019-0589-0

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Recommendations:

      (1) Two E6 genes in BES7: This does not affect the overall conclusions, but the text should be modified to reflect the existence of the second gene, and to discuss the ramifications.

      This has been corrected

      (2) Surface binding studies: To clarify this issue, two experimental approaches are strongly recommended. First: additional excess unlabelled Tf should be added. If binding is truly receptor-mediated, it must by definition be saturable at some experimentally achievable level. Second: TfR expression should be abrogated by RNAi silencing to show that binding is TfR-dependent. Without some validation of specific binding by one or both of these approaches, these counter-intuitive results must be questioned.

      The excess unlabelled transferrin experiment is now included (we would like to thank the reviewer for this suggestion). The absence of binding of canine transferrin provides strong evidence for the specificity.

      (3) Variable TfR expression in different BESs: To make such claims, quantitative RT-PCR should be performed with conserved primers to assess the actual relative expression at the transcriptional level. Absent this, the claims should be eliminated, or at the very least greatly tempered.

      This has been done using an RNAseq analysis.

      (4) Surface immuno-localization of receptors: An example of discounting weak signals as background can be seen in Figure 8 of Duncan et al. (2024). It has also been shown that at least one other GPI1 reporter (procyclin) is readily detected on the outer cell surface under ectopic expression in BSF trypanosomes (Schwartz et al., 2005) using typical fixation procedures. This could be cited, and the authors could discuss the fact that procyclin is not a receptor and may not be susceptible to hydrodynamic drag.

      Yes

      Minor issues:

      (1) Fully appreciating the data presented requires an understanding of the hydrodynamic flow and GPI valence models of the Engstler and Bangs labs, respectively. For the uninitiated, it might perhaps be useful to include brief summaries of each in the Introduction.

      Added to the introduction

      (2) Lines 110-112: ISG65 and ISG75 both have strong localizations in endosomal compartments. This should be noted with citation of any of the work from the Field lab.

      Added

      (3) Lines 121-132: This passage presents the role of GPI anchors (1 vs 2) in a rather digital manner (in or out). Schwartz et al (2005) present a much more nuanced view of what is likely taking place. This is one reason summaries of hydrodynamic flow and GPI valence would be helpful.

      Modified

      (4) Lines 182-184: The increased size of GPI-anchored E7 is in part due to the presence of the GPI itself, as the authors state, but there are also 24 additional amino acid residues in this protein that contribute.

      Modified

      (5) Lines 212-214: Do p>0.95 and p>0.99 indicate statistical significance? This must be a typo.

      Thank you, corrected

      (6) Lines 218-219: The better references documenting GPI number in regard to turnover/shedding are Schwartz et al. 2005 and Tiengwe et al. 2017.

      Changed

      (7) Line 241 and Figures 3, 4, and 6: The transverse sections add little to the presentation. That there is signal variation in all dimensions is readily apparent from the images themselves, and similar profiles would be obtained regardless of the transect. Was there some process/rationale in the selection of the individual transects intended to make a broader point? If so, a description of the process should be provided.

      The point was to show that the signal had a pattern consistent with plasma membrane (two distal peaks) as opposed to cytoplasm (single central peak). As such, we think it is important.

      (8) Lines 582-596: Methodology for quantitation of cellular fluorescent signals should be provided.

      Has been expanded

      Reviewer #2 (Recommendations for the authors):

      (1) As a less critical but still useful control, antibody accessibility assays on live versus fixed parasites could test whether VSG coats limit detection.

      This could only be quantified by using a range of monoclonal antibodies which are not available.

      (2) The rapid transferrin uptake (15-60 seconds) could reflect fast endocytic recycling rather than stable surface residency. A pulse-chase experiment tracking receptor movement would clarify this (though I acknowledge that this is technically challenging).

      We agree that endocytic recycling is probably the main source of unoccupied TfR on the cell surface. It is hard to see how the pulse chase experiment could be performed without centrifugation which will affect the outcome – see above.

      (3) Statistical and quantitative reporting

      Added as Tables S2-S4

      (4) Report confidence intervals (e.g., for fluorescence intensity comparisons in Figure 3B) to contextualize claims of "no significant difference."

      We do not claim 'no significant difference'; the SDs overlap because of the high level of variation in the population.

      (5) Specify the number of biological replicates and cells analyzed per condition in the figure legends.

      Added

      (6) The study notes that surface-exposed receptors avoid antibody detection, but does not explore how.

      We don’t claim that receptors avoid detection and have published evidence to the contrary. The cell has evolved mechanisms to reduce/minimise the effect of antibody binding.

      (7) Comparing antibody binding to TfR in VSG221 versus VSG224 coats.

      This is already present in Figure 3D

      (8) Testing whether receptor shedding or conformational masking contributes to immune evasion.

      A lifetime’s work

      (9) Evolutionary trade-offs: Discuss why T. brucei maintains ~15 TfR variants if the GPI-anchor number has minimal impact on function (Figure 3).

      The possible reason for the evolution of ~15 TfR variants was discussed in a previous publication.

      (10) How do their findings align with recent studies on ISG75 surface exposure?

      If this refers to the finding that ISG75 is an Ig Fc receptor, this has been included

      (11) Add scale bars to 3D reconstructions (Figure 5).

      Added

      (12) Include a schematic summarizing key findings in the main text.

      Chosen not to do

      (13) Explicitly state where raw microscopy images, flow cytometry data, and analysis scripts are deposited.

      Microscope images have been deposited in the BioImage Archive repository at EMBL-EBI. No flow cytometry was used.

      (14) Correct inconsistent GPI-anchor terminology (e.g., "glycosylphosphoinositol" to "glycosylphosphatidylinositol").

      Our typo, corrected

      (15) Clarify ambiguous phrases (e.g., "subtle mechanisms" in the Discussion).

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely appreciate your constructive feedback. Based on the comments from the three reviewers, we were able to substantially improve the manuscript. Below, we provide our point-by-point responses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examined the functional organization of the mouse posterior parietal cortex (PPC) using meso-scale two-photon calcium imaging during visually-guided and history-guided tasks. The researchers found distinct functional modules within the medial PPC: area A, which integrates somatosensory and choice information, and area AM, which integrates visual and choice information. Area A also showed a robust representation of choice history and posture. The study further revealed distinct patterns of inter-area correlations for A and AM, suggesting different roles in cortical communication. These findings shed light on the functional architecture of the mouse PPC and its involvement in various sensorimotor and cognitive functions.

      Strengths:

      Overall, I find this manuscript excellent. It is very clearly written and built up logically. The subject is important, and the data support the conclusions without overstating implications. Where the manuscript shines the most is the exceptionally thorough analysis of the data. The authors set a high bar for identifying the boundaries of the PPC subareas, where they combine both somatosensory and visual intrinsic imaging. There are many things to compliment the authors on, but one thing that should be applauded in particular is the analysis of the body movements of the mice in the tube. Anyone working with head-fixed mice knows that mice don't sit still, but this almost invariably remains unanalyzed. Here the authors show that this indeed explained some of the variance in the data.

      Weaknesses:

      I see no major weaknesses and I only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The posterior parietal cortex (PPC) has been identified as an integrator of multiple sensory streams and guides decision-making. Hira et al observe that dissection of the functional specialization of PPC subregions requires simultaneous measurement of neuronal activity throughout these areas. To this end, they use wide-field calcium imaging to capture the activity of thousands of neurons across the PPC and surrounding areas. They begin by delineating the boundaries between the primary sensory and higher visual areas using intrinsic imaging and validate their mapping using calcium imaging. They then conduct imaging during a visually guided task to identify neurons that respond selectively to visual stimuli or choices. They find that vision and choice neurons intermingle primarily in the anterior medial (AM) area, and that AM uniquely encodes information regarding both the visual stimulus and the previous choice, positioning AM as the main site of integration of behavioral and visual information for this task.

      Strengths:

      There is an enormous amount of data and results reveal very interesting relationships between stimulus and choice coding across areas and how network dynamics relate to task coding.

      Weaknesses:

      The enormity of the data and the complexity of the analysis make the manuscript hard to follow. Sometimes it reads like a laundry list of results as opposed to a cohesive story.

      Reviewer #3 (Public review):

      Summary:

      This work from Hira et al leverages mesoscopic 2-photon imaging to study large neural populations in different higher visual areas, in particular areas A and AM of the parietal cortex. The focus of the study is to obtain a better understanding of the representation of different task-related parameters, such as choice formation and short-term history, as well as visual responses in large neural populations across different cortical regions to obtain a better understanding of the functional specialization of neural populations in each region as well as the interaction of neural populations across regions. The authors image a large number of neurons in animals that either perform visual discrimination or a history-dependent task to test how task demands affect neural responses and population dynamics. Furthermore, by including a behavioral perturbation of animal posture they aim to dissociate the neural representation of history signals from body posture. Lastly, they relate their functional findings to anatomical data from the Allen connectivity atlas and show a strong relation between functional correlations and anatomical connectivity patterns.

      Strengths:

      Overall, the study is very well done and tackles a problem that should be of high interest to the field by aiming to obtain a better understanding of the function and spatial structure of different regions in the parietal cortex. The experimental approach and analyses are sound and of high quality, and the main conclusions are well supported by the results. Aside from the detailed analyses, a particular strength is the additional experimental perturbation of posture to isolate history-related activity, which supports the conclusion that both posture and history signals are represented in different neurons within the same region.

      Weaknesses:

      The main point that I found hard to understand was the fairly strong language about functional clusters of neurons, while the authors also state that neurons encode combinations of different types of information and leverage the encoding model to dissociate these contributions. Do the authors find mixed selectivity or rather functional segregation of neural tuning in their data? More details on this and some other points are below.

      We thank the three reviewers for their accurate and expert evaluations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It wasn't clear to me why the authors focused on areas A and AM, but not RL. After all, at the beginning of the results, the authors ask: "PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among RL, A, and AM?".

      Thank you for the comment. The manuscript first characterizes AM as a region involved in visually guided decision-making and A as a region related to history and/or working memory. Subsequently, when discussing correlation structure, we stated the following:

      “In particular, based on the critical functional differences between A and AM that we found, A and AM may belong to distinct cortical networks that consist of different sets of densely interacting cortical areas.”

      Thus, the logical flow of our analysis is to first reveal the functional contrast between A and AM through comparative functional analyses across RL, A, and AM, and then to focus on this contrast. We speculate that RL may exhibit more distinctive functional properties in tasks that rely on whisker-based processing or related modalities. We have therefore revised the text as described below to avoid the impression that the manuscript places disproportionate emphasis on RL.

      Line 137: “PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among A, AM, and RL?”

      (2) Figures 2 E, F, and Figure 3A, could the authors indicate the trial structure better on these plots?

      Thank you for the comment. We have added explanations of the bar meanings to the figure legends.

      Figure 2:

      “(E) Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. (F) Representative choice neurons (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      Figure 3:

      “(A) Representative history neurons. Numbers correspond to those in panels B and C. Light blue lines indicate rewards delivered from the left lick port, and purple lines indicate rewards delivered from the right lick port. Vertical white lines mark the onset of the sampling period.”

      (3) There are several typos that need correcting. Also, lowercase and uppercase letters have been mixed in the panel labels of the legends.

      Thank you for the comment. We have corrected the panel labels as described below.

      Figure 2 legend:

      “Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9)”

      Figure 3 legend:

      “..than the next choice. I. The decoding accuracy of the next choice …”

      Figure 3 legend:

      “Error bars, mean ± s.e.m. in I, 95% confidence interval in G, M, and O.”

      Supplementary Figure 6:

      “…neurons with rt ≥ 0.3 (blue) were shown. B. Trial-to-trial activity fluctuation … (rt ≥ 0.3, panel B) was color coded…”

      We thoroughly checked the manuscript for typographical errors and corrected the issues.

      (4) Many in the field still use the Paxinos nomenclature for PPC subfields, could the authors write something short about how these two nomenclatures correspond?

      We have described the relationship between our area definitions and those of Paxinos in the main text as follows.

      Line 702: “In addition to our definition, previous studies have also defined posterior parietal cortex (PPC) to include the higher visual areas A, AM, and RL (Glickfeld and Olsen, 2017; Wang et al., 2011). These areas partially overlap with the parietal association regions defined in the Paxinos atlas, including MPtA, LPtA, PtPD, and PtPR. For a detailed discussion of the correspondence and variability among these regional definitions, see Lyamzin and Benucci (2019).”

      (5) Analyzing choice history may be affected by the long fluorescence Ca transients and will depend on excellent event deconvolution. Could the authors show some more zoomed-in examples of how well their deconvolution works?

      We provide enlarged, trial-by-trial activity traces of the four example neurons shown in Figure 3A in Supplementary Figure 3G. In all neurons, multiple small calcium transients occur repeatedly throughout the delay period, which lasts longer than 10 s. If the sustained activity during the delay were simply due to a long decay time constant, one would expect a large calcium transient in the preceding trial that slowly decays over the delay period. However, such a pattern is not observed in the actual data. Also, since the decay time constant of GCaMP6s is on the order of 1 s, signals persisting for ~10 s cannot be explained by slow decay alone.
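
      As a back-of-the-envelope check of the decay argument, the sketch below computes how much of a single calcium transient would remain after a given delay. It assumes an idealized mono-exponential decay with a time constant of about 1 s (the approximate GCaMP6s value stated above); real indicator kinetics are not perfectly mono-exponential, and the function name is illustrative only.

```python
import math


def residual_fraction(elapsed_s: float, tau_s: float = 1.0) -> float:
    """Fraction of one calcium transient left after `elapsed_s` seconds.

    Assumes an idealized mono-exponential decay with time constant `tau_s`.
    """
    return math.exp(-elapsed_s / tau_s)


for t in (1.0, 3.0, 10.0):
    print(f"{t:>4.0f} s: {residual_fraction(t):.2e}")
```

      Under this assumption, a single transient falls to about 5% of its peak after 3 s and to roughly 5 × 10^-5 after 10 s, so a transient from the preceding trial cannot by itself account for activity sustained across a 10-s delay.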

      (6) The authors write: "the history neurons exhibited properties of working memory." However, note that this is not a working memory task since the mice don't need to keep evidence in memory, the direction to lick can be made at the very beginning of a trial.

      Behaviorally, demonstrating that an animal maintains working memory requires showing that its behavior changes based on retained information when new information is introduced, as in delayed match-to-sample tasks. In the present task, however, the correct action for the next trial is determined at the moment the action in the previous trial is completed, such that animals can simply switch to motor preparation at that point. Thus, from a strictly behavioral perspective, working memory is not required.

      On the other hand, during the inter-trial interval (ITI), information from the previous trial dominates over information from the upcoming trial (Fig. 3H), which is more consistent with retention of past information than with motor preparation. Moreover, trials in which neural activity maintained information about the previous trial’s action were associated with a higher probability of correct performance in the subsequent trial. In other words, retaining past information contributes to guiding correct behavior in the next trial.

      Based on these neural analyses, we interpret that mice retain information about their previous trial’s action history in working memory and use it to determine behavior in the subsequent trial. Accordingly, we consider ITI activity in PPC to reflect working memory rather than motor preparation. Nevertheless, we acknowledge that your concern is valid, and we have therefore revised the text as follows:

      Line 234: “These results suggest that the history neurons exhibited properties of working memory.”

      (7) In the section about the Choice History Task, the authors write: "Since the visual stimuli were randomly presented during the sampling period, the mice had to ignore the visual stimuli." Why continue to present the visual stimuli?

      Thank you for the suggestion. By designing the vision task and the history task to have identical structures, we can apply the same encoding and decoding models to both tasks, which facilitates direct comparison between them. This design makes it easier to examine how neuronal activity patterns change depending on task demands.

      Reviewer #2 (Recommendations for the authors):

      (1) I don't understand the logic of Figure S7 and the neuropil analysis in general. Neuropil activity is purported to represent input, so it seems unsurprising that nearby neurons would exhibit similar dynamics.

      Thank you for your comment. Your argument is correct, and it is not at all surprising that neuropil signals correlate with the activity of surrounding neurons. Here, we quantitatively examined the relationship between neuropil activity and the average activity of nearby neurons. In addition, in a separate analysis, we clarified the relationship between connectome information and neuropil activity. Taken together, these analyses reveal the relationship between connectome information and the local average of neuronal activity. We describe this point as follows:

      “Indeed, the trial-to-trial variation of neuropil activity could be approximated by the average of 1,000–10,000 neurons within several hundred micrometers from the center (Figure S7).”
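
      A minimal sketch of this kind of comparison: correlate a neuropil trace with the mean activity of all neurons within a given radius, for several radii. The synthetic data (one shared fluctuation plus independent noise per neuron), the function names, and the radii are assumptions for illustration only; they are not the recorded data or the authors' pipeline.

```python
import numpy as np


def local_mean_activity(activity, positions_um, center_um, radius_um):
    """Trial-by-trial mean activity of all neurons within `radius_um` of `center_um`.

    activity: (n_neurons, n_trials); positions_um: (n_neurons, 2) in micrometers.
    Returns the mean trace and the number of neurons averaged.
    """
    dist = np.linalg.norm(positions_um - np.asarray(center_um, dtype=float), axis=1)
    inside = dist <= radius_um
    if not inside.any():
        raise ValueError("no neurons within the requested radius")
    return activity[inside].mean(axis=0), int(inside.sum())


def neuropil_vs_local_mean(neuropil, activity, positions_um, center_um, radius_um):
    """Pearson r between a neuropil trace and the local neuronal mean."""
    local_mean, n = local_mean_activity(activity, positions_um, center_um, radius_um)
    return float(np.corrcoef(neuropil, local_mean)[0, 1]), n


# Synthetic data (illustrative only): one shared fluctuation plus private noise.
rng = np.random.default_rng(0)
n_neurons, n_trials = 5000, 200
positions = rng.uniform(0, 1000, size=(n_neurons, 2))
shared = rng.standard_normal(n_trials)
activity = 0.3 * shared + rng.standard_normal((n_neurons, n_trials))
neuropil = shared + 0.2 * rng.standard_normal(n_trials)

results = {}
for radius in (50, 200, 400):
    results[radius] = neuropil_vs_local_mean(neuropil, activity, positions, (500, 500), radius)
    print(f"radius {radius} um: {results[radius][1]} neurons, r = {results[radius][0]:.2f}")
```

      In this toy setting, averaging over more neurons suppresses private noise, so the correlation with the neuropil trace rises with radius; this is the intuition behind approximating neuropil by a local population average.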

      Although we analyzed this phenomenon in the cases of areas A and AM, this finding should not be considered specific to A and AM but instead has broader, general significance. Accordingly, we added a new Results subsection and revised the manuscript as follows.

      Line 448: “Constraints and limits of anatomical connectivity on neuronal population activity

      Although we have so far focused on the differences between A and AM, our data provide broader insights into the relationship between anatomical connectivity and neuronal population activity. First, based on Figure S7 and the considerations above, anatomical input correlations strongly constrain the correlations between local averages of activity across thousands of neurons. We then asked whether this anatomical constraint extends beyond mean activity, and how anatomical input correlations relate to relationships between neuronal population activities (population vectors).

      The correlation between CC<sub>t</sub> and r<sub>anatomy</sub> was moderate (r = 0.60, Figure 6L). This moderate correlation did not change when the coupling neurons were eliminated (r = 0.61). Interestingly, the largest canonical component was the most unpredictable from the anatomical data (Figure 6M). Thus, while inter-area correlations based on the mean activity of neuronal populations are largely determined by anatomical input correlations, correlations between population vectors contain additional structure that cannot be captured by anatomical input correlations alone.

      One possible source of this additional structure is globally shared activity, which may reflect behavior, brain state, or levels of neuromodulators. To evaluate the contribution of global activity to the canonical correlation between areas, we first compared the canonical coefficient vectors (CCV). We found that the first CCV had a similar orientation regardless of the paired areas (Figure 6N). This indicates that the largest components of correlated activity in the CCA analysis are globally shared fluctuations. We also directly evaluated the correlated activity components across all 8 areas with generalized canonical correlation analysis. The first CCV also had a similar orientation to the first generalized canonical coefficient vector (GCCV) (Figure 6O). These results indicate that the largest canonical component reflects a global correlation across all cortical areas imaged. Such global correlations may be driven by factors beyond cortico-cortical or thalamo-cortical inputs, such as the animal’s behavioral state as we recently characterized (H. Imamura et al., 2025; F. Imamura et al., 2025). We also confirmed the robustness of these results by repeating the analyses using only the 40% most active neurons after denoising with non-negative deconvolution (36828 out of 91397 neurons; Figure S9).”
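
      To make the CCV comparison concrete, here is a sketch on synthetic data, assuming three areas whose only shared signal is a single global latent. CCA is implemented directly with NumPy (covariance whitening followed by SVD), and the function names are hypothetical. Because the sign of a canonical vector is arbitrary, orientation is compared with the absolute cosine.

```python
import numpy as np


def _inv_sqrt(cov, eps=1e-8):
    """Inverse matrix square root of a symmetric positive-definite covariance."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T


def first_ccv(x, y):
    """First canonical coefficient vector of x (trials x dims) when paired with y.

    Returns the unit-norm coefficient vector for x and the first canonical correlation.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0] - 1
    cxx, cyy, cxy = x.T @ x / n, y.T @ y / n, x.T @ y / n
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    u, s, _ = np.linalg.svd(wx @ cxy @ wy)  # singular values are canonical correlations
    a = wx @ u[:, 0]                        # map back from whitened to original coordinates
    return a / np.linalg.norm(a), float(s[0])


def orientation_similarity(a, b):
    """|cosine| between two vectors; the sign of a canonical vector is arbitrary."""
    return abs(float(a @ b)) / (np.linalg.norm(a) * np.linalg.norm(b))


# Synthetic data (illustrative only): three "areas" driven by one global latent.
rng = np.random.default_rng(1)
n_trials, dims = 2000, 10
g = rng.standard_normal(n_trials)
areas = {name: 2.0 * np.outer(g, rng.standard_normal(dims)) + rng.standard_normal((n_trials, dims))
         for name in ("X", "Y", "Z")}

ccv_xy, rho_xy = first_ccv(areas["X"], areas["Y"])
ccv_xz, rho_xz = first_ccv(areas["X"], areas["Z"])
similarity = orientation_similarity(ccv_xy, ccv_xz)
print(f"rho_xy={rho_xy:.2f}, rho_xz={rho_xz:.2f}, |cos|={similarity:.3f}")
```

      When a global fluctuation dominates, the first CCV of area X points in nearly the same direction whichever partner area it is paired with, which is the pattern described for Figure 6N. The generalized CCA comparison (Figure 6O) is not reproduced in this sketch.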

      (2) Furthermore, the neuropil signal likely contains signals from out-of-focus neurons that are presumably functioning similarly to the in-focus cells. Wouldn't the interesting question be to what extent the local neuropil signal in, for example, area A resembled that of neuronal activity in S1t?

      Thank you very much for your comment. We agree with your point. Based on the evaluation in Figure S7, the neuropil signal likely contains the average activity of several thousand local neurons, including out-of-focus contributions. The neuropil signal in area A may also partially reflect neuronal activity from the neighboring S1t area. In particular, neurons that show little correlation with the local population average (i.e., the neuropil signal) within the same area are sometimes referred to as “soloists” (M. Okun et al., 2015). If such soloist neurons were found to exhibit strong correlations with the neuropil signal of an adjacent area, this would be a highly interesting result. However, such an analysis would go beyond the scope of the present manuscript and would require a new line of discussion; therefore, we plan to address this issue in future work.

      (3) I generally found the final Results section (Relationship between mesoscale functional correlation and anatomical connections) to be hard to follow. The motivation for this analysis should be better explained.

      We fully incorporated your suggestion and rewrote the final section of the Results accordingly. Please refer to our responses to the two comments above.

      (4) The question of brain state/neuromodulation as a driver of the globally shared activity may be addressable by considering its correlation with pupillometry data.

      We fully agree with your suggestion. In our experiments, visual stimuli change continuously, and thus pupil diameter changes are most likely driven primarily by changes in visual input. Although state-dependent fluctuations of brain activity may also be present, they are likely masked by the larger effects induced by visual stimulation. Therefore, analyzing pupil-linked signals as a factor of globally shared activity would be more appropriately addressed in experiments without visual stimulation. We plan to investigate this issue in future studies. Here, we have added the following description regarding pupil dynamics and their associated relationships.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      Minor issues

      (1) The authors deploy sophisticated mathematical techniques with essentially no explanation outside the Methods section. A brief introduction of jPCA and CCA in the main text would help the reader understand the value of these analyses.

      Thank you for the comment. We added the following explanation.

      Line 238: “In this task, left and right choices alternate, so the activity of history neurons forms a sequence that repeats every two trials. We used jPCA<sup>49</sup> to visualize and quantify this activity pattern (Figure 3K). jPCA identifies low-dimensional projections of population activity that maximize rotational dynamics across time.”
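
      A simplified sketch of the core jPCA step, assuming the population activity has already been reduced to a few dimensions: fit a skew-symmetric dynamics matrix M by least squares so that the temporal derivative of the state x is approximated by x M, then take the plane spanned by the eigenvector of M with the largest imaginary eigenvalue. The preceding PCA step and the projection conventions of the published method are omitted; function names and the synthetic rotating trajectory are illustrative assumptions.

```python
import numpy as np


def fit_skew_symmetric(x, dx):
    """Least-squares skew-symmetric M such that dx ≈ x @ M (rows are time points)."""
    k = x.shape[1]
    basis = []
    for i in range(k):
        for j in range(i + 1, k):
            h = np.zeros((k, k))
            h[i, j], h[j, i] = 1.0, -1.0
            basis.append(h)
    # Each design column is the prediction produced by one skew-symmetric basis element.
    design = np.stack([(x @ h).ravel() for h in basis], axis=1)
    coef, *_ = np.linalg.lstsq(design, dx.ravel(), rcond=None)
    return sum(c * h for c, h in zip(coef, basis))


def jpca_plane(x):
    """Return M and an orthonormal basis (k x 2) for the plane of fastest rotation."""
    x = x - x.mean(axis=0)
    dx = np.diff(x, axis=0)           # discrete-time derivative
    m = fit_skew_symmetric(x[:-1], dx)
    vals, vecs = np.linalg.eig(m)     # a real skew-symmetric M has purely imaginary eigenvalues
    v = vecs[:, np.argmax(np.abs(vals.imag))]
    plane, _ = np.linalg.qr(np.stack([v.real, v.imag], axis=1))
    return m, plane


# Synthetic example (illustrative only): rotation in dims 0-1, small noise in dims 2-3.
rng = np.random.default_rng(2)
t = np.arange(1000)
omega = 2 * np.pi / 25                # one full cycle per 25 samples (e.g., one two-trial period)
x = np.column_stack([np.cos(omega * t), np.sin(omega * t),
                     0.05 * rng.standard_normal(1000), 0.05 * rng.standard_normal(1000)])
m, plane = jpca_plane(x)
print(np.round(m, 3))
```

      Constraining M to be skew-symmetric is what makes the fitted dynamics purely rotational, so a two-trial cycle of history-neuron activity would appear as a rotation in the recovered plane.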

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the population activity in each area to 10 dimensions using PCA (principal component analysis) (Figure S6B,C). Then, “fluctuation activity” was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and took the average of the 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all pairs (Figure 5H), whereas when CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second largest CC<sub>t</sub>, respectively (Figure 5I). The dominance of A and AM in CC<sub>t</sub> disappeared when the neurons with r<sub>t_single</sub> > 0.3 were removed. Notably, the CC<sub>t</sub> between AM and the other areas was uniform regardless of the partner area across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A is coupled mainly with S1t, and this population-level correlation structure critically depends on the subset of neurons with high r<sub>t_single</sub>.”
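
      The r<sub>t_population</sub> pipeline described above (PCA to 10 dimensions, subtraction of trial-type means, CCA, and averaging of the 10 canonical correlations) can be sketched as follows. This is an illustrative NumPy reimplementation on synthetic data, not the authors' code. Canonical correlations are computed as the singular values of Q_x^T Q_y from QR decompositions, which assumes the reduced data have full column rank.

```python
import numpy as np


def pca_reduce(x, n_dims=10):
    """Project trials x neurons onto the top `n_dims` principal components."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_dims].T


def trial_type_residuals(x, trial_types):
    """Subtract the mean of each trial type from every trial of that type."""
    res = np.array(x, dtype=float)
    for t in np.unique(trial_types):
        idx = trial_types == t
        res[idx] -= res[idx].mean(axis=0)
    return res


def canonical_correlations(x, y):
    """All canonical correlations via QR + SVD (assumes full column rank)."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    return np.clip(np.linalg.svd(qx.T @ qy, compute_uv=False), 0.0, 1.0)


def cc_t(area_a, area_b, trial_types, n_dims=10):
    """Mean of the n_dims canonical correlations between trial-type residuals."""
    ra = trial_type_residuals(pca_reduce(area_a, n_dims), trial_types)
    rb = trial_type_residuals(pca_reduce(area_b, n_dims), trial_types)
    return float(canonical_correlations(ra, rb).mean())


# Synthetic data (illustrative only): areas A and B share fluctuations; C does not.
rng = np.random.default_rng(3)
n_trials = 400
types = rng.integers(0, 4, n_trials)         # 4 stimulus x choice combinations
latent = rng.standard_normal((n_trials, 3))  # fluctuations shared by A and B only


def make_area(n_neurons, shared):
    tuning = rng.standard_normal((4, n_neurons))[types]   # trial-type tuning
    noise = rng.standard_normal((n_trials, n_neurons))
    return tuning + (latent @ rng.standard_normal((3, n_neurons)) if shared else 0) + noise


area_a, area_b, area_c = make_area(60, True), make_area(80, True), make_area(70, False)
cc_ab = cc_t(area_a, area_b, types)
cc_ac = cc_t(area_a, area_c, types)
print(f"CC_t(A,B) = {cc_ab:.2f}, CC_t(A,C) = {cc_ac:.2f}")
```

      All three synthetic areas carry trial-type tuning, but because the trial-type means are removed before CCA, only the pair that shares trial-to-trial fluctuations yields a large CC_t.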

      (2) The manuscript contains numerous typos ("hoice"), spelling errors ("parameters", "costom"), abbreviations that are not defined (ex: RL/rostrolateral), and minor grammatical issues that should be addressed by a round of copy editing.

      We thank the reviewer for pointing this out. We have thoroughly corrected these typographical and grammatical errors, and have described the revisions in detail in our response to Reviewer 1, comment (3). In addition, we have clarified the abbreviations in the manuscript as follows.

      Line 94: “rostrolateral area (RL)”

      Figure 1 legend: “Abbreviations: RL, rostrolateral HVA; PM, posteromedial HVA; RSC, retrosplenial cortex.“

      (3) Figure 3K unlabeled axes.

      Thank you for the comment. We have added the axis labels.

      (4) Figure 3K caption, first "(right)" should be "(left)".

      Thank you very much for your careful attention to detail. We have made the requested correction.

      (5) Figure 6 is hard to read. Panel A is too small, and the interpretation of G is difficult.

      - For panel A, we added an enlarged view with images from a larger number of trials in Figure S7A.

      - G represents the connectivity matrix. The sources correspond to the injection sites, and the targets correspond to voxels in the cerebral cortex. Because the latter may not be immediately clear, we explicitly indicated in the figure that the targets are cortical voxels.

      (6) Figure S4C has a double compass.

      Thank you for the comment. We have revised the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      While I have some questions and additional suggestions to further improve the clarity of the manuscript, I already found it to be highly interesting and well done in its current form.

      Major points:

      (1) The t-SNE comes up rather abruptly and is not well-explained in the main text or the figure caption. It would be good to provide some more information on the rationale of this analysis and how to interpret it. In particular, I don't see clear clusters in Figure 2H although the description of the authors seems to indicate that they observe clear functional classes such as choice, stimulus, and history neurons. Similarly, in Figure 3B, I don't see a clear separation between history and choice neurons in the t-SNE map. The example cells in Figure 3A appear to be delayed or long-tailed choice neurons rather than a dedicated group of 'history neurons'. It would be helpful for the interpretation of the t-SNE plots to show different PSTHs for different regions of the t-SNE map to better illustrate what different regions within the t-SNE projection represent and what distinguishes these cells.

      Thank you for the comment. The absence of clearly defined clusters in the t-SNE map suggests that neuronal activity forms a continuum rather than discrete classes. Importantly, the purpose of the t-SNE map here is not to identify sharp clusters, but to demonstrate that the functional categorization provided by our encoding model broadly and comprehensively spans the major structures present in the unsupervised t-SNE map. We have revised the relevant text in the manuscript accordingly as follows.

      Line 158: “To examine whether the neuron groups labeled by this model broadly capture the diversity of neuronal activity, we performed unsupervised clustering of neuronal activity using t-SNE. The functional labels revealed by this encoding model were consistent with the t-SNE clusters, indicating the validity of the encoding model (Figure 2H; Figure S4B; materials and methods).”

      The issue regarding History neurons was also raised in Reviewer #1’s comment (5). We provide an enlarged view of Figure 3A in Figure S3A. Each History neuron exhibits multiple calcium transients repeatedly and asynchronously following the previous reward acquisition. Therefore, rather than being “choice neurons with a long tail,” these neurons are better interpreted as neurons whose activity is sustained during this delay period.

      (2) Although the authors mention that neurons represent a mixture of features, they then use the encoding model to isolate clusters, such as vision or choice neurons. In general, the language throughout the manuscript suggests that there are various clusters of functionally segregated neurons (vision, choice, history, or coupling neurons). However, it is not clear to me to what extent this is supported by the data. Couldn't a choice neuron also be a vision neuron if both variables make significant contributions to the model? Similarly, are 'history' and 'choice' separate labels from the encoding model, or could a cell be given multiple labels? If a cell could be given multiple labels how did the authors create the colored plots on the right-hand side of Figures 2H and 3B? The example history cells in Figure 3J also appear to be highly selective for the contralateral choice, so again this seems to argue against a clear separation of choice and history neurons.

      Each label is assigned based on whether the corresponding coefficient is significant in the encoding model, and therefore neurons that are both vision- and choice-selective do exist. The presence of mixed selectivity neurons in PPC is well established (e.g., M. J. Goard et al., 2016, eLife). In this manuscript, however, we focus not on functional overlap at the single neuron level, but on the spatial distribution of functional classes, and thus do not explicitly address mixed selectivity. Although the colors in Figure 2H and Figure 3B overlap, the underlying data for each are presented separately in Figure S4B and S4D, respectively. As shown there, each color generally occupies distinct regions in the t-SNE map.
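
      The labelling rule described above, in which a neuron receives every label whose coefficient is significant, can be illustrated with a short sketch. The significance threshold (α = 0.05), the absence of multiple-comparison correction, and the feature names are assumptions, and the p-values are made up for illustration.

```python
import numpy as np


def assign_labels(p_values, feature_names, alpha=0.05):
    """Give each neuron every label whose encoding coefficient is significant.

    p_values: (n_neurons, n_features) array of coefficient p-values.
    A neuron can receive zero, one, or several labels (mixed selectivity).
    """
    significant = np.asarray(p_values) < alpha
    return [[name for name, sig in zip(feature_names, row) if sig] for row in significant]


features = ["vision", "choice", "history"]
p = np.array([[0.001, 0.40, 0.70],    # vision only
              [0.002, 0.003, 0.90],   # vision and choice (mixed)
              [0.50, 0.60, 0.80]])    # non-selective
labels = assign_labels(p, features)
print(labels)  # [['vision'], ['vision', 'choice'], []]
```

      Under this rule the second neuron counts as both a vision neuron and a choice neuron, so per-label maps can overlap even when each label is plotted separately.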

      (3) The decoding analysis in Figure 3F also suggests that a potential reason why there are more choice history signals in areas S1 and A is that neural activity is simply larger rather than due to the activity of a dedicated group of history neurons. Are the authors interpreting this differently? Could the duration of stored choice information also be affected by the dynamics of the calcium indicator?

      Thank you for the comment. Simply having larger neural activity in S1t or A would not result in calcium transients with a ~1-s time constant persisting throughout a delay period lasting up to 10 seconds. As also noted in comment (1), History neurons exhibit sustained and repeated calcium transients, and therefore their activity cannot be explained merely by elevated neural activity levels. One could argue that all cortical areas carry history-related information but that the signal-to-noise ratio is higher in S1t or A, which might make such signals more detectable there. If this were the case, however, differences across areas in all forms of selectivity should similarly depend on signal-to-noise ratio. This is not what we observe in our data.

      (4) I'm confused as to why the decoding accuracy is so high for areas A and S1t at time -3 relative to the choice in Figure 3F. Shouldn't this be the same as predicting the next choice in Figure 3H? Why is the decoding accuracy lower in this case?

      Thank you for the comment. The analysis shown in Figure 3F includes only trials in which the choice was correct, which explains why the decoding accuracy in Figure 3H is lower. We have added this clarification to the main text.

      Figure 3F: “Decoding accuracy of choice, outcome, and visual stimuli by the activity of 20 neurons from each area using only correct trials, before and after the choice onset, reward delivery, and the end of the visual stimuli, respectively. Line colors correspond to the areas shown in panel G.”

      (5) In general, the text is not very detailed about the statistics. While test scores and p-values are mentioned, it would be good to also state what is actually compared and what the n is (e.g. how many neurons, neuron pairs, areas, sessions, or animals) for each case. How do the authors account for the nested experiment design where many neurons are coming from a low number of animals?

      Thank you for the comment. In our decoding analyses, we generally treat animals as the independent samples (n). In contrast, for the encoding model analyses, we treat neurons as the independent samples. As you correctly pointed out, because we recorded activity from a large number of neurons, statistical tests that treat individual neurons as independent samples can readily yield significant p-values even with a small number of animals. We have therefore confirmed that our conclusions are not driven by a large effect from a single animal. When making qualitative claims, we rely not only on statistical significance (p-values) but also require clear differences in effect size. We have added the following clarification to the Statistics section accordingly.

      Line 1049: “For the decoding analyses, animals were treated as the independent samples (n), whereas for the encoding model analyses, neurons were treated as the independent samples. To ensure that the results were not driven by a single animal, we repeated the statistical tests while systematically excluding data from one animal at a time and confirmed that statistical significance was preserved in all cases. Furthermore, qualitative interpretations were made only when differences in effect size were clearly observed.”
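
      A minimal sketch of the leave-one-animal-out check, assuming a two-sided permutation test on neuron-level values as a stand-in (the specific test used for each comparison is not stated in this response). The data and function names are illustrative.

```python
import numpy as np


def permutation_pvalue(a, b, n_perm=500, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def leave_one_animal_out(values_a, animals_a, values_b, animals_b, alpha=0.05):
    """Re-run the test with each animal excluded; report every p and whether all pass."""
    pvals = {}
    for animal in np.union1d(animals_a, animals_b):
        keep_a, keep_b = animals_a != animal, animals_b != animal
        pvals[animal] = permutation_pvalue(values_a[keep_a], values_b[keep_b])
    return pvals, all(p < alpha for p in pvals.values())


# Synthetic neuron-level data (illustrative only): 4 animals, 100 neurons per area each.
rng = np.random.default_rng(4)
animals = np.repeat(np.arange(4), 100)
area_1 = rng.normal(1.0, 1.0, animals.size)
area_2 = rng.normal(0.0, 1.0, animals.size)
pvals, robust = leave_one_animal_out(area_1, animals, area_2, animals)
print(pvals, robust)
```

      This check guards against a single dominant animal but does not by itself model the nested design; a hierarchical (mixed-effects) model with animal as a grouping factor would be one way to do that.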

      (6) How was the grouping in Figure 2O done? Specifically, how were the thresholds for the dashed lines selected to separate PM and V1 from AM and RL as association areas? It seems to me like this grouping was done rather arbitrarily as the difference in choice decoding accuracy is not particularly large between these areas.

      This line does not have a specific quantitative basis, but we consider it useful as an illustrative aid. We have added this clarification to the figure legend.

      Figure 2O: “Decoding accuracies of time in video presentation and choice direction indicate that AM would be the best position for associating these two signals. The background color and dashed lines are provided as visual aids for illustrative purposes.”

      (7) The fact that neurons with high rt_single tend to share the same function might also indicate that the approach is insufficient to remove all effects of tuning to trial types from the neural data. Since the authors subtract the average of each trial type, the average trial-type-related information is removed, but type-specific variations that are not equally represented in the average might remain. For choice neurons, for example, attentive vs inattentive choices could be represented differently and thus remain in the data, since the average would be a mixture of both. The same goes for other factors that would drive a particular modulation in the choice- or stimulus-related part of the trial, which could still tie these neurons together. One way to circumvent this concern could be to first compute the mean activity for all time points in each trial and then compute the trial-to-trial variability across all trials of the same type. Alternatively, I would be curious how the results play out when using data when the animal is not actively performing the task to compute rt_single.

      Thank you for the comment. The concern raised by the reviewer applies to all noise-correlation analyses and highlights an important limitation of this approach, namely that factors other than the observed variables are treated as noise. By subtracting the trial-averaged activity, information related to sensory input and the direction of the first lick at choice can be removed. However, other factors cannot be eliminated if they are not observed. For example, if right hindlimb movements tend to occur only in trials with visual stimulation combined with left choice, such effects cannot be removed because they are not measured. The same issue remains even when restricting the analysis to a single trial type. Based on these considerations, we have added the following text to the manuscript.

      Line 932: “The correlation of trial-to-trial fluctuations in activity between a pair of single neurons was defined as r<sub>t_single</sub>. To calculate r<sub>t_single</sub>, we averaged the activity of individual neurons over the sampling period, and the average across each trial type was subtracted from this value. The trial types consisted of four pairs of stimuli and responses, that is, video stimulation and left choice, video stimulation and right choice, black screen and left choice, and black screen and right choice. By this operation, we extracted the fluctuating components of single-neuron activity that are independent of the trial types. The finding that neurons with high r<sub>t_single</sub> tend to share the functional properties we propose is not a trivial consequence of the analysis. At the same time, it remains possible that high r<sub>t_single</sub> reflects the degree to which neurons share unobserved features, and that such features are correlated with our functional classification. Thus, while this analysis suggests that correlated fluctuations across cortical areas may contribute to the determination of functional types, establishing an exclusive conclusion will require more fine-grained behavioral measurements, tighter control of internal states, and causal identification through targeted interventions.”
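
      The r<sub>t_single</sub> computation described above can be sketched in a few lines. The synthetic example (hypothetical names and values) contrasts the raw correlation, which is inflated by shared trial-type tuning, with the correlation after subtracting each trial type's mean.

```python
import numpy as np


def r_t_single(activity, trial_types):
    """Pairwise correlation of trial-to-trial fluctuations.

    activity: (n_neurons, n_trials) activity averaged over the sampling period.
    trial_types: (n_trials,) integer code for each stimulus x choice combination.
    """
    fluct = np.array(activity, dtype=float)
    for t in np.unique(trial_types):
        idx = trial_types == t
        fluct[:, idx] -= fluct[:, idx].mean(axis=1, keepdims=True)
    return np.corrcoef(fluct)


# Synthetic example (illustrative only). Codes: 2 * video(0/1) + right_choice(0/1).
rng = np.random.default_rng(5)
n_trials = 400
types = 2 * rng.integers(0, 2, n_trials) + rng.integers(0, 2, n_trials)
tuning = np.array([0.0, 1.0, 2.0, 3.0])[types]   # identical tuning for all three neurons
shared = rng.standard_normal(n_trials)            # fluctuation shared by neurons 0 and 1
activity = np.vstack([
    tuning + shared + 0.5 * rng.standard_normal(n_trials),
    tuning + shared + 0.5 * rng.standard_normal(n_trials),
    tuning + 0.5 * rng.standard_normal(n_trials),  # same tuning, independent fluctuation
])
raw = np.corrcoef(activity)
rt = r_t_single(activity, types)
print(np.round(raw, 2))
print(np.round(rt, 2))
```

      Neurons 0 and 2 have identical tuning but independent fluctuations, so their raw correlation is high while their r_t_single is near zero. As discussed above, any unmeasured variable that covaries with trial type would still survive this subtraction.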

      Minor points:

      (1) Why did the authors use the activity of 50 neurons for the decoder analysis in Figure 2K? Didn't they have many more neurons available? How were these selected?

      We found that the conclusions were identical when using datasets consisting of either 50 neurons or 20 neurons across all analyses. Because the total number of recorded PM neurons did not reach 100 in at least one mouse, we standardized the analyses to 50 neurons in order to match the number of neurons across all cortical areas and animals.
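
      A sketch of count-matched decoding, assuming a nearest-centroid classifier with 5-fold cross-validation and repeated random draws of 50 neurons per area. The decoder actually used is not specified in this response, so the nearest-centroid choice is a stand-in; the data and names are synthetic and hypothetical.

```python
import numpy as np


def nearest_centroid_cv_accuracy(x, y, n_folds=5, rng=None):
    """k-fold cross-validated accuracy of a nearest-centroid classifier (trials x neurons)."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(order, test)
        classes = np.unique(y[train])
        centroids = np.stack([x[train][y[train] == c].mean(axis=0) for c in classes])
        dists = ((x[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += int((classes[dists.argmin(axis=1)] == y[test]).sum())
    return correct / len(y)


def subsampled_accuracy(x, y, n_neurons=50, n_repeats=20, seed=0):
    """Mean decoding accuracy over random draws of `n_neurons` neurons (without replacement)."""
    if n_neurons > x.shape[1]:
        raise ValueError("area has fewer neurons than the requested subsample")
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        cols = rng.choice(x.shape[1], size=n_neurons, replace=False)
        accs.append(nearest_centroid_cv_accuracy(x[:, cols], y, rng=rng))
    return float(np.mean(accs))


# Synthetic data (illustrative only): 200 trials, 120 neurons, 40 choice-informative.
rng = np.random.default_rng(6)
choice = rng.integers(0, 2, 200)                  # left (0) / right (1)
x = rng.standard_normal((200, 120))
x[:, :40] += 0.8 * (2 * choice[:, None] - 1)
acc_50 = subsampled_accuracy(x, choice, n_neurons=50)
acc_20 = subsampled_accuracy(x, choice, n_neurons=20)
print(f"50 neurons: {acc_50:.2f}, 20 neurons: {acc_20:.2f}")
```

      Fixing the subsample size makes accuracies comparable across areas and animals; an area with fewer recorded neurons than the subsample size cannot be included, which is why the smallest recorded population sets the ceiling on the subsample size.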

      (2) The authors mention that some PPC neurons showed complex dynamics rather than encoding a specific feature such as visual or choice information but do not mention actual numbers on this point. It would be good to quantify to what extent neurons in different regions represent such mixed selectivity and whether there are clear differences in selectivity. This would also be interesting to discuss in context to earlier work on mixed selectivity in the parietal cortex, such as Raposo et al 2015.

      Thank you for the comment. Your point is entirely valid. However, as explained in our response to your major comment, our analyses focus not on how individual neurons are classified, but rather on the spatial distribution of these functional categories.

      (3) I have a hard time understanding what the length of the bars in the right panel of Figure 2k indicates. Does this plot show more than the decoder accuracy before and after the choice? Is the bar length related to the standard deviation? The same question for the visualization in panel 2n. It looks nice but I'm confused about what it shows exactly.

      These bars represent confidence intervals. Although this is stated at the end of the Figure 2 legend, we agree that it may not be sufficiently clear, and we have therefore added this information to the Statistics section.

      Line 1046: “In Figure 2K and N, and Figure 3G, L, M, and O, the bars indicate the 95% confidence intervals. All other bars denote s.e.m., unless otherwise noted.”

      (4) Is Figure 3D showing the same association index as in Figure 2j, thus showing the same result as in the vision task or is this meant to show something new? It was not clear to me from the wording, so it would be good to clarify.

      You are correct that the magenta trace in Fig. 3D is the same as in Fig. 2J. This panel was included to explicitly illustrate that, in areas A and AM, the separation between History and Association approximately overlaps. We have added the following clarification to the figure legend accordingly.

      Figure 3D: “The percentage of history neurons and the association index (as defined in Fig. 2J) were overlaid for comparison.”

      (5) When computing the Pseudo R2 for regressor contribution, how was the null model computed? From shuffling all regressors in the model? I think this is fine but it's not fully clear what the intended effect of this procedure is. For the description of Figure 4C it would be good to add a sentence explaining how to interpret the pseudo R^2.

      The null model predicts a fixed value that is independent of the explanatory variables, i.e., it predicts only the intercept. This provides a useful correction term when performing cross-validation, particularly in cases where baseline values differ across folds. In Figure 4C, the analysis shows the contribution of adding body part positions and pupil diameter to the model for predicting neural activity. We have added the following text to the Methods section.

      Line 881: “To estimate the contribution of parameters for the left forelimb, the right forelimb, the tail, and the pupil, we repeated the same analysis with a reduced model where each set of predictors was eliminated from the full model (Figure 4B). Then, the pseudo-R<sup>2</sup> was obtained for each set of predictors as (MSE<sub>reduced</sub> − MSE<sub>full</sub>)/MSE<sub>null</sub>, where MSE is the mean squared error, MSE<sub>reduced</sub> is the MSE of the reduced model, MSE<sub>full</sub> is the MSE of the full model, and MSE<sub>null</sub> is the MSE of the null model. The null model predicts a fixed value that is independent of the explanatory variables; specifically, it simply outputs the mean of the training data. For example, we constructed a regression model without the parameters regarding the left forelimb (green shade of Figure 4B), obtained MSE<sub>reduced</sub> for the left forelimb, and calculated the pseudo-R<sup>2</sup> as above by comparing it with the MSEs of the full model and the null model. This value reflects the extent to which the position of the left forelimb contributes to the prediction of neuronal activity.”
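      A minimal sketch of the pseudo-R<sup>2</sup> calculation follows. It assumes ordinary least squares as a stand-in for the encoding model and a single train/test split in place of full cross-validation. `fit_predict`, `pseudo_r2`, and the convention of identifying a predictor set by column indices are hypothetical.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (stand-in for the encoding model)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def pseudo_r2(X, y, drop_cols, train, test):
    """(MSE_reduced - MSE_full) / MSE_null for one set of predictor columns."""
    dropped = set(drop_cols)
    keep = [c for c in range(X.shape[1]) if c not in dropped]
    mse_full = mse(y[test], fit_predict(X[train], y[train], X[test]))
    mse_reduced = mse(y[test], fit_predict(X[train][:, keep], y[train], X[test][:, keep]))
    # Null model: a constant equal to the mean of the training data.
    mse_null = mse(y[test], np.full(len(test), y[train].mean()))
    return (mse_reduced - mse_full) / mse_null
```

Dividing by MSE<sub>null</sub> expresses the loss from removing a predictor set relative to the variance of the held-out activity, which keeps values comparable across folds whose baselines differ.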

      (6) It seems surprising that the pupil-size-related neurons were mapped around visual areas although the pupil should carry clear luminance information. Is this because the luminance-related information in the pupil can also be explained by the stimulus variable in the model?

      Pupil size changed markedly before and after visual stimulus presentation (Figure S5C), dilating during the black stimulus and constricting during the video stimulus. This likely reflects changes relative to the luminance of the gray screen presented in the absence of visual stimuli. In our encoding model, visual stimuli are included as independent regressors for each corresponding time window. Therefore, pupil fluctuations that are temporally locked to visual stimulation are explained by these visual regressors. Neuronal activity that is better explained by pupil size changes not accounted for by the visual regressors is classified as pupil-related. At least three mechanisms may underlie the influence of pupil size on neuronal activity. First, fluctuations in pupil diameter have been linked to behavioral state or noradrenergic level [REF], which can act as variables independent of visual stimulation. Second, pupil fluctuations may be amplified in a stimulus-dependent manner, reflecting nonlinear interactions between visual input and brain state. Third, changes in pupil diameter alter the amount of light reaching the retina, which can modulate activity in visual cortical areas. The latter two mechanisms are therefore expected to predominantly affect visual areas and may explain why pupil-related neurons are more frequently observed there. The first mechanism is likely related to global brain state, and its association with behavior may account for the presence of pupil-related neurons in S1. However, these interpretations require confirmation through more refined causal manipulations. Accordingly, we limited the addition to the manuscript to the following statement.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      (7) What is meant by 'external control parameters such as a video frame' when explaining the encoding model?

      Thank you for the comment. We added the following explanation.

      Line 151: “In the encoding model, the activity of each neuron was fitted by a weighted sum of external control parameters, such as video frames, and behavioral parameters, such as choice and reward direction. Because the visual stimulus changes continuously over time, sliding time windows were placed during the visual stimulus period.”
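      To illustrate the idea of an encoding model with sliding stimulus windows, here is a hypothetical sketch. Each time bin in a video trial receives an indicator for the stimulus window it falls in, a choice regressor is appended, and the weights are fitted by closed-form ridge regression. The window width, the +1/−1 choice coding, the ridge penalty, and the treatment of every bin as part of the stimulus period are assumptions for illustration only.

```python
import numpy as np


def build_design_matrix(n_trials, n_bins, stim_trials, choice, window=5):
    """Rows are (trial, time bin) pairs; columns are regressors.

    stim_trials: boolean (n_trials,), True if the video was shown (assumed).
    choice: (n_trials,), +1 for right and -1 for left (assumed coding).
    window: width in bins of each sliding stimulus window (hypothetical).
    """
    n_windows = int(np.ceil(n_bins / window))
    rows = []
    for tr in range(n_trials):
        for b in range(n_bins):
            stim = np.zeros(n_windows)
            if stim_trials[tr]:
                stim[b // window] = 1.0  # one indicator per stimulus time window
            rows.append(np.concatenate([stim, [choice[tr]]]))
    return np.array(rows)


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge weights; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b
```

Giving each window its own weight lets the model follow a continuously changing stimulus without assuming a fixed response shape.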

      (8) What does the trace in Figure 2G show? Is this a single-cell example? What are the axes here?

      We added an explanation to the figure legend.

      Figure 2G: “Schematic of our encoding model. The bottom right panel shows an example of single-neuron activity with an overlay of the fitting obtained by the encoding model.”

      (9) There seems to be a word missing in the sentence that describes the results for Figure 3O in the main text.

      Thank you for the comment. We added the following description related to Fig. 3O.

      Line 247: “resulting in the decoding accuracy of time after a specific choice being lower than in A (Figure 3O).”

      (10) The abbreviation RP is used when describing Figure S5A. It should be mentioned that this refers to the response period.

      Thank you for the comment. We added the following description related to Figure S5A.

      Line 283: “We found that the angle of the tail was significantly different from the baseline values several seconds after the response period (RP) (Figure S5A).”

      (11) I can't see the color difference between the traces in Figure 2E. There are probably red and green but this is hard to see for readers with red-green color blindness. Does the black indicate the time of visual stimulation? Is the line in Figure 2F the time when the spouts move in?

      Thank you for the comment. In Fig. 2E, we improved visibility by changing the line opacity. In addition, the vertical line in Fig. 2E indicates the onset of the visual stimulus, and the vertical line in Fig. 2F indicates the onset of the response period. We have added the following explanations to the figure legend.

      Figure 2: E. “Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. F. Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      (12) It might be useful to provide a short explanation in the results or methods of why the harmonic mean was used for the computation of the association index. I think it makes sense but since it is not commonly used this could be helpful for the reader to understand the approach.

      Thank you for the comment. We added the following explanation to the main text.

      Line 869: “The association index was determined by the harmonic mean of the rates of vision neurons and choice neurons. The harmonic mean approaches the arithmetic mean when the two values are similar, but becomes closer to the smaller value when the two values differ substantially. Therefore, the association index takes a large value when both vision neurons and choice neurons are abundant.”
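      A short sketch of the harmonic-mean behavior described above, assuming the inputs are the fractions of vision neurons and choice neurons in one area. Two areas with the same arithmetic mean can differ strongly in association index.

```python
def association_index(p_vision, p_choice):
    """Harmonic mean of the fractions of vision and choice neurons in an area."""
    if p_vision + p_choice == 0:
        return 0.0  # no vision or choice neurons at all
    return 2 * p_vision * p_choice / (p_vision + p_choice)
```

For example, fractions of 0.2 and 0.2 give about 0.2, whereas 0.35 and 0.05 (the same arithmetic mean) give about 0.0875, so the index is high only when both neuron types are abundant.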

      (13) I don't fully understand how coupling diversity is computed. If there are six preference vectors, what is meant by taking the average of angles between all pairs of the two vectors?

      Which two are meant here?

      Thank you for the comment. We revised the explanation as follows.

      Line 950: “To quantify the diversity of coupling patterns across clusters, we computed the angle between every pair of preference vectors. We then averaged these pairwise angles and defined this quantity as the ‘coupling diversity.’”
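      As a sketch of the coupling-diversity measure, assume each cluster's preference vector is one row of an array and that angles are reported in degrees (the unit is an assumption). The mean over all pairs can then be computed as follows; six vectors give 15 pairs.

```python
import itertools

import numpy as np


def coupling_diversity(preference_vectors):
    """Mean pairwise angle (degrees) between preference vectors (one row per cluster)."""
    V = np.asarray(preference_vectors, dtype=float)
    angles = []
    for i, j in itertools.combinations(range(len(V)), 2):
        cos = V[i] @ V[j] / (np.linalg.norm(V[i]) * np.linalg.norm(V[j]))
        # Clip to guard against rounding slightly outside [-1, 1].
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return float(np.mean(angles))
```

Identical coupling patterns give 0, and mutually orthogonal patterns give 90, so larger values indicate more diverse coupling across clusters.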

      (14) The results text states that the high correlation between r_anatomy and r_neuropil (Figure 6I) is evidence for the functional correlations being driven by cortico-cortical connectivity. However, Figure 6J shows that correlations for either cortico-cortical or thalamo-cortical connectivity are below 0.94 and generally higher for thalamo-cortical connectivity. This doesn't negate the general point of the authors but it would be good to clarify this section so it is easier to understand if r_anatomy includes both cortico-cortical and thalamo-cortical data and how the results in Figure I and J go together with the description in the results section.

      You are correct. We have revised the text to clarify that the analysis reflects the combined effects of both cortico-cortical and thalamo-cortical inputs.

      Line 436: “This correspondence suggests that the mesoscale interarea correlation is determined by common cortico-cortical and thalamo-cortical input at the mesoscale. Figure S8: A. Using the Allen connectivity atlas, the axonal densities of cortico-cortical and thalamo-cortical projections were analyzed.”

      (15) I'm not very familiar with canonical correlation analysis and found this part hard to follow. Some additional explainer sentences would be helpful here. For example, what does it mean to take the average of the top 10 canonical correlations as rt_population? What exactly are the canonical correlation vectors? It was also not clear to me what exactly the results in Figure 5J signify.

      Thank you for the comment. We have clarified the description in the main text related to CCA and the associated analyses as follows.

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the dimension of population activity in each area to 10 by using PCA (principal component analysis) (Figure S6B,C). Then, “fluctuation activity” was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and took the average of the 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all the pairs (Figure 5H), whereas when the CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second largest CC<sub>t</sub>, respectively (Figure 5I). The dominance of A and AM in CC<sub>t</sub> disappeared when the neurons with r<sub>t_single</sub> > 0.3 were removed. Notably, the CC<sub>t</sub> between AM and the other areas was uniform regardless of the paired area across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A is coupled mainly with S1t, and such a correlation structure at the population level critically depends on this subset of neurons.”
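      The population analysis can be sketched as follows, under stated assumptions. PCA via SVD keeps 10 components per area, trial-type means are subtracted from each component, and canonical correlations are obtained as the singular values of the product of orthonormal bases (from QR decompositions) of the two centered data sets, a standard way to compute CCA. The helper names are hypothetical, and the sketch assumes more trials than components.

```python
import numpy as np


def pca_reduce(activity, n_components=10):
    """Project (n_trials, n_neurons) activity onto its top principal components."""
    Xc = activity - activity.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def remove_trial_type_means(X, trial_types):
    """Keep only trial-to-trial fluctuations within each trial type."""
    out = np.array(X, dtype=float)
    for t in np.unique(trial_types):
        m = trial_types == t
        out[m] -= out[m].mean(axis=0)
    return out


def canonical_correlations(X, Y):
    """Canonical correlations between two (n_trials, d) data sets, largest first."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)


def r_t_population(act_a, act_b, trial_types, n_components=10):
    """Average of all canonical correlations between two areas' fluctuations."""
    fa = remove_trial_type_means(pca_reduce(act_a, n_components), trial_types)
    fb = remove_trial_type_means(pca_reduce(act_b, n_components), trial_types)
    return float(np.mean(canonical_correlations(fa, fb)))
```

Two areas driven by the same latent fluctuations give values near 1, whereas independent areas give values near the small chance level set by the number of trials and components.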

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype for genetic knock out is a major weakness.

      We agree with the reviewer that a S. aureus invasion phenotype in ASM K.O. cells would unequivocally demonstrate the importance of ASM for the process. In the revised manuscript, we report an invasion phenotype in ASM K.O. cells. The absence of an invasion phenotype in ASM K.O. cells in our original experiments was likely caused by SM accumulation in ASM-depleted cells originating from FBS (see Figure 2I, in the revised manuscript).

      We thus cultured cells for up to three days in 2% FBS and then reduced the concentration to 1% FBS one day prior to experimentation. Under these conditions reduced S. aureus invasion in ASM K.O.s was observed when compared to wildtype cells.

      This was not detected when we cultured the cells in medium containing the standard concentration of 10% FBS. Our new data support the results we acquired with three different ASM inhibitors.

      The invasion defect in ASM K.O.s cultured in low FBS was more pronounced at 10 min p.i. when compared to the 30 minute time point (Figure 2K), further corroborating that the ASM-dependent invasion pathway is relevant early in infection. This is consistent with the invasion dynamics we observed upon interference with lysosomal Ca<sup>2+</sup> signaling [TPC1 K.O. (Figure 1C), BAPTA-AM (Figure 3D)], lysosomal exocytosis [Syt7 K.O. (Figure 2F), Ionomycin (Figure 3D)] and ASM activity by inhibitor treatment (Figure 3D).

      Originally, we had hypothesized that changes in the sphingolipidome induced by the absence of ASM may have caused the lack of an S. aureus invasion phenotype. We thus compared the sphingolipidome of ASM K.O.s cultured in 1% and 10% FBS. Indeed, SM accumulation was less severe when we cultured the cells in 1% FBS (Figure 2M and Supp. Figure 3). Hence, we think that strong SM accumulation in ASM K.O. cells cultured in 10% FBS may facilitate ASM-independent invasion mechanisms, so that the absence of ASM-dependent invasion could not be detected by analyzing the number of invaded bacteria. This is supported by experiments in which we treated ASM K.O.s with the ASM inhibitor ARC39, which only slightly affected S. aureus invasion, whereas ARC39 treatment of WT cells strongly reduced the number of internalized bacteria (Figure 2J). We think that this experiment and the reduced invasion in ASM K.O.s rule out an ASM/SM-independent effect of the inhibitors.

      - While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      We agree with the reviewer that we do not show formation of ceramide-enriched platforms, and we have changed the manuscript accordingly (see below).

      - The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but we are limited by cell biology and by the simultaneous occurrence of processes: here, the uptake of bacteria by multiple pathways.

      However, we were able to address ASM-dependency of our rapid uptake mechanism by observing a genetic phenotype in SMPD1 knockout-cells.

      We do not make any assumptions here about the centrality of the pathway or its importance in vivo. As scientists, we were interested in the fact that such an ASM-dependent pathway exists. In other, as yet unidentified, cell lines such a pathway may be the main entry point for bacteria. Alternatively, it may represent an ASM-dependent mode of receptor uptake that we identified because the bacteria piggy-back into the cells.

      - I acknowledge the author's argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The author's argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We are convinced that our new genetic evidence of an S. aureus invasion phenotype in ASM K.O.s will eliminate the reviewer’s concerns about the role of ASM during the bacterial invasion.

      The new lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2, M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype.

      We agree with the reviewer, however, that the reason why changes in sphingolipidome increase ASM-independent S. aureus internalization by host cells remains elusive. One possible explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus (3, 4). Characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      Host cells possess mechanisms to prevent infections, while pathogens have developed strategies to circumvent these defense processes. In the present scenario, a physiological membrane composition of the host cell represents such a pathogen defense mechanism (as shown, e.g., for caveolin-1, which restricts invasion of S. aureus in healthy cells). If a defense mechanism is disabled (as we speculate is the case upon strong SM accumulation in ASM K.O.s cultured in 10% FBS), infection is facilitated. In healthy WT cells, these mechanisms (e.g. caveolin-1) are functional and, hence, we would not expect a “compensation” of ASM-dependent invasion. Here we analyze invasion events that cannot be prevented by host defense mechanisms, as they occur in untreated WT cells and are absent upon interfering with the ASM-dependent invasion pathway (by inhibitors and genetic K.O.). Thus, we think that the ASM-dependent pathway, which mediates 50-70% of bacteria internalized by healthy WT cells 10 min p.i., is central for the infection.

      - The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We measured phagosomal escape of S. aureus JE2 in ASM K.O. cells cultured in 1% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (Author response image 1).

      Escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium. We therefore think that prolonged absence of ASM has other side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      Author response image 1.

      As it is unclear how prolonged absence of ASM can affect cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus predominantly via the “rapid” ASM-dependent pathway) have lower phagosomal escape rates than bacteria that enter host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin in the plasma membrane during invasion influences phagosomal escape, whereas sphingomyelin levels in the phagosomal membrane do not change phagosomal escape (Figure 5A and B). This is summarized in Figure 5F.

      - Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects of ASM (or its inhibitor treatment)?

      Inducible knock-downs in our laboratory are based on the vector pLVTHM in cells co-expressing the repressor TetR fused to a KRAB domain. Optimal knock-down requires induction by doxycycline supplementation of the medium for 7 days. During these days of growth, the cells can adapt their lipid metabolism, which would reproduce the situation we encounter in the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (5). However, the course of infection in macrophages differs from that in non-professional phagocytes (6). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM may therefore influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      - The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms. We thus changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) increased invasion in TPC1 and Syt7 K.O. (Figure 2, P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (7).”

      - The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensor such as Gcamp3-ML1 might provide better ways to directly visualize this during inhibitor treatment, or S. aureus infection.

      We thank the reviewer for this suggestion. We included the following section in our discussion (line 593):

      “Since fluorescent calcium reporters allow this process to be monitored microscopically (8, 9), future experiments may visualize it in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References

      (1) J. Rappaport, C. Garnacho, S. Muro, Clathrin-mediated endocytosis is impaired in type A-B Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm 11, 2887-2895 (2014).

      (2) J. Rappaport, R. L. Manthe, C. Garnacho, S. Muro, Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm 12, 1366-1376 (2015).

      (3) C. Hoffmann et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci 123, 4280-4291 (2010).

      (4) L.-P. Tricou et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports 14, 28643 (2024).

      (5) C. Li et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (6) A. Moldovan, M. J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (7) M. Rühling, F. Schmelz, A. Kempf, K. Paprotka, M. J. Fraunholz, Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (8) D. Shen et al., Lipid storage disorders block lysosomal trafficking by inhibiting a TRP channel and lysosomal calcium release. Nat Commun 3, 731 (2012).

      (9) L. C. Davis, A. J. Morgan, A. Galione, NAADP-regulated two-pore channels drive phagocytosis through endo-lysosomal Ca(2+) nanodomains, calcineurin and dynamin. EMBO J 39, e104058 (2020).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix. 

      Strengths: 

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately. 

      Comments on revisions: 

      The revised manuscript is greatly improved. The comparison with hRFC and the addition of direct PCNA loading data from the Hedglin group are particular highlights. I think this is a strong addition to the literature.

      We thank the reviewer for their positive comments.  

      I only have minor comments on the revised manuscript. 

      (1) The clamp loading kinetic data in Figure 6 would be more easily interpreted if the three graphs all had the same x axes, and if addition of RFC was t=0 rather than t=60 sec.

      We now analyze and plot EFRET as a function of time after complex addition, effectively setting the loader addition to t = 0 for each trace (Figure 6 and Figs S10-14 in the new manuscript). Baseline (Ymin) and plateau (Ymax) EFRET values were obtained by averaging the stable signal regions immediately before and after clamp-loader addition, respectively. Traces are normalized to their own dynamic range before fitting.
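      A minimal sketch of this alignment and normalization follows. It assumes E<sub>FRET</sub> is sampled at times `t` and the loader is added at `t_add`. Y<sub>min</sub> is taken as the mean over a window just before addition and Y<sub>max</sub> as the mean over the final stable part of the trace; the window lengths and the function name are hypothetical.

```python
import numpy as np


def normalize_trace(t, efret, t_add, pre_window, plateau_window):
    """Shift time so that loader addition is t = 0 and scale E_FRET to its own range."""
    t = np.asarray(t, dtype=float) - t_add
    efret = np.asarray(efret, dtype=float)
    pre = (t >= -pre_window) & (t < 0)         # stable baseline before addition
    plateau = t >= t.max() - plateau_window    # stable plateau at the end (assumed)
    y_min, y_max = efret[pre].mean(), efret[plateau].mean()
    keep = t >= 0
    return t[keep], (efret[keep] - y_min) / (y_max - y_min)
```

After this step every trace starts near 0 at t = 0 and ends near 1, so traces from different loaders share one time axis and one scale before fitting.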

      (2) The authors' statement that "CTF18-RFC displayed a slightly faster rate than RFC" seems to me a bit misleading, even though this is technically correct. The two loaders have indistinguishable rate constants for the fast phase, and RFC is a bit slower than CTF18-RFC in the slow phase. However, the data also show that RFC is overall more efficient than CTF18-RFC at loading PCNA because much more flux goes through the fast phase (rel amplitudes 0.73 vs 0.36). Because the slow phase represents such a reduced fraction of loading events, the slight reduction in rate constant for the slow phase doesn't impact RFC's overall loading. And because the majority of loading events are in the fast phase, RFC has a faster halftime than CTF18-RFC. (Is it known what the different phases correspond to? If it is known, it might be interesting to discuss.)

      We removed the quoted statement. We avoid comparing amplitude partitions (A₁/A_T) for CTF18-RFC because (i) a substantial fraction of the reaction occurs within the <7 s dead time, and (ii) single- vs double-exponential identifiability differs across complexes. Instead, we report model-minimal progress times: RFC t<sub>0.5</sub> ≤ 7 s (faster onset), CTF18-RFC ~ 8 s, CTF18<sup>Δ165–194</sup>-RFC ~ 12 s; completion (t<sub>0.95</sub>): RFC ≈ 77 s, CTF18-RFC ≈ 77 s, mutant ≈ 145 s. This shows RFC has the steeper onset, while CTF18-RFC catches up in completion, and the mutant is slower overall. We briefly note that RFC’s phases have been assigned in prior stopped-flow work and are consistent with a rapid entry step and a slower repositioning/complex release phase; we do not assign phases for CTF18-RFC here and instead rely on model-minimal timing comparisons to avoid over-interpretation. 
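      Model-minimal progress times such as t<sub>0.5</sub> and t<sub>0.95</sub> can be read directly from a normalized trace. The sketch below is an illustration, not the authors' code, and its function name is hypothetical. It returns the first time the trace reaches a given fraction by linear interpolation. If the fraction is already exceeded at the first sample, it returns that sample time, which is then only an upper bound (as with the dead time). If the fraction is never reached, it returns NaN.

```python
import numpy as np


def progress_time(t, y_norm, fraction):
    """First time a normalized trace reaches `fraction`, by linear interpolation."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y_norm, dtype=float)
    idx = np.flatnonzero(y >= fraction)
    if idx.size == 0:
        return float("nan")  # never reached within the recording
    i = idx[0]
    if i == 0:
        return float(t[0])   # already reached at the first sample: upper bound only
    t0, t1, y0, y1 = t[i - 1], t[i], y[i - 1], y[i]
    return float(t0 + (fraction - y0) * (t1 - t0) / (y1 - y0))
```

Because no kinetic model is fitted, these times do not depend on whether a single- or double-exponential description is chosen for a given complex.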

      (3) AAA+ is an acronym for "ATPases Associated with diverse cellular Activities" rather than "Adenosine Triphosphatase Associated". 

      Corrected to ATPases Associated with diverse cellular Activities (AAA+).

      Reviewer #2 (Public review): 

      Summary 

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures. 

      Strength & Weakness 

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and reduced activity in a primer extension assay with Pol ε. Moreover, the authors show that the Ctf18 ATP-binding domain adopts a more flexible organisation. 

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results. 

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading. 

      Comments on revisions: 

      The authors have done a nice job with the revision. 

      We thank the reviewer for their very positive comments.

      Reviewer #3 (Public review): 

      Summary: 

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader which is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit which is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity, and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex. 

      Relevance: 

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment and the DNA damage response. 

      Strengths: 

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes. 

      Weaknesses: 

      The manuscript would have benefited from a more detailed biochemical analysis using mutagenesis and assays to tease apart the differences with the canonical RFC complex. Analysis of the FRET assay could be improved. 

      Overall appraisal: 

      Overall, the work presented here is solid and important. The data are mostly sufficient to support the stated conclusions.

      We thank the reviewer for their mainly positive assessment. Following this reviewer's suggestion, we have re-analysed the FRET assay data and amended the manuscript accordingly.

      Comments on revisions: 

      While the authors addressed my previous specific concerns, they have now added a new experiment which raises new concerns. 

      The FRET clamp loading experiments (Fig. 6) appear to be overfitted so that the fitted values are unlikely to be robust and it is difficult to know what they mean, and this is not explained in this manuscript. Specifically, the contribution of two exponentials is floated in each experiment. By eye, CTF18-RFC looks much slower than RFC1-RFC (as also shown previously in the literature) but the kinetic constants and text suggest it is faster. This is because the contribution of the fast exponential is substantially decreased, and the rate constants then compensate for this. There is a similar change in contribution of the slow and fast rates between WT CTF18 and the variant (where the data curves look the same) and this has been balanced out by a change in the rate constants, which is then interpreted as a defect. I doubt the data are strong enough to confidently fit all these co-dependent parameters, especially for CTF18, where a fast initial phase is not visible. I would recommend either removing this figure or doing a more careful and thorough analysis. 

      We appreciate the reviewer’s concern regarding potential overfitting of the kinetic data in Figure 6. To address this, we performed a model-minimal re-analysis designed specifically to avoid parameter covariance and over-interpretation (Figure 6 and Figs S11-14 in the new manuscript). Only data recorded after the instrument’s <7 s dead time were included in the fits, thereby excluding the partially obscured early region of the reaction. For each clamp loader complex, we selected the minimal kinetic model that produced residuals randomly distributed about zero. This approach yielded a single-exponential fit for CTF18-RFC, whereas RFC and CTF18<sup>Δ165–194</sup>-RFC required double-exponential fits; single-exponential models for the latter two complexes left structured residuals, clearly indicating the presence of an additional kinetic phase.
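      The response above describes choosing the minimal model whose residuals scatter randomly about zero, but it does not name a formal criterion. One standard way to quantify "structured" residuals is a Wald–Wolfowitz runs test on the signs of the residuals. This is offered here as an assumed stand-in, not as the authors' procedure. Too few sign changes indicate long stretches of the data lying above or below the fit, which is the signature of a missing kinetic phase. A minimal Python sketch:

```python
import math


def runs_test_z(residuals):
    """Wald-Wolfowitz runs test on the signs of fit residuals.

    Returns a z-score: values well below zero mean fewer sign changes than expected
    by chance (long stretches above or below the fit), i.e. structured residuals.
    """
    signs = [r > 0 for r in residuals if r != 0]  # exact zeros carry no sign
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return -math.inf  # every residual lies on one side of the fit
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        return 0.0  # too few residuals for the test to be informative
    return (runs - expected) / math.sqrt(variance)
```

      Under this framing, a single-exponential fit whose residuals give |z| above about 2 would be rejected in favour of a double exponential. The cutoff of 2 is the conventional two-sided 5 % level, not a value taken from the manuscript.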

      Rather than relying on co-dependent amplitude and rate parameters, we quantified the reactions by reporting progress times (t<sub>0.5</sub>, t<sub>0.90</sub>, t<sub>0.95</sub>), which provide a model-independent measure of reaction speed. This directly addresses the reviewer’s concern and allows a fair comparison of the relative kinetics among the complexes.
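      Progress times such as those quoted above can be read off any fitted curve. The sketch below rests on two assumptions. First, the fitted model is a sum of rising exponentials referenced to loader addition (t = 0). Second, all amplitudes and rate constants are positive, so progress increases monotonically. Under these assumptions it finds t<sub>f</sub>, the time at which a fraction f of the total amplitude is reached, by bisection. The amplitudes and rates in the test cases are illustrative, not the fitted values from the manuscript. The cases also show why shifting amplitude between phases changes the half-time even when the rate constants are unchanged.

```python
import math


def progress_time(fraction, amplitudes, rates, t_max=1e4, tol=1e-9):
    """Time (s) at which a fitted rise reaches `fraction` of its total amplitude.

    Assumed model: y(t) = sum_i A_i * (1 - exp(-k_i * t)), with t = 0 at loader
    addition and all A_i, k_i > 0, so progress increases monotonically and
    bisection is valid. A single amplitude/rate pair gives a single exponential.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    total = sum(amplitudes)

    def progress(t):
        return sum(a * (1 - math.exp(-k * t)) for a, k in zip(amplitudes, rates)) / total

    lo, hi = 0.0, t_max
    if progress(hi) < fraction:
        raise ValueError("fraction is not reached before t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if progress(mid) < fraction:
            lo = mid  # not reached yet: search later times
        else:
            hi = mid  # already reached: search earlier times
    return 0.5 * (lo + hi)
```

      Because t<sub>f</sub> depends only on the fitted curve, single-exponential and double-exponential fits can be compared on the same footing. No individual amplitude or rate constant needs to be interpreted.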

      From this analysis, RFC exhibited the fastest onset (t<sub>0.5</sub> ≤ 7 s; an upper bound, as the half-time falls within the instrument dead time), while CTF18-RFC and CTF18<sup>Δ165–194</sup>-RFC showed progressively slower half-times of approximately 8 s and 12 s, respectively. Completion times further emphasized these differences: both RFC and CTF18-RFC reached 95 % completion at ~77 s, whereas the mutant required ~145 s. Despite these kinetic distinctions, CTF18-RFC and its β-hairpin deletion mutant achieved similar EFRET plateaus, indicating that the mutation slows reaction progression but does not reduce the overall extent of PCNA loading.

      Finally, we emphasize that our interpretation is deliberately conservative. We do not assign distinct kinetic phases to CTF18-RFC, as their molecular basis remains unresolved. RFC’s phases have been characterized in prior stopped-flow studies, but CTF18-RFC likely follows a distinct or simplified pathway. Our conclusions are thus limited to what the data unambiguously support: deletion of the Ctf18 β-hairpin decreases the rate—but not the extent—of PCNA loading, consistent with the reduced stimulation of Pol ε primer extension observed under single-turnover conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      General assessment of the work:

      In this manuscript, Mohr and Kelly show that the C1 component of the human VEP is correlated with binary choices in a contrast discrimination task, even when the stimulus is kept constant and confounding variables are considered in the analysis. They interpret this as evidence for the role V1 plays during perceptual decision formation. Choice-related signals in single sensory cells are enlightening because they speak to the spatial (and temporal) scale of the brain computations underlying perceptual decision-making. However, similar signals in aggregate measures of neural activity offer a less direct window and thus less insight into these computations. For example, although I am not a VEP specialist, it seems doubtful that the measurements are exclusively picking up (an unbiased selection of) V1 spikes. Moreover, although this is not widely known, there is in fact a long history to this line of work. In 1972, Campbell and Kulikowski ("The Visual Evoked Potential as a function of contrast of a grating pattern" - Journal of Physiology) already showed a similar effect in a contrast detection task (this finding inspired the original Choice Probability analyses in the monkey physiology studies conducted in the early 1990's). Finally, it is not clear to me that there is an interesting alternative hypothesis that is somehow ruled out by these results. Should we really consider that simple visual signals such as spatial contrast are *not* mediated by V1? This seems to fly in the face of well-established anatomy and function of visual circuits. Or should we be open to the idea that VEP measurements are almost completely divorced from task-relevant neural signals? Why would this be an interesting technique then? 
In sum, while this work reports results in line with several single-cell and VEP studies and perhaps is technically superior in its domain, I find it hard to see how these findings would meaningfully impact our thinking about the neural and computational basis of spatial contrast discrimination.

      We agree that single cell measurements allow for a spatially more detailed analysis, but they are not feasible in humans. Assuming we value insights into the relationship between neural activity and decision making in the human as well as non-human brain, we are restricted to non-invasive measurements such as EEG, which inevitably showcase the neural underpinnings of decision making at a coarser level of analysis. This was the challenge we met with our paradigm design. For example, we chose contrast as the task-relevant stimulus feature in this study because monotonic contrast response functions exist for sensory neurons throughout the visual system, and the aggregated measures that we could attain with EEG would reflect that contrast-sensitivity and hence provide a window onto the encoding of the main decision-relevant quantity. We were specifically interested in initial afferent, contrast-dependent V1 activity reflected in the C1 component (80-90 ms). As we point out in the Introduction, the C1 is unusual among EEG signals in the extent to which it is dominated by a single visual area, V1 (Jeffreys & Axford, 1972; Clark et al., 1994; Di Russo et al., 2002; Ales et al., 2010; Mohr et al., 2024), and even if other downstream areas also make a minor contribution in the C1 time period, it still represents a very low-level sensory response early in the sensory analysis pipeline, appropriate for addressing our primary question of whether such a low-level signal is used in the formation of perceptual decisions. The alternative hypothesis, that early responses are passed over in decision readout, relates to a fundamental debate about whether early sensory responses are separated from cognition. 
The possibility that late, but not early, representations are correlated with choices does not imply that the later sensory representations are divorced from the earlier ones, only that there is a noise component that is not shared between the two, such as that produced by the ensuing computations that generate the later representations. Instead, a lack of choice probability in early representations would imply that decision readout is selective in where it sources sensory evidence from, with some possible reasons being to maintain high quality standards for sensory evidence or to impose a layer of separation between cognition and sensation.

      As the reviewer points out, the animal literature is highly mixed on the topic of choice probability in V1. Even for orientation discrimination tasks where V1 is ostensibly highly suited given the existence of orientation columns in V1, and even when measurements are taken from V1 neurons with good neurometric performance and/or aggregated across a V1 population (Jasper et al 2019), some studies have reported little to no V1 choice probability. If our alternative hypothesis of no EEG-indexed V1 choice probability flies in the face of well-established anatomy and function of visual circuits, then so also do these empirical findings in the animal neurophysiology literature. 

      Although there are important aspects of choice probability that are accessible in single cell studies but not in EEG (e.g. noise correlations, details of circuit physiology), our EEG measurements tap into the same phenomenon, just at a different level of analysis, i.e. the neural population level. At this level, we have been able to address whether the full body of sensory responses at a particular stage of visual analysis is systematically related to perceptual decision outcomes. Very similar questions are in fact sometimes addressed in the animal neurophysiology literature; for example, Kang and Maunsell (2020) aggregated single-cell choice probability measurements within visual areas to investigate whether choice probability strength at the level of an entire visual area was sensitive to task demands. The global vantage point of EEG comes with the additional benefit of picking up signatures of other potentially mediating processes such as attention and being able to control for them in our analysis. Our human study thus provides a valuable complementary viewpoint alongside animal neurophysiology work in this area.

      Summary of substantive concerns:

      (1) The study of choice probability in V1 cells is more extensive than portrayed in the paper's introduction. In recent years, choice-related activity in V1 has also been studied by Nienborg & Cumming (2014), Goris et al (2017), Jasper et al (2019), Lange et al (2023), and Boundy-Singer et al (2025). These studies paint a complex picture (a mixture of positive, absent, and negative results), but should be mentioned in the paper's introduction.

      We thank the reviewer for highlighting these papers bearing on choice-related activity in V1, only two of which we had cited. The three additional studies do indeed lend further support to our description of the complex picture around V1-CP effects in the literature and we have now included them.

      (2) The very first study to conduct an analysis of stimulus-conditioned neural activity during a perceptual decision-making task was, in fact, a VEP study: Campbell and Kulikowski (1972). This study never gained the fame it perhaps deserves. But it would be appropriate to weave it into the introduction and motivation of this paper.

      We are aware of this paper, and indeed we ourselves have shown steady-state VEP (SSVEP) correlations with timing and selection of decision reports (O'Connell et al 2012; Grogan et al 2023), but SSVEPs do not provide an index of initial afferent V1 activity in the way that the C1 of the transient VEP does. SSVEPs are evoked by a rapid sequence of stimulus onsets, so the activity can neither be attributed to a particular stimulus onset nor have its bottom-up latency resolved, and, being a response to an ongoing stimulus, it combines top-down and bottom-up influences from striate and extrastriate areas (Di Russo et al 2007). Indeed, in Campbell and Kulikowski (1972) the SSVEP was almost entirely eliminated when the stimulus was undetected. This is in keeping with robust modulations of the SSVEP by spatial attention (Muller and Hillyard 2000). Cognitive influences of this magnitude are never observed in the C1, and in fact are often not observed at all even when later VEP components show robust modulations (Luck et al 2000), which motivated a recent meta-analysis to address the issue (Qin et al 2022). This highlights the important distinction between the earliest transient VEP activity, which mainly reflects the initial afferent response in V1, and steady-state sensory activity, which reflects a mix of bottom-up and top-down influences across visual cortex. Because of the importance of this distinction, we have added a reference to the above SSVEP papers to the 3rd paragraph of the introduction along with a statement about the distinction.

      (3) What are interesting alternative hypotheses to be considered here? I don't understand the (somewhat implicit) suggestion here that contrast representations late in the system can somehow be divorced from early representations. If they were, they would not be correlated with stimulus contrast.

      This same conundrum applies to single-cell studies of choice probability. Do studies showing choice probability in V4 but not V1, for example, demonstrate that V4 is divorced from V1? In such studies, measurements are typically taken from large representative samples of neurons from both areas, with good neurometric performance in both cases, and the task often (though not always) involves a target stimulus feature that is encoded in V1, such as orientation. Why then should V4 but not V1 show choice probability when we know the vast majority of input to the visual cortex passes through V1? It must be that feature representation and choice formation are different things, and that one does not imply the other. This is true for an EEG study as much as it is for a single-cell study.

      The alternative hypothesis in our study is that the early sensory responses indexed by the C1 are not directly used in the formation of the perceptual decision at hand. As outlined in our comments above, this does not imply that those early responses are divorced from later responses. Of course, both are correlated with stimulus contrast and so would correlate with each other across changing contrast but this does not necessitate that their noise is correlated when contrast is held constant because new instantiations of noise can be generated by the computations performed at each stage of visual processing. Thus, the interesting alternative hypothesis is that information contained in the sensory representation generated during initial afferent V1 activity is not used directly to form decisions, and instead, decisions are read out from the outputs of computations performed further downstream. Such an outcome, if it had arisen in our data, would have been consistent with a separation between cognition and early visual processing. Instead, our results suggest a certain level of cognitive interfacing at the lowest and earliest cortical levels of visual processing. We have now added text to the Introduction to highlight the distinction between sensory representation and decision readout in order to make the alternative hypothesis clearer.

      (4) I find the arguments about the timing of the VEP signals somewhat complex and not very compelling, to be honest. It might help if you added a simulation of a process model that illustrated the temporal flow of the neural computations involved in the task. When are sensory signals manifested in V1 activity informing the decision-making process, in your view? And how is your measure of neural activity related to this latent variable? Can you show in a simulation that the combination of this process and linking hypothesis gives rise to inverted U-shaped relationships, as is the case for your data?

      We thank the reviewer for this suggestion of a simulation, which we carried out using the Matlab code. We have also included new Figure 1-Figure Supplement 1 in the revised manuscript.

      In our view, sensory signals in V1 are informing the decision-making process in this task from at least as early as the initial afferent response. The main point about C1 latency in relation to the response-time contingency of the choice probability effect is that the more time that elapses without a decision made (and therefore the more additional sensory processing that contributes to the decision), the more diluted is the contribution of the C1 to the decision by contributions from later representations, and thus choice probability reduces. Likewise, when response times are too quick for C1 evidence to contribute, choice probability is also absent, hence the inverted-U-shaped curve. Moreover, if the C1-choice correlation is mediated by a top-down factor such as attention rather than readout, the inverted-U-shaped curve is not expected because in such a case the relative timing of the C1 and choice commitment would not be relevant.
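      The argument above is qualitative. The following toy Monte Carlo sketch of the linking hypothesis is written in Python and is not the authors' Matlab simulation behind the new figure supplement. All parameters are hypothetical. C1 evidence is assumed to reach the decision process 100 ms after stimulus onset, and each further 10 ms of sensory processing is assumed to add one independent, unit-variance evidence sample. The stimulus is held fixed, so only noise varies. Choice probability uses the usual ROC-based definition: the probability that the C1 value on a trial ending in one choice exceeds the C1 value on a trial ending in the other.

```python
import math
import random


def auc(group_a, group_b):
    """P(x_a > x_b) via the Mann-Whitney rank sum (continuous data, ties assumed absent)."""
    pooled = sorted([(x, 1) for x in group_a] + [(x, 0) for x in group_b])
    rank_sum_a = sum(rank for rank, (_, is_a) in enumerate(pooled, start=1) if is_a)
    n_a, n_b = len(group_a), len(group_b)
    return (rank_sum_a - n_a * (n_a + 1) / 2) / (n_a * n_b)


def c1_choice_probability(decision_ms, n_trials=4000, c1_ready_ms=100, step_ms=10, seed=1):
    """Toy readout model at fixed stimulus contrast (all parameters hypothetical).

    Decisions committed before C1 evidence arrives rely on noise unrelated to C1.
    Otherwise the decision variable is the C1 noise sample plus one independent
    unit-variance sample per `step_ms` of later sensory evidence.
    Returns the choice probability of the C1 sample.
    """
    rng = random.Random(seed)
    c1_when_a, c1_when_b = [], []
    for _ in range(n_trials):
        c1 = rng.gauss(0.0, 1.0)  # trial-to-trial C1 fluctuation
        if decision_ms < c1_ready_ms:
            dv = rng.gauss(0.0, 1.0)  # C1 not yet available to the decision
        else:
            n_later = (decision_ms - c1_ready_ms) // step_ms
            dv = c1 + rng.gauss(0.0, math.sqrt(n_later))  # later evidence dilutes C1
        (c1_when_a if dv > 0 else c1_when_b).append(c1)
    return auc(c1_when_a, c1_when_b)


for decision_ms in (80, 120, 400):
    print(decision_ms, round(c1_choice_probability(decision_ms), 3))
```

      With these settings, the C1 choice probability stays near 0.5 for commitment at 80 ms. It is highest shortly after C1 evidence becomes available and falls at 400 ms as later samples dilute the C1 contribution. This is the rise-then-fall pattern described above. The model deliberately omits non-decision (motor) time, so `decision_ms` stands for the moment of commitment rather than the measured RT.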

      Reviewer #2 (Public review):

      Summary:

      Mohr and Kelly report a high-density EEG study in healthy human volunteers in which they test whether correlations between neural activity in the primary visual cortex and choice behavior can be measured non-invasively. Participants performed a contrast discrimination task on large arrays of Gabor gratings presented in the upper left and lower right quadrants of the visual field. The results indicate that single-trial amplitudes of C1, the earliest cortical component of the visual evoked potential in humans, predict forced-choice behavior over and beyond other behavioral and electrophysiological choice-related signals. These results constitute an important advance for our understanding of the nature and flexibility of early visual processing.

      Strengths:

      (1) The findings suggest a previously unsuspected role for aggregate early visual cortex activity in shaping behavioral choices.

      (2) The authors extend well-established methods for assessing covariation between neural signals and behavioral output to non-invasive EEG recordings.

      (3) The effects of initial afferent information in the primary visual cortex on choice behavior are carefully assessed by accounting for a wide range of potential behavioral and electrophysiological confounds.

      (4) Caveats and limitations are transparently addressed and discussed.

      We would like to thank the reviewer for these positive remarks.

      Weaknesses:

      (1) It is not clear whether integration of contrast information across relatively large arrays is a good test case for decision-related information in C1. The authors raise this issue in the Discussion, and I agree that it is all the more striking that they do find C1 choice probability. Nevertheless, I think the choice of task and stimuli should be explained in more detail.

      We thank the reviewer for raising this point about the large stimulus arrays. As we said in our Discussion, it would seem that aggregation across a large stimulus region would be better suited to a downstream visual area with larger receptive fields, yet our setting of a strict deadline would put the emphasis back on earlier sensory representations. We now elaborate on this matter in the discussion, to say that although the small receptive fields and short, slow horizontal connections in V1 mean that the aggregation necessary for performing the task is unlikely to happen within V1 during the C1 timeframe, the aggregation would be readily achieved simply by convergence of the outputs of all relevant V1 neurons for a given stimulus array on the same decision process. In this sense, the design of our paradigm was such that the globally-measured C1 component on the scalp reflected the same aggregated evidence input as the summed V1 readout that we suppose would be entering the decision process.  

      We have also added further rationale in the Methods section on the practical benefits of the stimulus design, as the reviewer anticipates in their subsequent point, of yielding robust C1 signals. This concern was paramount in the design of this study because we expected the C1 difference metric that was of interest to be very small. We also needed a robust C1 to be measured in both the upper and lower visual field in as many individuals as possible and, in our experience, this is true less often when using smaller stimuli, even with a pre-mapping procedure.

      It also helped to homogenize C1 topography across individuals and ensure that topographies from the upper and lower visual field had sufficient overlap that there were electrodes with strong loading from both topographies where the C1 difference as a function of which array was brighter would be maximal.

      We have updated the methods section to provide these rationales while we describe the stimulus design.

      (2) In a similar vein, while C1 has canonical topographical properties at the grand-average level, these may differ substantially depending on individual anatomy (which the authors did not assess). This means that task-relevant information will be represented to different degrees in individuals' single-trial data. My guess is that this confound was mitigated precisely by choosing relatively extended stimulus arrays. But given the authors' impressive track record on C1 mapping and modeling, I was surprised that the underlying rationale is only roughly outlined. For example, given the topographies shown and the electrode selection procedure employed, I assume that the differences between upper and lower targets are mainly driven by stimulus arms on the main diagonal. Did the authors run pilot experiments with more restricted stimulus arrays? I do not mean to imply that such additional information needs to be detailed in the main article, but it would be worth mentioning.

      We thank the reviewer for their thoughtful consideration of this issue about individual variability in C1 retinotopy. Indeed, as the reviewer anticipated we expected the large stimulus coverage to mitigate this issue and we think that our response to the point above and the changes we made to the manuscript in response address this point also. Although we did not show this in the manuscript, we did in fact find that C1 topography was much more similar across individuals than it has been in previous C1 experiments we have carried out with smaller stimuli.

      However, we acknowledge the reviewer’s point that the signal measured at a specific electrode likely has a variable loading strength from the various gratings in the stimulus array and that the gratings of maximal loading may indeed vary from subject to subject. Such inter-subject variability cannot confound the choice probability effects because the latter are measured within-subject. Nevertheless, it could be a source of noise. We believe the impact of this is unlikely to be substantial for the following reasons:

      i) We designed the spatial spread of contrasts in such a way as to encourage participants to aggregate across the full array. In essence, to match the property of the C1 as an aggregate measure of V1 activity, we designed a task that involved aggregating across stimulus elements. Therefore, the decision weighting applied to any particular grating should be representative of the weighting applied to all gratings and, as such, the specific gratings that contribute most to the C1 signal for a particular participant should be relatively inconsequential.

      ii) By avoiding the horizontal and vertical meridians we avoided the regions of space where the shifts in C1 topography are largest.

      (3) Also, the stimulus arrangement disregards known differences in conduction velocity between the upper and lower visual fields. While no such differences are evident from the maximal-electrode averages shown in Figure 1B, it is difficult to assess this issue without single-stimulus VEPs and/or a dedicated latency analysis. The authors touch upon this issue when discussing potential pre-C1 signals emanating from the magnocellular pathway.

      Indeed, there are important differences in V1 properties between the upper and lower visual fields, visual acuity being another example in addition to conduction velocity as the reviewer points out. However, these differences appeared to be quite minimal in this case (Figure 1B does in fact include a single-stimulus VEP – the “1-stim” entry in the legend). Perhaps this is also due to the large stimulus array which may include a range of conduction velocities within it and thereby blur overall differences between the upper and lower visual field. The variability of contrast within each array was also quite high (+/-20% from the midpoint), which would have further increased within-array conduction velocity variability and blurred differences between arrays.

      Our staircasing procedure may also have helped in this regard to some extent, as it included a bias parameter between the arrays to account for any behavioural response biases. Although the contrast adjustments it typically introduced were likely far too small to change conduction velocities, the bias parameter would have corrected for any effect that such velocity differences had on behaviour.

      (4) I suspect that most of these issues are at least partly related to a lack of clarity regarding levels of description: the authors often refer to 'information' contained in C1 or, apparently interchangeably, to 'visual representations' before, during, or following C1. However, if I understand correctly, the signal predicting (or predicted by) behavioral choice is much cruder than what an RSA-primed readership may expect, and also cruder than the other choice-predictive signals entered as control variables: namely, a univariate difference score on single-trial data integrated over a 10 ms window determined on the basis of grand-averaged data. I think it is worth clarifying and emphasizing the nature of this signal as the difference of aggregate contrast responses that *can* only be read out at higher levels of the visual system due to the limited extent of horizontal connectivity in V1. I do not think that this diminishes the importance of the findings - if anything, it makes them more remarkable.

      It is true that a univariate measure may stand out in a field that increasingly favours multivariate analyses with the spread of machine learning. We have therefore added a short qualifier in the methods section where we describe the C1 measurement, explicitly stating that it is a scalar variable. In using this univariate measure, we leveraged the rich prior knowledge about V1 anatomy and neurophysiology rather than relying on data-driven classifiers; interestingly, we found that such a classifier trained on all electrodes discriminates choices less well than our informed univariate measure during the C1 time frame. 

      We also thank the reviewer for raising an interesting point about the nature of aggregation and readout in the context of our stimulus. We agree that it is not feasible that V1 activity would be aggregated locally in V1 across such large regions of space prior to being readout within the C1 time period. As we say above, the aggregation may instead be carried out through convergent transmission of the parallel, spatially-local V1 information to the decision process.
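      For readers less familiar with the C1 measure, the following Python sketch illustrates the kind of scalar discussed above. The scalar is the mean single-trial voltage in a 10 ms window around the grand-average C1 peak. It is taken at an electrode chosen for the largest C1 difference between "upper array brighter" and "lower array brighter" trials. The data layout, electrode labels and function names are hypothetical, and the exact selection and windowing rules in the manuscript's Methods may differ.

```python
def c1_window_amplitude(trial, times_ms, peak_ms, half_width_ms=5.0):
    """Mean single-trial voltage in a window centred on the grand-average C1 peak.

    The default 5 ms half-width gives a 10 ms window; `trial` holds one epoch (µV)
    at the selected electrode, sampled at the times in `times_ms`.
    """
    in_window = [v for v, t in zip(trial, times_ms)
                 if peak_ms - half_width_ms <= t <= peak_ms + half_width_ms]
    if not in_window:
        raise ValueError("no samples fall inside the C1 window")
    return sum(in_window) / len(in_window)


def max_difference_electrode(mean_upper_brighter, mean_lower_brighter):
    """Electrode whose average C1-window amplitude differs most between conditions.

    Both arguments map electrode label -> average C1-window amplitude (µV).
    """
    return max(mean_upper_brighter,
               key=lambda e: abs(mean_upper_brighter[e] - mean_lower_brighter[e]))
```

      Each trial thus contributes a single number. This is the sense in which the C1 signal is univariate, in contrast with a classifier trained on all electrodes.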

      (5) Arguably even more remarkable is the finding that C1 amplitudes themselves appear to be influenced by choice history. The authors address this issue in the Discussion; however, I'm afraid I could not follow their argument regarding preparatory (and differential?) weighting of read-outs across the visual hierarchy. I believe this point is worth developing further, as it bears on the issue of whether C1 modulations are present and ecologically relevant when looking (before and) beyond stimulus-locked averages.

      We thank the reviewer for their positive appraisal of this additional finding, which we also found remarkable. We agree that our description of our interpretation was too brief and lacked clarity. We have reworded it and expressed it in terms of the speed accuracy trade-off, with the new explanation given below. However, it is important to remember that this account is speculative and serves only to explain the response-time contingency of the bias. That the bias was present and constitutes a modulation of the C1 does not rest on this argument:

      […] “to explain the RT contingency for the C1 bias, we speculate that the speed-accuracy trade-off could fluctuate from trial to trial and that the corresponding decision bound fluctuations (Heitz and Schall 2012) could be implemented by pre-determining decision weights across visual areas. For example, to achieve faster decisions, the sensory evidence requirement could be reduced by placing greater emphasis on initial afferent V1 evidence. In such a case, the RT contingency of the above choice history bias could be explained if the C1 bias is exerted in proportion with the planned emphasis of C1 evidence for the upcoming decision.”

      Recommendations to the Authors:

      Reviewer #2 (Recommendations for the authors):

      (1) As someone whose first language is not English, I am somewhat hesitant to bring this up, but I found the use of 'readout' as both noun and verb somewhat confusing. I thought read-out was defined as 'that which is read out'.

      We agree that this dual use of the word readout may cause confusion. To avoid this, we have edited the manuscript to replace verbal forms of the word “readout” with “read out”.

      (2) I found it difficult to follow the reasoning for why intermediate RTs should be the ones most affected by C1-related information. Perhaps this could be described in more detail for the uninitiated reader.

      We appreciate that our reasoning for why intermediate RTs should be the ones most affected by C1-related information was difficult to follow. We have now added a simulation to illustrate this rationale more clearly (see our response to Reviewer 1 and the new figure supplement to Figure 1).

      (3) It would be interesting to compare the effect sizes observed here to those seen in single-cell studies and to discuss this comparison with regard to differences in the nature of EEG signals and single-cell firing rates.

      While we agree that such a comparison would be interesting if feasible, it would require the same task settings, which have not been used in a single-cell study. Moreover, the very different nature and extent of noise in the two recording modalities (e.g. background EEG activity from ongoing processes unrelated to the task) would make such a comparison difficult to interpret.

      (4) Figure 1: It may be worth mentioning in the legend that only parts of the peripheral stimulus grid are shown for better visibility, as the Methods speak of 9 x 9 grids. Also, in panel B, it should be mentioned that waveshapes are calculated using individually selected maximal-difference electrodes.

      We thank the reviewer for spotting these. We have updated the caption for this figure to reflect these two observations.

      (5) Figure 4: The different shades of green may be difficult to distinguish when printed.

      Although this may be true, we chose shades of green that differ in luminance, so they should remain distinguishable. Different colours of equal luminance could in fact be harder to tell apart in a black-and-white print. We chose shades of a single colour to reflect the fact that we were plotting the same signals at different difficulty levels. In our opinion, this consideration takes precedence, since eLife is an online journal and the majority of readers will likely read it digitally.

      (6) Methods/Task: While the ITI of 780 ms is substantial, I was wondering why the authors decided against jittering this interval? It would be helpful to briefly discuss whether contrast adaptation for slow periodic stimulation may have affected the findings.

      We opted against jittering the ITI to avoid an additional source of inter-trial variability. While a fixed ITI may permit contrast adaptation, any such effect would be approximately constant across trials and is therefore less of a concern for our design. We have added text to the Methods section stating this rationale.

      (7) Methods/Stimuli: The authors convincingly argue that focusing on single arms of the stimuli is an unlikely strategy, but did they ask for participants' strategies during debriefing?

      We are glad that the reviewer found our argument about whether participants may have focused on a single arm of the stimuli convincing. We did not ask participants about their strategies, but even with such a debriefing, it would remain possible that a participant used that strategy without being aware of doing so. In any case, had participants done this, it would have weakened, not strengthened, our choice probability result.

      (8) Methods/Procedure, Difficulty Titration: Why did the authors opt for manually adapting the difficulty level in a separate session rather than constantly and automatically titrating difficulty?

      We did this because calculating choice probability requires comparing trials with different choice outcomes but the same stimulus, so continuously staircasing the difficulty level during the experiment would have created a confound. Although this could have been corrected for in our regression, doing so would have introduced additional noise, which we avoided by staircasing in advance.

    1. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments, which will substantially improve our manuscript. In response, we will revise the text and figures throughout to address the points raised. Specifically, we will:

      i. Refine our definition of Inactivation/Stability Centers (I/SCs): We will limit this designation to loci where both Allelic Expression Imbalance (AEI) and Variable Epigenetic Replication Timing (VERT) are detected, either in the present study or in previously published work.

      ii. Expand methodological clarity: We will provide detailed descriptions of how VERT regions were identified, annotated, and quantified, including thresholds for allelic imbalance, replication timing variability, and sampling depth. We will also justify the ≥80% AEI cutoff, which is based on recent studies showing that modest allelic biases can have biological and clinical significance.

      iii. Enhance benchmarking and validation: In addition to the analysis of X inactivation in female ACP cells, we will include comparisons between imprinted and non-imprinted regions to benchmark the magnitude of allelic replication timing imbalance, demonstrating that the imbalance observed at imprinted loci is comparable to that at non-imprinted VERT regions.

      iv. Address tissue specificity and sampling limitations: We will discuss the limited number of clones, tissues, and individuals analyzed, emphasizing that while our data identify robust AEI and VERT patterns, additional tissues and individuals will be required to capture the full diversity of I/SC regulation.

      v. Clarify biological relevance: We will expand our discussion to highlight the consistency of AEI findings across cell types, including examples of genes implicated in neurodevelopmental and neurodegenerative disorders, and we will clarify our model of how I/SC regulation may contribute to haploinsufficiency, variable expressivity, and incomplete penetrance in human disease.

      vi. Improve figures and supplemental data: We will update figure legends for clarity, add a new supplementary figure comparing imprinted and non-imprinted regions, and cross-reference all supplemental tables.

      We believe these revisions will strengthen the manuscript conceptually and experimentally, and we thank the reviewers and editors for their valuable feedback.

      Description of the planned revisions

      Reviewer #1:

      The existence of VERT regions is well supported, but the number of regions called as ISCs may be inflated by permissive thresholds (e.g., AEI ≥ 0.8 or ≤ 0.2 in a single clone). This risks conflating transient stochastic differences with stable ISCs.

      We selected the >80% (or <20%) allelic imbalance threshold, along with the requirement of at least one biallelic clone, as our criterion for significant AEI. This choice was guided by a recent study demonstrating that allelic imbalance as low as 65%/35% is sufficient to affect disease penetrance in humans (Nature 2025; 637:1186–1197). For completeness, results obtained using more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.

      Furthermore, it is unlikely that transient stochastic differences in allelic expression, such as those detected by single-cell RNA sequencing assays (Nat. Rev. Genet. 2015; 16:653–664), would be captured by our approach. Each clone in our study was expanded from a single cell to over one million cells before both RNA-seq and Repli-seq analysis, effectively averaging out transient transcriptional and/or replication fluctuations, and thus reflecting stable, mitotically heritable epigenetic states.

      More robust approaches would include using magnitude of imbalance, annotating VERTs by genomic location, applying stricter thresholds for replication timing, and benchmarking AEI distributions against the X chromosome.

      All VERT regions identified in this study were annotated according to both the magnitude of allelic imbalance and their genomic coordinates, using 250 kb windows for the human samples and 50 kb windows for the mouse samples (see Supplementary Tables 1 and 6). Figure 1c directly compares the magnitude of imbalance, defined as outliers in the standard deviation, for both allelic replication timing and allelic expression across autosomal and X-linked loci in female ACP cells.

      In addition, we will benchmark the magnitude of replication timing imbalance using autosomal imprinted regions as a second internal control. We detected allelic replication imbalance at 13 known imprinted loci, and the standard deviation of replication timing at these loci, measured in 250 kb windows, is comparable to that observed across the >350 VERT regions detected at non-imprinted sites. To illustrate this comparison, we will include a supplementary figure directly comparing imprinted and non-imprinted regions.

      Figures and text would benefit from improved clarity: axis labels are missing in places (e.g., Fig. 1c, Fig. 2g), legends should explain chromosome arm colors, and cluttered figures such as Fig. 1j could be re-visualized for interpretability.

      Axis labels will be added to Figs. 1c and 2g, and legends will be revised for clarity.

      “the claim of cell-type specificity is not convincingly demonstrated given the small sample size (n=4) and strong batch confounding between lymphoblastoid and cartilage progenitors.” And “Hierarchical clustering is confounded by batch and based on presence/absence calls that lack quantitative resolution.”

      We agree that the limited number of individuals and clones, as well as the comparison between only two distinct tissue types (LCLs and ACPs), impose quantitative limitations. Our primary intent was to evaluate whether any I/SCs were shared between independently derived clonal datasets and to determine whether there is evidence of tissue-specific I/SC usage, rather than to make quantitative claims about global cell-type specificity.

      To address this concern, we will replace the hierarchical clustering analysis currently shown in Figure 1i with a Venn diagram that more directly illustrates the overlap and tissue-specific distribution of VERT regions detected in the different clonal sets. This revised representation avoids assumptions about clustering relationships and removes batch-driven bias, while still conveying the key observation that many VERT regions are shared across tissues and others appear tissue-restricted.

      While syntenic VERT regions across mouse and human are intriguing, they complicate interpretation of strong clustering by cell type. Sampling depth may also have exaggerated allelic imbalance calls.

      We note that the human LCLs used in our study are B cells, and immunoglobulin gene rearrangements were used to confirm the clonal uniqueness of each line. Similarly, the mouse replication timing data analyzed here were generated from pre-B cells, which also undergo immunoglobulin gene rearrangement. Thus, both the human LCL and mouse pre-B cell datasets were derived from B-cell lineages, providing a consistent cellular context for comparative analysis.

      Sequencing depth is an important consideration for all variant base calls. Without fully haplotype-resolved genomes, previous studies relied on per-SNP calls of allelic imbalance based on reads covering a single nucleotide locus. To increase the sequencing depth supporting the identification of VERT and AEI regions, we used fully haplotype-resolved genomes, which allowed all informative allele-specific reads to be pooled across all heterozygous SNPs within genomic windows or expressed genes. For AEI, we required a minimum of 20 informative allele-specific reads per gene, an FDR-corrected p-value ≤ 0.05, and at least 80%/20% allelic imbalance. Importantly, a recent study has shown that allelic imbalance as low as 65%/35% is sufficient to affect disease penetrance in humans (Nature 2025; 637:1186–1197). We reiterate that results obtained with more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.
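      The three per-gene AEI criteria above (read depth, FDR-corrected significance, and magnitude of imbalance) can be sketched as a simple filter for one clone. This is a hypothetical illustration, not the pipeline used in the study: the response does not name the statistical test, so an exact two-sided binomial test against a 50/50 allelic ratio with Benjamini-Hochberg FDR correction is assumed here, and the function names and read counts are invented for the example. The separate requirement that at least one clone be biallelic is not modeled.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k allele-A reads out of n (null ratio 0.5)."""
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5); the null is symmetric, so double it
    cdf = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * cdf)


def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_aei(counts: dict[str, tuple[int, int]], min_reads: int = 20,
             alpha: float = 0.05, cutoff: float = 0.8) -> set[str]:
    """Hypothetical AEI caller for a single clone.

    counts maps gene -> (allele_A_reads, allele_B_reads), assumed to be
    already pooled across all heterozygous SNPs on the resolved haplotypes.
    """
    # Criterion 1: depth filter, applied before multiple-testing correction
    tested = {g: ab for g, ab in counts.items() if sum(ab) >= min_reads}
    genes = list(tested)
    raw = [binom_two_sided_p(tested[g][0], sum(tested[g])) for g in genes]
    qvals = bh_adjust(raw)
    calls = set()
    for g, q in zip(genes, qvals):
        a, b = tested[g]
        major_frac = max(a, b) / (a + b)
        # Criteria 2 and 3: FDR-significant and >= 80%/20% imbalance
        if q <= alpha and major_frac >= cutoff:
            calls.add(g)
    return calls


print(call_aei({"GENE_A": (45, 5), "GENE_B": (12, 3), "GENE_C": (30, 28)}))
# prints {'GENE_A'}  (GENE_B fails the depth filter; GENE_C is near 50/50)
```

Filtering on depth before computing FDR keeps low-coverage genes from inflating the number of tests; raising `cutoff` to 0.9 or 0.95 would correspond to the more stringent thresholds mentioned above.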

      Gene set enrichment analysis should be restricted to avoid inflated significance from overly broad categories.

      Reviewer #2:

      Some of the GO terms presented are too broad to suggest any biological significance to the result, even if there is statistical significance (for example, the top term for LCL clones 'Cytoplasm' is associated with 12,000 genes, and the second term for mouse clones 'Membrane' is associated with 10,000). It would be helpful to focus on GO terms lower in the GO hierarchy.

      We will include our complete Gene Ontology analysis, with more specific biological categories, in Supplemental Table 5.

      Allelic imbalance has been referred to as AI, MAE (monoallelic expression), RMAE (random monoallelic expression) etc. The paper whose mouse data the authors make use of uses Asynchronous Stochastic Replication Timing (ASRT) instead of VERT to refer to the same phenomenon. Creating unnecessary jargon makes the paper more difficult to read and adds needless complexity to an already complex field.

      While we agree that allelic expression imbalance has been described by different investigators using many different terms, we believe that MAE, RMAE and AI do not accurately describe the phenomenon. In our study [and our previous study; Nat Commun. 2022; 13(1):6301] we used clonal analysis of allele-specific expression and found that while some clones display equivalent levels of expression between alleles of a given gene (i.e. bi-allelic expression), other clones express only one allele (i.e. mono-allelic expression), and yet other clones have undetectable expression (i.e. silent on both alleles). This pattern of allele-restricted expression indicates that each allele independently adopts either an expressed or silent state. Importantly, because these expression states are mitotically stable, allele-autonomous, and independent of parental origin, we refer to the choice of the expressed allele as stochastic. Given this variability, we believe that the phrase “Allelic Expression Imbalance” (AEI) is a more accurate descriptor for this phenomenon. We also note that “Allelic Expression Imbalance” has been used more than 120 times in the PubMed database.

      In addition, the replication asynchrony that exists at these loci is not consistent with purely ASynchronous Replication Timing (ASRT) between alleles. We found that each allele can independently adopt either earlier or later replication timing in different clones. This variability results in some clones exhibiting pronounced asynchrony between alleles, while in others, the two alleles replicate synchronously, with both adopting either the earlier or later timing state. As reported in our previous study (Nat. Commun. 2022; 13:6301), this behavior reflects a stochastic and allele-autonomous process, leading us to describe these loci as exhibiting Variable Epigenetic Replication Timing (VERT), which we believe is a more accurate descriptor of this phenomenon.

      The point that allelic imbalance is enriched in VERTs would be enhanced if the authors could present the allelic ratio for all genes found in all VERTs, demonstrating how replication timing on either chromosome affects the allelic ratio.

      The stochastic nature of allelic expression and replication timing observed at VERT loci indicates that each allele independently acquires its epigenetic state. Specifically, the expressed or silent status of one allele does not predict the replication timing or expression status of the opposite allele. Accordingly, the Early/Late pattern of replication timing that we detect, both in this study and in our previous work (Nat. Commun. 2022; 13:6301), is not correlated with which allele is transcriptionally active. This supports our conclusion that asynchronous replication timing is not a downstream consequence of monoallelic transcription, but rather an independent epigenetic feature of I/SCs. Regardless, we will provide the combined expression ratios for all transcripts that are located within the VERT regions in a Supplemental Table.

      In addition, our analysis of imprinted loci reveals that even at genomic regions with parent-of-origin–specific expression, replication timing does not align with allelic activity: both early- and late-replicating alleles can be transcriptionally active, depending on the gene. This observation is consistent with the complex organization of many imprinted domains, where genes on opposite alleles exhibit reciprocal expression patterns. To illustrate this point, we will include a new supplemental figure demonstrating that imprinted loci harbor genes expressed from both the earlier- and later-replicating alleles.

      Figure 3 highlights the association of related gene clusters with VERTs but the VERTs are assigned based on variable replication timing in just 1 or 2 clones. This is an interesting observation, but to make the point that "VERT regions frequently coincide with gene clusters in the human genome" there needs to be a systematic assessment of replication timing at all gene clusters across all clones, and a statistical test for significance.

      Our intent in Figure 3 was not to suggest that all gene clusters are subject to VERT and AEI, but rather to highlight that several well-characterized multigene families that are known to exhibit random AEI, such as olfactory receptor and HLA gene clusters, coincide with VERT regions at their genomic locations. These examples serve as representative illustrations demonstrating that I/SC-associated regulation occurs at established AEI loci organized in gene clusters.

      To clarify this point, we will revise the text to explicitly state that Figure 3 presents illustrative examples of known AEI-associated gene clusters overlapping with VERT regions, rather than a comprehensive or statistically exhaustive analysis of all gene clusters across the genome.

      It is an interesting hypothesis that VERTs are conserved between species at syntenic loci. If such regions are really conserved, one would expect that replication timing at these sites would be consistently asynchronous. However the data presented shows that in human clones these VERTs can be specific to an individual donor (as in 5A) or an individual clone (as in 5H).

      As discussed in our Limitations section, our analysis was restricted to a limited number of cell types, clones, and individuals, which may not capture the full diversity of I/SC usage across tissues and populations. While our dataset was sufficient to identify robust patterns of AEI and VERT, it likely represents only a subset of the broader landscape of I/SC regulation in both humans and mice. We anticipate that future studies incorporating a wider range of tissues, individuals, and clonal analyses will uncover an even greater degree of conservation and diversity in I/SC usage across genomes.

      In order to support the claim that neurodevelopmental disease associated genes reside in asynchronously replicating regions, and are thus more prone to allelic imbalance, the authors would need to demonstrate this phenomenon in neuronal cells.

      We make two points that address this critique: First, many of the neurodevelopmental disease genes located within or adjacent to VERT regions are not exclusively expressed in neuronal cells and have already been shown to exhibit AEI in non-neuronal contexts. For example, Gimelbrant and Chess (Science, 2007; 318:1136–1140) demonstrated AEI of the Parkinson disease genes SNCA and LRRK2 in lymphoblastoid cell lines (LCLs), and in our previous study, we detected AEI of DNAJC6, another Parkinson disease gene, in LCL cells (Nat. Commun. 2022; 13:6301). In the present study that used ACP cells, we identified VERT and AEI of several epilepsy-associated genes, including SCN1A, SCN2A (Fig. 6b), GABRA1 (Fig. 6e), and SAMD12 (Fig. 6j), as well as a gene implicated in autism and neurodevelopmental disorders, SEMA5A (Fig. 5c).

      Second, independent studies from the E. Heard laboratory have provided further evidence that AEI occurs in neuronal lineages. Using mouse neural progenitor cells (NPCs), they identified genes subject to AEI (Dev. Cell, 2014; 28:366–380) and they later evaluated AEI of syntenic human neurodevelopmental disease genes, including Snca, App, Eya4, and Grik2 (Nat. Commun. 2021; 12:5330). In addition, they used the phrase “Allelic Expression Imbalance” to describe the epigenetic expression biases at these genes.

      Together, these findings reinforce that AEI, and by extension I/SC regulation, is not restricted to specific cell types, but rather represents a generalizable mechanism of stochastic epigenetic regulation that includes genes relevant to neurodevelopment and disease.

      However, the authors consistently lean on thin evidence (i.e. a single clone) within a modestly sized dataset (4 clones from 2 donors each) to propose a new model for haploinsufficiency in human disease. The consistent focus on limited elements in the data and perhaps an overreach in the interpretation makes it difficult to appreciate what is in fact a very good experiment.

      We agree that our analysis was conducted on a modest number of clones and individuals, which we explicitly acknowledge as a limitation of the present study. However, several key points support the robustness and broader relevance of our conclusions:

      i. Clonal Design and Replication: The strength of our approach lies in its clonal resolution. Each clone represents a single-cell–derived population expanded to over a million cells, enabling direct detection of stable, mitotically heritable allele-specific epigenetic states that would not be apparent in population-averaged data. Importantly, many of the VERT regions we identified are shared between independent clones from different donors and across distinct cell types (ACP and LCL), demonstrating reproducibility and biological consistency.

      ii. Cross-Species Validation: We further identified syntenic VERT regions in mouse pre-B cell clones, including at loci known to exhibit AEI in prior studies, providing independent validation and evolutionary conservation of the phenomenon.

      iii. Integration with Published Evidence: Our findings extend prior observations of AEI and variable replication timing (e.g. Gimelbrant et al. Science 2007; Heskett et al. Nat. Commun. 2022) and are fully consistent with known stochastic allelic expression imbalance of autosomal genes. We also draw parallels with X-linked disease genes, for which the absence of cellular selection mechanisms underlies dominant inheritance patterns of loss-of-function alleles (reviewed in: J Clin Invest, 2008, 20-23; and Nat Rev Genet. 2025, 26, 571–580). Our proposed model linking I/SC regulation to haploinsufficiency is therefore a synthesis of our results with an extensive body of published data, not an inference drawn from isolated observations.

      iv. Scope and Framing: We will revise the manuscript to clarify that our proposed model represents a mechanistic framework, not a definitive or exclusive explanation, for how stochastic allelic regulation could contribute to dosage-sensitive disease phenotypes. We will also explicitly discuss the need for larger datasets and additional tissues to refine and test this model.

      In summary, while we recognize the limited sampling inherent to clonal analyses, the consistency of our observations across donors, cell types, and species, together with prior corroborating studies, supports the validity of the conclusions and justifies the broader conceptual implications.

      Description of analyses that authors prefer not to carry out

      Reviewer #1:

      Cell-type specificity and mitotic stability both require stronger evidence; the latter is inferred indirectly from clonal expansion rather than shown directly, and orthogonal experiments (e.g., allele-specific ChIP-seq, DNA methylation) would be required.

      We disagree with this reviewer that the mitotic stability of the epigenetic states is “inferred indirectly from clonal expansion rather than shown directly”. Our experimental design inherently captures mitotically stable, allele-specific states because each clonal line is derived from a single progenitor cell and expanded to millions of cells before analysis. The allele-specific replication timing and expression profiles observed in these clones therefore reflect epigenetic states that are stably inherited across many cell divisions, rather than transient or stochastic fluctuations. This approach was also validated in our previous study (Nat. Commun. 2022; 13:6301), where the same clonal strategy demonstrated stable allele-restricted replication and expression patterns over extended passages.

      We agree that orthogonal assays such as allele-specific ChIP-seq or DNA methylation analyses would provide additional mechanistic detail on the nature of I/SC-associated regulation. However, these experiments fall outside the scope of the present study, which was designed specifically to identify and map autosomal loci that exhibit coordinated AEI and VERT, the defining epigenetic features of I/SCs. While we fully acknowledge that defining the precise molecular marks (e.g., histone modifications, DNA methylation, chromatin accessibility) that underlie I/SC regulation will be an important future direction, our current data provide a genome-wide, allele-resolved foundation upon which such mechanistic studies can build.

      In summary, the current dataset achieves the central goal of defining the genomic distribution and conservation of I/SCs based on functional readouts of replication timing and expression. Future work will extend these findings using allele-specific epigenomic profiling to characterize the epigenetic modifications associated with I/SC stability and cell-type specificity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kolb and Hasseman et al. introduces a significantly improved GABA sensor, building on the pioneering work of the Janelia team. Given GABA's role as the main inhibitory neurotransmitter and the historical lack of effective optical tools for real-time in vivo GABA dynamics, this development is particularly impactful. The new sensor boasts an enhanced signal-to-noise ratio (SNR) and appropriate kinetics for detecting GABA dynamics in both in vitro and in vivo settings. The study is well-presented, with convincing and high-quality data, making this tool a valuable asset for future research into GABAergic signaling.

      Strengths:

      The core strength of this work lies in its significant advancement of GABA sensing technology. The authors have successfully developed a sensor with higher SNR and suitable kinetics, enabling the detection of GABA dynamics both in vitro and in vivo.

      This addresses a critical gap in neuroscience research, offering a much-needed optical tool for understanding the most important inhibitory neurotransmitter. The clear representation of the work and the convincing, high-quality data further bolster the manuscript's strengths, indicating the sensor's reliability and potential utility. We anticipate this tool will be invaluable for further investigation of GABAergic signaling.

      Weaknesses:

      Despite the notable progress, a key limitation is that the current generation of GABA sensors, including the one presented here, still exhibits inferior performance compared to state-of-the-art glutamate sensors. While this work is a substantial leap forward, it highlights that further improvements in GABA sensors would still be highly beneficial for the field to match the capabilities seen with glutamate sensors.

      We thank Reviewer 1 for the positive assessment. We agree that further improvements in GABA sensor performance remain desirable. We acknowledge this limitation and outline directions for future development in the Discussion paragraph beginning "There are several promising avenues that could be taken to further optimize iGABASnFR."

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents the development and characterization of iGABASnFR2, a genetically encoded GABA sensor with markedly improved performance over its predecessor, iGABASnFR1. The study is comprehensive and methodologically rigorous, integrating high-throughput mutagenesis, functional screening, structural analysis, biophysical characterization, and in vivo validation. iGABASnFR2 represents a significant advancement in GABA sensor engineering and application in imaging GABA transmission in slice and in vivo. This is a timely and technically strong contribution to the molecular toolkit for neuroscience.

      Strengths:

      The authors apply a well-established sensor optimization pipeline and iterative engineering strategy from single-site to combinatorial mutants to engineer iGABASnFR2. The development of both positive and negative going variants (iGABASnFR2 and iGABASnFR2n) offers experimental flexibility. The structure and interpretation of the key mutations provide insights into the working mechanism of the sensor, which also suggest optimization strategies. Although individual improvements in intrinsic properties are incremental, their combined effect yields clear functional gains, enabling detection of direction-selective GABA release in the retina and volume-transmitted GABA signaling in somatosensory cortex, which were challenging or missed using iGABASnFR1.

      Weaknesses:

      With minor revisions and clarifications, especially regarding membrane trafficking, this manuscript will be a valuable resource for probing inhibitory transmission.

      We thank Reviewer 2 for the positive assessment. Regarding membrane trafficking, we appreciate the suggestion to test different trafficking motifs. While such optimization represents a valuable direction for future development, it was beyond the scope of the present study and not feasible with the available time and resources. A different imaging modality would be needed to assess membrane trafficking efficiency or membrane-restricted expression, as the images presented in the manuscript (Figure 2a) are wide-field epifluorescence images, which lack the axial resolution required to distinguish membrane-localized signal from cytosolic fluorescence.

      We expect that the current characterization of iGABASnFR2 will nevertheless provide a strong foundation for future efforts to optimize membrane targeting and expression using alternative trafficking strategies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) We noted an interesting inconsistency in the response of iGABASnFR1 and iGABASnFR2 when expressed as purified protein versus in mammalian cells. Such discrepancies are not uncommon for proteins exhibiting different behaviors in E. coli versus mammalian expression systems. We appreciate the authors' diligent effort in performing screening within a neuronal context. Similarly, the stark difference between the absolute affinity in purified form (∼0.778 μM) and on-cell measurements (6.4 μM) warrants further discussion. The authors may consider commenting on these observations in the discussion section.

      We have revised the Discussion (lines 401-410 in the ‘Tracked Changes’ document) to address the discrepancy between measurements obtained with purified protein and those from expression on the neuronal surface. As noted by the reviewer, such discrepancies are common, and our revision is intended to convey our empirical experience with this phenomenon rather than to offer a definitive mechanistic explanation.

      One factor to appreciate is that, when on the surface of neurons, the sensor is tethered to the membrane by an additional 60 amino acids. In addition to altering the local chemical environment, membrane tethering could impose entropic or mechanical constraints on the sensor. These constraints may damp conformational motions that underlie ligand binding and fluorescence changes. Beyond this, the local environment experienced by a membrane-anchored sensor differs substantially from that of soluble protein. There are potential electrostatic and steric effects arising from the plasma membrane and extracellular matrix, as well as post-translational modifications associated with mammalian expression. These effects on sensor performance are not readily predictable in either magnitude or direction, as illustrated by iGluSnFR, which exhibits a higher apparent affinity when membrane-tethered than in soluble form (Aggarwal et al., 2023). For these reasons, we place greater emphasis on neuronal measurements as the most informative indicator of in vivo sensor performance.

      (2) Although iGABASnFR2 fluorescence exhibits pH dependence, its response appears less pH-dependent compared to the first-generation sensor. To enhance clarity, we suggest plotting the normalized response of both sensors across different pH values. This visual representation would be highly informative for readers.

      Thank you - we have implemented this, now showing the (F_sat - F_apo)/F_apo response as a function of pH for all three sensors in Fig 4 fig. supp 3b. This visualization nicely illustrates that the apo-to-sat response of iGABASnFR1 is much more influenced by pH than either iGABASnFR2 or iGABASnFR2n, which we note on lines 252-253 of the ‘Tracked Changes’ document.

      (3) To provide a more comprehensive characterization of the sensors, we recommend including a quantification of the decay times for all three versions of the sensors in Figure 2, specifically after panel 2c.

      Thank you - we now provide this in Fig 2d.

      (4) For improved readability of Figure 3a, we suggest adding distinct labels for iGABASnFR1 and iGABASnFR2 with corresponding colors.

      Good suggestion - we matched the color of the backbones to the rest of the manuscript (orange and green). We also added labels on the figure to ensure clarity.

      (5) The GABA released by SAC cells in Figure 5 looks amazing! We propose a minor modification to the cartoon in Figure 5b: mirroring the image horizontally (left to right). Given that the subsequent panels (e, h, and k) set the preferred direction of SAC movement as rightward, the current cartoon in Figure 5b inadvertently suggests stronger inhibition by SAC-released GABA when the spot moves left. Mirroring the image would align the cartoon more accurately with the subsequent data representations.

      Thanks - this is a nice streamlining. We have implemented the change.

      Reviewer #2 (Recommendations for the authors):

      (1) As sensor performance differs substantially between purified protein and neurons, a summary table comparing key properties (e.g., EC50, ∆F/F<sub>max</sub>, response amplitude to # of AP) across purified protein and neurons would be highly informative.

      We discuss differences in sensor performance between purified protein and neurons in the Discussion (lines 401-410 in the ‘Tracked Changes’ document) and, for the reasons outlined there, consider neuronal measurements to be far more predictive of in vivo performance. We therefore chose not to include a summary table directly comparing purified protein and neuronal data, as this would risk over-emphasizing in vitro measurements, which we view primarily as qualitative signposts rather than direct indicators of functional performance.

      (2) The authors should comment on the observed differences in performance between purified protein and neuronal expression. Would HEK293 cell measurements serve as a better predictor of in vivo performance than in vitro titrations? Insights here would benefit future sensor development pipelines.

      We have revised the Discussion to address this point (lines 401-410 in the ‘Tracked Changes’ document). We often observe differences in sensor performance between purified protein measurements and cellular or in vivo contexts. In our experience, titrations in primary neurons provide a better predictor of in vivo performance than in vitro protein titrations, as they more closely reflect relevant cellular factors. We do not have direct evidence that expression in heterologous systems such as HEK293 cells is generally more predictive, although this seems plausible; however, predictions inevitably become less reliable as sensors are translated to fully in vivo conditions.

      (3) Improved membrane localization likely contributes to the enhanced sensitivity of iGABASnFR2 in neurons beyond changes in EC50. In Figure 2a, membrane trafficking appears suboptimal. The authors should explore alternative trafficking motifs (e.g., ER2, Kv2.1, or motifs from other sensors) to further improve the membrane expression and consider adding a second fluorescent protein for quantifying membrane-localized brightness.

      Figure 2a presents wide-field epifluorescence images, which lack the axial resolution required to distinguish membrane-localized signal from cytosolic fluorescence. We therefore do not consider this imaging modality suitable for assessing membrane trafficking efficiency or membrane-restricted expression.

      We appreciate the suggestion to test different trafficking motifs to attempt to better capture biological signals. While such optimization represents a valuable direction for future development, it was beyond the scope of the present study and not feasible with the available time and resources. We expect that the current characterization of iGABASnFR2 will nevertheless provide a strong foundation for future efforts to further optimize membrane targeting and expression using alternative trafficking strategies.

      (4) Figure 4 - Supplement 2: The apparent EC50 of iGABASnFR2 seems affected by buffer composition and the presence of high concentrations of unrelated compounds. The authors should comment on this.

      We thank the reviewer for raising this point. Upon closer inspection, the EC50 of iGABASnFR2 in Fig 4 Supp 2 is measured at 1.4 μM, while in Fig 4a it is 1.1 μM - these mean values are quite close to one another, and within the range of experimental variability we expect for experiments done weeks or months apart. What differs most noticeably in this dataset is the shape of the dose–response curve rather than the EC50 itself; the origin of this difference is currently unclear. We have revised the Results text (lines 226-231 in the ‘Tracked Changes’ document) to clarify this point and to emphasize that the key observation of Fig. 4–figure supplement 2 is that none of the additional compounds tested substantially impair GABA binding, indicating that they do not act as strong non-competitive allosteric antagonists or inhibitors.

      (5) The negative-going variant, iGABASnFR2n, is introduced but only briefly characterized. Including additional data or even a conceptual use case would clarify its potential utility.

      We have modified the discussion to provide more examples of conceptual use cases, clarifying how such a sensor could indeed be highly impactful. The full passage is lines 372-387 in the ‘Tracked Changes’ document; to summarize: a key application of the negative-going sensor is detecting decreases in ‘GABA tone’, which plays a key role in setting the excitation-inhibition balance across brain circuits. Reductions in extrasynaptic GABA are a well-documented feature of several biologically important brain-state transitions, including arousal, experience-dependent plasticity, and stress-related modulation of inhibition, and iGABASnFR2n could be an important tool for investigating these processes.

    Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      BK channels are widely distributed and involved in many physiological functions. They have also proven a highly useful tool for studying general allosteric mechanisms for gating and modulation by auxiliary subunits. Tetrameric BK channels are assembled from four separate alpha subunits, which would be identical for homozygous alleles and potentially of five different combinations for heterozygous alleles (Geng et al., 2023, https://doi.org/10.1085/jgp.202213302). Concatenated subunits, used elsewhere to strictly control heteromeric subunit composition, had not previously been applied to BK channels because the N-terminus in BK channels is extracellular, whereas the C-terminus is intracellular. In this new work, Chen, Li, and Yan devise clever methods to construct and assemble BK channels of known subunit composition, as well as to fix the number of γ1 auxiliary subunits per channel. With their novel molecular approaches, Chen, Li, and Yan report that a single γ1 auxiliary subunit is sufficient to fully modulate a BK channel, that the deep conducting pore mutation L312A exhibited a graded effect on gating, with each additional mutant subunit replacing a WT subunit producing a further incremental left shift in activation, and that the V288A mutation at the selectivity filter must be present on all four alpha subunits in order to induce channel inactivation. Chen, Li, and Yan have been successful in introducing new molecular tools to generate BK channels of known stoichiometry and subunit composition. They validate their methods and provide three examples of their use with useful observations.

      Strengths:

      Powerful new molecular tools for the study of channel gating have been developed and validated in the study.

      Weaknesses:

      (1) One example each of auxiliary, deep pore, and selectivity filter allosteric actions is presented, but this is sufficient for the purposes of the paper to establish their methods and present specific examples of applicability.

      We sincerely thank Reviewer #1 for the thoughtful and supportive evaluation of our work. We greatly appreciate the reviewer’s clear summary of the study and the recognition of the novelty and utility of our molecular concatemer strategy for controlling BK channel subunit composition and stoichiometry.

      We also appreciate the reviewer’s positive assessment that the three examples (auxiliary subunit modulation, deep pore mutation, and selectivity filter mutation) are sufficient to establish the method and demonstrate its applicability. We are encouraged that the reviewer found the new molecular tools to be powerful and well validated.

      We have no further changes to make in response to this review, but we are grateful for the reviewer’s constructive and encouraging comments.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes novel BK channel concatemers as a tool to study the stoichiometry of the gamma subunit and of mutations in the modulation of the channel. Taking advantage of the modular design of the BK channel alpha subunit, the authors connected S1-S6/1st RCK as two- and four-subunit concatemers and coexpressed them with S0-RCK2 to form normally functioning channels. These concatemers avoided the difficulty that the extracellular N-terminus of S0 was unable to connect with the cytosolic C-terminus of the gamma subunit, allowing a single gamma subunit to be connected to the concatemers. The concatemers also helped reveal the required stoichiometry of mutant BK subunits in modulating channel function. These include L312A in the deep pore region, which altered channel function additively with each additional subunit harboring the mutation, and V288A at the selectivity filter, which altered channel function cooperatively only when all four subunits were mutated. These results demonstrate that the concatemers are robust and effective in studying BK channel function and molecular mechanisms related to stoichiometry. The differing stoichiometric requirements of the gamma subunit and of the mutations for altering channel function are interesting and may relate to the fundamental mechanism by which different motifs of the channel protein control function.

      Strengths:

      The manuscript presents well-designed experiments with high-quality data, which convincingly demonstrate the BK channel concatemers and their utility. The results are clearly presented.

      Weaknesses:

      This reviewer did not identify any major concerns with the manuscript.

      We sincerely thank Reviewer #2 for the careful reading of our manuscript and for the highly positive and supportive comments. We appreciate the reviewer’s detailed summary of our concatemer design strategy and its use in studying gamma subunit stoichiometry and mutation-dependent modulation of BK channel function.

      We are especially grateful for the reviewer’s recognition that the experiments are well designed, the data are of high quality, and the results demonstrate the robustness and utility of the concatemer approach. We also appreciate the reviewer’s thoughtful note on the mechanistic implications of the distinct stoichiometric requirements observed for the gamma subunit, L312A, and V288A.

      We are pleased that the reviewer identified no major concerns. We have no further changes to make in response to this review, and we thank the reviewer again for the positive evaluation.

      Recommendations for the authors:

      Reviewing Editor Comments:

      While the study presents a great methodological advancement, the phenomenological examples described could perhaps benefit from a little more mechanistic description/discussion. In particular, the functional effect of the V288A mutant is very novel. It could be useful to discuss whether this mutant impacts channel selectivity/conductance. It could be beneficial to also contrast the subunit dependence of V288A with that of the W434F mutant of the Shaker channel. In the latter, C-type inactivation gating is accelerated even when the mutant is present in a single subunit, which contrasts with the effect in V288A.

      We greatly appreciate the editor’s and reviewers’ thorough and constructive evaluation, and we have revised the manuscript accordingly.

      We added discussion, with citations, of the potential effect of V288A on selectivity (lines 348-349). We also added the reported stoichiometric effects of mutations in Shaker and hERG1 channels on C-type inactivation to the Discussion (lines 336-351). From these studies and our findings with V288A in BK channels, it is interesting to note that the stoichiometric effects of these mutations vary, and that those located near or within the selectivity filter signature sequence exhibited an all-or-none effect in both hERG1 and BK channels.

      The authors might also want to consider performing and showing immunoblots with the alpha_deltaM fragment co-expressed with the other channel fragments. Together with the GFP tag, this alpha_deltaM would perhaps be a ~90 kDa protein. It should be captured by anti-V5 IP and resolved on an SDS-PAGE gel (at least with the quad construct).

      We added supplemental data (Fig. 1–figure supplement 1) showing co-expression and co-IP of the α<sup>ΔM</sup>-GFP construct and a FLAG-tagged α<sub>M</sub> construct. The α<sup>ΔM</sup>-GFP migrated at the expected size on SDS-PAGE. Of note, the single-unit α<sub>M</sub> construct tended to oligomerize even under denaturing conditions on SDS-PAGE.

      For Figure 4, providing details about the inter-pulse intervals and interpulse holding voltage would be helpful. I was not able to find this information in the methods or text.

      The inter-pulse intervals and holding voltage are now added to the Fig. 4 legend (line 638).

      Reviewer #1 (Recommendations for the authors):

      (1) Submitted papers should have page numbers to facilitate reviewing.

      Both page and line numbers are added.

      (2) The designation of the various channel types, such as BKα and BKαM should be identical in the text and figures, so either drop BK in the text or add BK in the figures. Maybe drop BK in the text, as it is known that BK channels are the topic of this study.

      We appreciate the suggestion to be consistent in text and figures. We have dropped “BK” for “BKα<sub>M</sub>” throughout the text.

      (3) "Single Boltzmann fits of G-V curves" would be consistent with a homogenous channel population but do not necessarily suggest a single homogenous channel population of BK channels, as was shown by Geng et al. (2023) (https://doi.org/10.1085/jgp.202213302) where the G-V curve for simultaneous expression of five BK channel types with different V1/2s for each channel type was well approximated by a single Boltzmann function. The dogma that a single Boltzmann fit suggests one channel type needs to be reset. So wave a red flag here: whereas a single Boltzmann fit is consistent with a single channel type, it does not establish a single channel type nor even suggest a single channel type.

      We fully agree that a good single Boltzmann fit does not imply a homogeneous channel population. We have changed “suggesting” to “consistent with” (line 203) and “reflecting” to “agreeing with” (line 205).
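      The reviewer's point can be checked numerically. The sketch below assumes a hypothetical heterozygous population in which channels with 0 to 4 mutant subunits assemble randomly (binomial 1:4:6:4:1 weights), each mutant subunit shifts V1/2 by a fixed 20 mV, and all channel types share a slope factor of 12 mV. These numbers, the function names, and the brute-force grid-search fit are illustrative assumptions, not values or methods from Geng et al. (2023) or from this study.

```python
import math


def boltzmann(v, v_half, k):
    """Normalized conductance G/Gmax for a single Boltzmann with slope factor k (mV)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / k))


def mixture_gv(v, v_halves, weights, k):
    """G-V of a mixed population: weighted sum of Boltzmanns with different V1/2."""
    return sum(w * boltzmann(v, vh, k) for vh, w in zip(v_halves, weights))


def fit_single_boltzmann(voltages, g):
    """Least-squares fit of ONE Boltzmann by brute-force grid search over V1/2 and k."""
    best = (float("inf"), None, None)
    for i in range(121):                  # V1/2 from 30 to 90 mV in 0.5 mV steps
        vh = 30.0 + 0.5 * i
        for j in range(141):              # k from 5 to 40 mV in 0.25 mV steps
            k = 5.0 + 0.25 * j
            sse = sum((gi - boltzmann(v, vh, k)) ** 2 for v, gi in zip(voltages, g))
            if sse < best[0]:
                best = (sse, vh, k)
    return best


# Hypothetical population: five channel types (0-4 mutant subunits).
v_halves = [20.0, 40.0, 60.0, 80.0, 100.0]
weights = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
voltages = [-60.0 + 10.0 * n for n in range(25)]   # -60 to +180 mV, symmetric about 60 mV
g_mix = [mixture_gv(v, v_halves, weights, 12.0) for v in voltages]

sse, vh_fit, k_fit = fit_single_boltzmann(voltages, g_mix)
max_resid = max(abs(gi - boltzmann(v, vh_fit, k_fit)) for v, gi in zip(voltages, g_mix))
print(f"single fit: V1/2 = {vh_fit:.1f} mV, k = {k_fit:.2f} mV, max residual = {max_resid:.4f}")
```

      Under these assumptions the single Boltzmann should track the five-component mixture closely, with a larger fitted k absorbing the spread of V1/2 values, which illustrates why a good single-Boltzmann fit is consistent with, but does not establish, a single channel type.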

      (4) Geng et al. (2023) demonstrated that the pore mutation G375R in BK channels gave a left shift in activation linearly related to the number of WT subunits replaced with mutant subunits. This incremental shift in activation for G375R should be mentioned, as it is consistent with the incremental effects of the L312A deep pore mutation on activation reported by the authors in their Figure 3D.

      We appreciate the pointing-out of this highly relevant publication. We have now included this reference and discussed together with L312A mutation (lines 309-313).

      (5) I went back and looked at the Lingle laboratory papers on the gamma subunit. An additional sentence or two on what the Lingle lab found and didn't find would be useful here for readers.

      In the Introduction, we have listed the Lingle lab’s findings and the limitations of their experimental methods that warrant the development of the concatenated-construct method proposed in this study (lines 84-88). We prefer not to discuss this further in the Discussion, as it would be redundant.

      (6) For the two examined mutations L312A and V288A, include in the Methods a 21 amino acid sequence for each mutation with the amino acid to be mutated (L or V) in the center, with beginning and end numbering at the beginning and end of each list. This will allow the reader/experimenter to readily locate the mutated residue on their BK amino acid sequences, which may have different numbering than U11058. Interestingly, for the so-called canonical sequence Q12791 · KCMA1_HUMAN that I found in UniProt starting with U11058, there is an L312, but I found no V288, but an F288. Am I doing this correctly? Do I have the correct sequence/isoform? The only sure way to identify an AA is with an extensive pre and post-sequence so that the chance of misidentification approaches zero.

      We verified that the listed GenBank IDs, U11058 for cDNA and AAB65837 for protein, point to the correct sequences. In the Results, we have now included the peptide sequences of the selectivity filter signature motif and of the part of the S6 TM segment where V288 and L312 are located, respectively (lines 179 and 220).

      Reviewer #2 (Recommendations for the authors):

      The different stoichiometry of the gamma subunit and the mutations in regulating channel function raise important questions. For instance, what are the structural and energetic bases for their different stoichiometric requirements? Does the structure motif, such as the selectivity filter or deep pore, act as a unit? Or does a specific residue, such as V288 or L312, act individually to determine the different stoichiometric requirements? What molecular interactions are involved for these residues and subunit to influence the cooperativity among the four alpha subunits in channel function? Some of these questions are discussed in the manuscript, but it may help the readers to clarify what aspects of the mechanistic bases for the findings in this manuscript are known and what aspects remain to be studied.

      We agree that these are all important questions. We have now cited more previous studies on C-inactivation in other K<sup>+</sup> channels and on deep pore mutations in BK channels in terms of subunit stoichiometry (lines 336-351). The results appear to be consistent, suggesting shared properties among residues within the selectivity filter motif or among residues in deep pore region.

      Some minor comments are as follows.

      (1) Page 7, 2nd paragraph: "Page 2B" change to "Page 3B"? Also, "delay in deactivation" is not precise. The term "Delay" in channel kinetics has a specific meaning, and the use of this word here causes some confusion. The authors may want to delete "substantial delay in deactivation evident as a”.

      Corrected by changing Fig. 2B to Fig. 3B and deleting “a substantial delay in deactivation evident as” (line 191).

      (2) Page 9, 1st paragraph: "used in the voltage protocol used". Drop one of the instances of "used".

      Corrected by deleting the first “used” (line 246).

      (3) Page 12, 1st paragraph: "Nonetheless, the tight inter-subunit cooperativity observed at the selectivity filter makes it a plausible candidate for serving as the activation gate, a property not yet demonstrated for the lower S6 segment." This seems to be an interesting idea. However, it is not clearly explained. The authors may want to clarify how the cooperativity is related to the activation gate.

      We have now added a sentence with citations to discuss the requirement of intersubunit cooperativity for an activation gate to function (lines 354-357).

      Other major changes: We updated the immunoblot figures (Figs. 1C and 2C) for better presentation.

    Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The manuscript by Ma et al. provides robust and novel evidence that the noctuid moth Spodoptera frugiperda (Fall Armyworm) possesses a complex compass mechanism for seasonal migration that integrates visual horizon cues with Earth's magnetic field (likely its horizontal component). This is an important and timely study: apart from the Bogong moth, no other nocturnal Lepidoptera has yet been shown to rely on such a dual-compass system. The research therefore expands our understanding of magnetic orientation in insects with both theoretical (evolution and sensory biology) and applied (agricultural pest management, a new model of magnetoreception) significance.

      The study uses state-of-the-art methods and presents convincing behavioural evidence for a multimodal compass. It also establishes the Fall Armyworm as a tractable new insect model for exploring the sensory mechanisms of magnetoreception, given the experimental challenges of working with migratory birds. Overall, the experiments are well-designed, the analyses are appropriate, and the conclusions are generally well supported by the data.

      Strengths

      (1) Novelty and significance: First strong demonstration of a magnetic-visual compass in a globally relevant migratory moth species, extending previous findings from the Bogong moth and opening new research avenues in comparative magnetoreception.

      (2) Methodological robustness: Use of validated and sophisticated behavioural paradigms and magnetic manipulations consistent with best practices in the field. The use of 5-minute bins to study the dynamic nature of the magnetic compass which is anchored to a visual cue but updated with a latency of several minutes, is an important finding and a new methodological aspect in insect orientation studies.

      (3) Clarity of experimental logic: The cue-conflict and visual cue manipulations are conceptually sound and capable of addressing clear mechanistic questions.

      (4) Ecological and applied relevance: Results have implications for understanding migration in an invasive agricultural pest with an expanding global range.

      (5) Potential model system: Provides a new, experimentally accessible species for dissecting the sensory and neural bases of magnetic orientation.

      Weaknesses

      While the study is strong overall, several recommendations should be addressed to improve clarity, contextualisation, and reproducibility:

      We thank Reviewer #1 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths and are grateful for the constructive feedback on the remaining weaknesses, which will guide and strengthen our revisions.

      Structure and presentation of results

      Requires reordering the visual-cue experiments to move from simpler (no cues) to more complex (cue-conflict) conditions, improving narrative logic and accessibility for non-specialists.

      Thank you for this thoughtful suggestion. While we appreciate the rationale for presenting results from simpler to more complex conditions, we kept the original sequence because it aligns with the logic of our study. Our initial aim was to determine whether fall armyworms use a magnetic compass integrated with visual cues, as shown in the Bogong moth. After establishing this phenotype, we then examined whether visual cues are required for maintaining magnetic orientation. We have also clarified in the Introduction that magnetic orientation in the Bogong moth relies on integration with visual cues, which provides readers with clearer context and improves the overall narrative flow.

      Ecological interpretation

      (a) The authors should discuss how their highly simplified, static cue setup translates to natural migratory conditions where landmarks are dynamic, transient or absent.

      Thank you for raising this important point. We agree that natural migratory environments provide visual information that is often dynamic, transient, or intermittently absent, in contrast to the simplified and static cue used in our indoor experiments. Our intention in using a minimal, static cue was to isolate and test the fundamental presence of magnetic–visual integration in fall armyworms under fully controlled conditions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although these natural cues differ from our simplified laboratory stimulus, they may similarly provide asymmetric visual structure that can be integrated with magnetic information. We also note that determining which natural visual cues support the magnetic–visual compass will be an important direction for future work.

      (b) Further consideration is required regarding how the compass might function when landmarks shift position, are obscured, or are replaced by celestial cues. Also, more consolidated (one section) and concrete suggestions for future experiments are needed, with transient, multiple, or more naturalistic visual cues to address this.

      Thank you for this constructive suggestion. We appreciate the reviewer’s point that additional consideration of how the compass might function under shifting, obscured, or celestial visual cues would strengthen the manuscript. Given the limited evidence currently available for this species, we have incorporated a concise and appropriately cautious discussion addressing these possibilities.

      Methodological details and reproducibility

      (a) It would be better to move critical information (e.g., electromagnetic noise measurements) from the supplementary material into the main Methods.

      Thank you for this helpful suggestion. In the revised manuscript, we have added the key electromagnetic noise measurements information to the main Methods section.

      (b) Specifying luminance levels and spectral composition at the moth's eye is required for all visual treatments.

      Thank you for this helpful comment. We have clarified in the Methods as well as the legend of Fig. S3 that both luminance levels and spectral composition were measured at the position corresponding to the moth’s head.

      (c) Details are needed on the sex ratio/reproductive status of tested moths, and a map of the experimental site and migratory routes (spring vs. fall) should be included.

      Thanks. We have added the reproductive status of the tested moths in the Methods, specifying that all individuals used were unmated 2-day-old adults.

      (d) Expanding on activity-level analyses is required, replacing "fatigue" with "reduced flight activity," and clarifying if such analyses were performed.

      Thank you for this comment. In this context, the term “fatigue” referred to the possibility that moths might gradually lose motivation or attention to orient when flying for an extended period in a simplified, artificial environment with limited sensory cues. Such a decrease in orientation motivation over time could, in theory, lead to a loss of individual orientation and consequently to the observed loss of group orientation. To test this possibility, we analyzed the orientation performance of each individual moth across different phases using the Rayleigh test. The r-value was used as a measure of individual directedness (higher r-values indicate stronger orientation). Our results showed that mean r-values did not differ significantly among the experimental phases (multiple comparisons, Table S2). This indicates that the 25-min measurement period itself was not responsible for the loss of orientation. We did not perform a quantitative activity-level analysis in this study. However, as mentioned in the Methods, flight activity was continuously monitored during the experiments by observing fluctuations in the pointer values on the experimental software, which corresponded to the moth’s rotational movements. If the pointer values remained unchanged for more than 10 seconds, the experimenter checked for wing vibrations by sound; if the moth had stopped flying, gentle tapping on the arena wall was used to stimulate renewed flight. Only individuals that maintained active flight throughout the experiment, with fewer than four instances of wingbeat cessation, were included in the analysis. In the revised manuscript, we also state that an activity-level analysis was not performed because of technical difficulties.
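      For readers unfamiliar with the directedness measure described above, here is a minimal sketch of a per-individual Rayleigh test, assuming each moth's headings within one phase are available as angles in degrees. The function name and the sample headings are hypothetical, and the p-value uses Zar's closed-form approximation, which may differ from the software the authors actually used.

```python
import math


def rayleigh_test(angles_deg):
    """Rayleigh test for unimodal circular orientation.

    Returns (mean_direction_deg, r, p). r is the mean resultant length
    (0 = no preferred direction, 1 = all headings identical); p uses
    Zar's closed-form approximation to the Rayleigh p-value.
    """
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one heading")
    radians = [math.radians(a) for a in angles_deg]
    c = sum(math.cos(a) for a in radians)
    s = sum(math.sin(a) for a in radians)
    resultant = math.hypot(c, s)          # length R of the summed unit vectors
    r = resultant / n                     # mean resultant length (directedness)
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - resultant * resultant)) - (1 + 2 * n))
    return mean_dir, r, min(p, 1.0)


# Hypothetical headings (degrees) for one moth in two 5-min phases.
phases = {
    "phase_1": [178, 185, 170, 192, 181, 176],
    "phase_2": [160, 210, 175, 200, 150, 190],
}
for name, headings in phases.items():
    mean_dir, r, p = rayleigh_test(headings)
    print(f"{name}: mean = {mean_dir:.0f} deg, r = {r:.2f}, p = {p:.3f}")
```

      Per-individual r-values computed this way for each phase could then be compared across phases, as in the multiple comparisons the authors report in Table S2.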

      Figures and data presentation

      (a) The font sizes on circular plots should be increased; compass labels (magnetic North), sample sizes, and p-values should be included.

      Thank you for this helpful suggestion. Regarding the compass labels and statistical reporting, our analysis provides significance levels as ranges rather than exact p-values; therefore, we clarified in the figure legends that the two dashed circles correspond to thresholds for statistical significance p = 0.05 and p = 0.01, respectively. Sample sizes are already indicated within each panel. To avoid visual clutter caused by displaying both magnetic North and South, we show only the magnetic South direction (mS) consistently across panels, which can improve readability.

      (b) More clarity is required on what "no visual cue" conditions entail, and schematics or photos should be provided.

      Thank you for this comment. In our study, the “no visual cue” condition refers to the absence of the black triangular landmark inside the flight simulator. To improve clarity, we have updated the legend of Fig. 4 to explicitly state this and have referred readers to the schematic in Fig. 1, which illustrates the structure of the flight simulator. These additions clarify what the “no visual cue” condition entails without requiring additional schematics.

      (c) The figure legends should be adjusted for readability and consistency (e.g., replace "magnetic South" with magnetic North, and for box plots better to use asterisks for significance, report confidence intervals).

      Thank you. Regarding the choice of compass labeling, we intentionally used magnetic South (mS) rather than magnetic North (mN) because the main population tested in our experiments represents the autumn migratory generation. During autumn, fall armyworms orient southward when visual and magnetic cues are aligned. Using magnetic South in the plots therefore provides a clearer representation of cue alignment in this season and avoids potential confusion when interpreting the combined visual–magnetic information.

      Conceptual framing and discussion

      (a) Generalisations across species should be toned down, given the small number of systems tested by overlapping author groups.

      Thank you for this valuable comment. In the revised manuscript, we have softened such statements in both the Abstract and the main text.

      (b) It requires highlighting that, unlike some vertebrates, moths require both magnetic and visual cues for orientation.

      Thank you for this helpful suggestion. We have added a sentence to the Discussion explicitly highlighting that, unlike some vertebrates capable of using magnetic information in the absence of visual cues, moths require the integration of both magnetic and visual cues for accurate orientation. This clarification emphasizes the distinct multimodal nature of compass use in migratory moths.

      (c) It should be emphasised that this study addresses direction finding rather than full navigation.

      Thank you for this important clarification. We have now made it explicit in the manuscript that our experiments address direction finding (i.e., orientation) rather than full navigation. This distinction is stated in both the Introduction and Discussion to clearly define the scope of the study.

      (d) Future Directions should be integrated and consolidated into one coherent subsection proposing realistic next steps (e.g., more complex visual environments, temporal adaptation to cue-field relationships).

      Thank you for this constructive suggestion. We agree that outlining realistic next steps is valuable. However, given the limited scope of the current data, we have only slightly expanded the existing forward-looking statements in the Discussion.

      (e) The limitations should be better discussed, due to the artificiality of the visual cue earlier in the Discussion.

      Thank you for this comment. We agree that the artificiality of the visual cue is an important limitation of the present study. Rather than extending speculative discussion, we have clarified this limitation in the revised Discussion and highlighted the key questions that future work must address.

      Technical and open-science points

      Appropriate circular statistics should be used instead of t-tests for angular data shown in the supplementary material.

      Thank you for this comment. We have addressed this point (Fig. S1) in the revised supplementary material.

      Details should be provided on light intensities, power supplies, and improvements to the apparatus.

      Thank you. Light intensities are reported as spectral irradiance measurements in the Supplementary Materials, which provide full wavelength-resolved information for the illumination used, although a separate measurement of total illuminance (lux) was not performed. We have also added the requested information on the power supplies.

      The derivation of individual r-values should be clarified.

      Thanks. We have clarified this in the revised manuscript.

      Share R code openly (e.g., GitHub).

      Thanks. We are in the process of organizing the relevant R code but were not able to upload it to GitHub before the current revision deadline. The code is available from the corresponding author upon request.

      Some highly relevant, yet missing, recent citations should be added, and some less relevant ones removed.

      Thanks. We added one recent relevant reference to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work provides experimental evidence on how geomagnetic and visual cues are integrated and shows that visual cues are indispensable for magnetic orientation in the nocturnal fall armyworm.

      Strengths:

      Although it has been demonstrated previously that the Australian Bogong moth can integrate global stellar cues with the geomagnetic field for long-distance navigation, the study presented in this manuscript is still fundamentally important to the field of magnetoreception and sensory biology. It clearly shows that the integration of geomagnetic and visual cues may represent a conserved navigational mechanism broadly employed across migratory insects. I find the research very important, and the results are presented very well.

      We thank Reviewer #2 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths.

      Weaknesses:

      The authors developed an indoor experimental system to study the influence of magnetic fields and visual cues on insect orientation, which is certainly a valuable approach for this field. However, the ecological relevance of the visual cue may be limited or unclear in the current version. The visual cues were provided "by a black isosceles triangle (10 cm high, 10 cm base) made from black wallpaper and fixed to the horizon at the bottom of the arena". It is difficult to conceive how such a stimulus (intended to represent a landmark like a mountain) could provide directional information for LONG-DISTANCE navigation in nocturnal fall armyworms, particularly given that these insects would have no prior memory of this specific landmark. A more detailed explanation of this point would be helpful.

      We appreciate the constructive feedback on the weaknesses, which will guide and strengthen our revisions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although such natural cues differ from our simplified laboratory stimulus, they may represent intermittently sampled visual inputs that can be optimally integrated with magnetic information, whether the cues are static or changing, and brief periods without them may still allow the subsequent recovery of a stable long-distance orientation strategy.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major to Medium Suggestions

      (a) Reordering of Visual Cue Tests

      The manuscript currently presents cue-conflict experiments before the simpler "no visual cue" tests. For non-specialist readers, it would be more logical to start with the basic condition (no visual cues) and then move to progressively more complex ones. This provides a clearer and more logically sound narrative.

      For example, the results could first demonstrate that without visual cues, the moths fail to orient (both in darkness and uniform light), and then show that introducing a single salient cue (a triangle on the horizon) restores directed behaviour. This would help readers understand the logic of the progression and should be better integrated throughout the Results and Discussion.

      Thanks. We have responded to this comment in the Public Reviews.

      (b) Translating Key Findings to Realistic Scenarios (LL 333-344 or where suitable in Discussion, and mentioning that we utilised a reductionist principle first in Intro, but clearly articulated that it is very simplified)

      The main text (eg Discussion) should address how these findings translate to real-world conditions. The experimental design used a single, highly salient, and static cue, always aligned with the migratory direction. In nature, such a consistent landmark is unlikely-mountains or other features would shift position relative to the moth's trajectory as it flies.

      Key questions arise which need to be addressed:

      - How would the compass system adapt to changing landmark positions as the moth moves?

      - What happens when no landmarks are visible (e.g. over flat plains or cloudy nights)?

      - Would stellar or other cues take over in such cases? Your hypotheses, please.

      Addressing these points, and proposing specific future experiments (e.g. with transient or multiple visual cues), would strengthen the ecological relevance of the findings and show a clear way forward.

      Thanks for your kind comments. We now explicitly state in the Introduction that our study employs a reductionist approach using a simplified visual environment to isolate magnetic-visual interactions. As the ecological questions raised by the reviewer cannot be addressed with the current dataset, we avoid extended speculation but have added brief clarification in the Discussion and addressed these points in the Public Reviews response. We also indicate that future work will need to examine the types of visual cues that can support magnetic orientation and how such cues couple with geomagnetic information.

      Technical and Methodological Points

      (a) Incomplete Methods Section

      Critical technical information (e.g. electromagnetic noise measurements) currently appears only in supplementary figure legends. All such details should be included in the main Methods section if the word count allows (or include a short section in the main text with reference to more details in the supplementary material).

      Thanks for your kind comments. We have addressed this as suggested in the Public Reviews.

      (b) Lighting Conditions

      Specify luminance levels (the amount of light emitted and passing through in quanta per unit of surface, eg m2) at the moth's eye and indicate whether spectral composition was consistent between treatments (with and without the visual cue).

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (c) Figures

      - Increase font sizes on circular histograms.

      - Add compass labels (ideally magnetic North, mN, not south, etc, as it is usual in pertinent literature), sample sizes, and p-values on each panel.

      - Replace "magnetic South" (mS) indicators with magnetic North (mN) to align with convention.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (d) Migratory Expectations

      Include expected compass bearings for spring and autumn migrations (with citations) to relevant figures (Figure 2, 4, S2).

      Thanks for your comments. We have added the following information to the Introduction: “We recently found that fall armyworms from the year-round range in Southwest China (Yunnan) exhibit seasonally appropriate migratory headings when flown outdoors in virtual flight simulators, heading northward in the spring and southward in the fall, and this seasonal reversal is controlled by photoperiod (Chen et al., 2023).” We therefore did not add expected seasonal compass bearings to the Results section.

      (e) Add a map showing the experimental site and known migratory routes, clearly labelling spring vs fall routes. It would help justify expected headings.

      Thank you for this suggestion. At present, there are no experimentally validated migratory routes (e.g., from mark-release-recapture or tracking approaches) for the specific fall armyworm population used in our study. Because these routes have not been biologically confirmed, we did not include a presumed migratory map, which might imply unwarranted certainty.

      (f) Composition of Test Groups

      Indicate sex ratios and reproductive status (mated/unmated) of tested moths, if known or comment if unknown, as both can affect migratory motivation and behaviour.

      Thank you for this suggestion. We have responded to this point in the Public Reviews.

      (g) Role and Nature of Visual Cues

      While the results clearly show that orientation disappears without visual cues, the triangle cue is highly artificial. Well-studied Bogong moths are known to rely on views of Australian mountain ranges during their nocturnal migrations, but there is no evidence that armyworms use a similar strategy. Even for bogongs, it is not just one salient mountain always in front of them on migration. Discuss whether Fall Armyworm would encounter comparable natural cues in the field along their migratory route, or whether the triangle might simply provide a frame of reference rather than a true landmark.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      (h) Future work could test:

      - More naturalistic sky cues (moonlight, star fields).

      - Varying the landmark's position relative to the magnetic field - slowly moving along - transient landmarks. Also, less salient landmarks and a more complex skyline, as it is usually more complex than just a single salient peak.

      Thank you for this comment. We have responded to this point in the Public Reviews, and a brief discussion, as suggested, has been added to the revised manuscript.

      Minor Comments and Line-by-Line Suggestions

      L70 - Check citation (possibly Mouritsen 2018). Missing in the list of references.

      Thanks. This point has been addressed.

      L75 - Consider citing the new and highly relevant preprint:

      Pakhomov, A., Shapoval, A., Shapoval, N., & Kishkinev, D. (2025). Not All Butterflies Are Monarchs: Compass Systems in the Red Admiral (Vanessa atalanta). bioRxiv.

      Thanks. We have cited this reference.

      LL81-82 - Clarify vague phrasing; specify criteria for "good" vs "poor" orientation ability. Or reword/leave out.

      Thanks for your comments.

      L85 - "but one," not "bar one." 

      Thanks. Corrected.

      L124 - The 2 genetic citations are weakly linked to magnetoreception. We do not have a clear understanding of the insect magnetoreceptor and its underlying mechanism, so we simply cannot interpret genetic associations well enough to link them to magnetoreception. For example, does the noctuid magnetic sense require a magnetite-based receptor and genes involved in biomineralization? Consider removing or softening claims.

      Thanks. Addressed.

      LL123-126 - Define what for YOU constitutes "strong evidence" for magnetoreception (e.g. adaptive directional behaviour consistent with migratory orientation?). Is there such a thing as strong evidence at all?

      Thanks for your comments. We agree that terms such as “confirmed” or “strong evidence” can overstate the certainty of magnetoreception findings, given the ongoing debates in the field. In the revised manuscript, we have toned down.

      L153 - Indicate whether coils in NMF condition were powered or inactive.

      Thanks for your comments. Addressed.

      L163 - Justify use of multiple 5-min phases (e.g. temporal resolution of behaviour). It is confusing at the start, where first mentioned, and becomes clearer only towards the end, but it should be clearer at the start.

      Thanks for your comments. The assay was divided into these 5-min segments to provide the temporal resolution needed to detect changes in flight orientation as the relative alignment of magnetic and visual cues was systematically altered. We now clarify this earlier in the Results.

      LL167-171 - This is a good place where you can provide a map (main or supplementary with referencing) showing the study site and migration routes.

      Thanks for your suggestion. We have responded to this point in the Public Reviews.

      L174 - Avoid repetition of "expected."

      Thanks. Addressed.

      LL176-177 - Report 95% confidence intervals or equivalent and clarify which test (e.g. Moore's paired test) each p-value refers to.

      Thanks for your suggestion.

      LL189-191 - explain what fatigue means. I would remove fatigue and substitute it with "lowered flight activity". Also, the same statement comes later, so avoid repetitiveness and remove it in one place. The analysis of directedness is good throughout, but what about the analysis of activity level? Could you explain whether you did it or not, and if not, why, or if angular changes can serve as an activity proxy? Replace "fatigue" with "reduced flight activity." Avoid repetition. Clarify if activity level analysis was performed or if it was not, e.g. due to technical difficulties.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      L196 - Note whether 95% CI overlaps with the expected direction. This is a crucial outcome.

      Thanks for your comments.

      LL203-205 - unclear, better to stick to "congruency", especially "initial congruency for the relationship between mN and visual cue" throughout.

      Thanks for your suggestions.

      L206 - Better to introduce a new subheading: "Laboratory-Reared Animals.".

      Thanks for your suggestion. A new subheading has been added in the revised manuscript.

      LL207-208 - Clarify which cues were available in Chen et al. (2023) and how they differ here.

      Thanks for your comments. In Chen et al. (2023), the moths oriented under an artificial starry sky together with optic flow cues. In contrast, our experiments intentionally removed both the starry-sky pattern and optic flow to avoid introducing additional visual information when testing magnetic-visual integration for orientation. We have added further clarification regarding the conditions used in Chen et al. (2023) in the revised manuscript.

      L228 - Use "lab-reared" consistently throughout the entire MS. Do not mix with lab-raised.

      Thanks. Addressed by consistently using “lab-raised”.

      Figure 2 - Confusing in parts, especially for people coming from birds and other vertebrates orientation background. At 12 o'clock, you usually expect either mN / gN (magnetic or geographic North) or the animal's own initial directional response used as control to compare the same animal's direction post-treatment. Here, your 6 o'clock is magnetic South in the first place - non-conventional. At 12 o'clock, better use mN or gN. Avoid using non-conventional references such as magnetic south. Remind readers of seasonally appropriate headings and refer to the map.

      Thanks. We have responded to this point in the Public Reviews.

      LL232-234 - Emphasize that cue-magnetic congruency is key. Highlight the most important point that the congruency between the seasonal migratory direction and visual cues is key, not that in spring/fall, visual cues must be towards or opposite to the migratory goal. But the visual cue could be in the migratory direction or opposite, or at an angle - this is for future direction.

      Thanks. We have responded to this point in the Public Reviews.

      Figure 2 and associated main text - highlight that you only tested the designs when in all seasons the salient and single visual cue was in the migratory direction (in spring it coincided with mN but in fall it was towards the magnetic south). Other directions of visual cues have not been tested, but for simplicity and consistency, you chose to do these ones as the first step, perhaps.

      Thank you for this insightful comment. Yes, our experiments tested only the conditions in which the salient and single visual cue was aligned with the migratory direction. Other angular relationships between visual cues and the magnetic field were not examined in this study. For simplicity and consistency, we focused on this alignment as a first step toward understanding magnetic-visual cue integration in migratory orientation. We now highlight this in the Fig. 2 legend.

      Figure captions/legends - hard to distinguish from the main text now; better to italicize figure caption text and visually space them from the main text.

      Thanks for your suggestions.

      LL 250-251 - mention to people more familiar with r - lowercase - what is the expected range for R uppercase. It is not bound 0-1 as r. Could it be negative? How large can it be?

      Thanks for the comment. After revisiting Moore (1980), we think that R* cannot take negative values. However, since R* = R/N^(3/2), it is not bounded between 0 and 1. We did not find any concept of an upper bound in the paper (https://doi.org/10.2307/2335330).
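      To illustrate the point about the range of R*, here is a minimal Python sketch of Moore's (1980) modified Rayleigh statistic, assuming each observation is an (angle in degrees, vector length) pair, for example an individual mean heading with its r-value. Tie handling (tied lengths keep their input order) and all names are assumptions for illustration. Because R* is the length of a rank-weighted resultant divided by N^(3/2), it is never negative. For a fixed N it is largest when all directions coincide, where it equals (N+1)/(2√N); that maximum exceeds 1 for every N > 1 and grows with N, so there is no sample-size-independent upper bound.

```python
import math

def moore_r_star(vectors):
    """Moore's (1980) modified Rayleigh statistic R*.

    `vectors` is a list of (angle_deg, length) pairs. Vectors are ranked by
    length (rank 1 = shortest; ties keep input order), each unit direction is
    weighted by its rank, and R* = |sum(rank * unit vector)| / N**1.5.
    """
    n = len(vectors)
    ranked = sorted(vectors, key=lambda v: v[1])  # stable sort by length
    c = sum(i * math.cos(math.radians(a)) for i, (a, _) in enumerate(ranked, start=1))
    s = sum(i * math.sin(math.radians(a)) for i, (a, _) in enumerate(ranked, start=1))
    return math.hypot(c, s) / n ** 1.5

def moore_r_star_max(n):
    """Largest possible R* for sample size n (all directions identical)."""
    return (n + 1) / (2 * math.sqrt(n))

# Hypothetical data: four moths with the same mean heading but different r-values.
same_direction = [(180.0, 0.3), (180.0, 0.6), (180.0, 0.8), (180.0, 0.9)]
```

      For the four identical directions in this example, R* equals (1+2+3+4)/4^(3/2) = 1.25, which is exactly the maximum (4+1)/(2·2) for N = 4.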

      Figure 3 - Consider adding a horizontal line indicating the 5% significance threshold.

      Thanks for your suggestions.

      L 261 - need to have some narrative after the subheading before you insert Figure 3.

      Thanks. Addressed.

      LL274-275 - highlight that the timeline of this congruency between mN and a landmark and the effect of this on directedness is not explored here, but worth doing in future. How long does a new congruency or a relationship between mN and a visual cue need to be exposed to the animal to regain its directional response? Clearly, it is just a question of time of exposure so that a new association is established. Suggest future work on time-dependent adaptation to new cue-field relationships.

      Thanks for your suggestion. We have now included this point as a future direction in the revised Discussion.

      Figure 4 & S4 - Replace letters with asterisks/brackets for significance. The use of the letter is confusing and unconventional.

      Thanks for your suggestion.

      Figure 4 caption - Clarify the main takeaway.

      Thanks for your suggestion.

      Figure 4 - bare minimum is confusing. I understand that you wanted to avoid "no visual cues" because, as long as the animal sees things, there are things to be used as visual cues, even if this is not the intention of the experimenter. However, it needs clarification and rewording. Better to be more specific, like "no black triangle and horizon were used, just the uniformly white cylinder", or something like that.

      Thanks for your comments. In our setup, the term “barely minimum visual cues” accurately describes the intentional removal of both the black triangle and the horizon, leaving only the uniformly white cylinder as the visual environment. This wording was chosen to reflect the practical limitations of producing a perfectly symmetrical flight simulator under laboratory conditions, and we therefore prefer to retain the original phrasing.

      L328 - Remove Xu et al. (2021) citation (not relevant). This is an in vitro study with a protein which may not work exactly as it is claimed in the paper in vivo.

      Thanks. Citation removed.

      L349-350 - Clarify what "no visual cue" means (e.g., uniformly white cylinder, no horizon line). Include a photo or a schematic of the inner surface of the cylinder for this condition in the Supplementary Materials.

      Thanks. We have responded to this point in the Public Reviews.

      L380 & throughout - Replace "barely minimum visual cues" (BMVC) with "no visual cues", clarifying limitations in Methods, meaning that you can explain that absolutely no visual cues is practically impossible because, as long as there is light, animals can use some asymmetries as cues even if this is not the intention of the experimenter.

      Thank you for this comment. We have decided to retain the term “barely minimum visual cues (BMVC)” because it accurately describes our experimental condition, which is distinct from a true “no visual cues” environment. In the revised Figure legend, we now clarify that BMVC refers to conditions in which obvious visual cues (i.e., features such as the black triangle in Fig. 1) were removed, while acknowledging that complete elimination of all visual information is not possible under illuminated conditions.

      L396 - Be cautious when generalizing from two species tested by a research group that is not absolutely independent (some authors in bogong and armyworm works overlap). We saw examples in diurnal migratory butterflies (Monarchs), a more studied species than the armyworm, that the findings do not entirely translate to Red Admirals (Pakhomov et al. 2025 preprint mentioned). Suggestion to tone down any claims of broad generalisation throughout the manuscript.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL402-407 - Note that, unlike birds (e.g. European robins), moths appear to require both magnetic and visual cues for orientation, whereas birds, mole rats and some other animals can use magnetic cues alone.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L410 - Specify that this is correct only in the Northern Hemisphere.

      Thank you for this comment. Addressed.

      LL415-416 - Acknowledge artificiality of single-cue setup (see the major comments above); integrate earlier in the Discussion.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL420-425 - Consolidate Future Directions into a single subsection; include more concrete experimental ideas, for example, using more naturalistic, numerous transient landmarks (could be done in a virtual maze with LEDs on the wall of the cylinder with cues moving with time). Multiple visual cues. Manipulating with salience of cues - less simplistic, less salient.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L431 - Does this paper support this statement? I think it just tested the use of stellar cues in a zero magnetic field. It also dealt with direction finding, not navigation, which is a position-finding ability - a much more complex feat and might not be the ability of moths (requires further studies like with geographic and magnetic displacements, etc). Reword and check this. Show the distinction between direction finding and navigation.

      Thank you for this comment. We have reworded the relevant sentence to use “orientation” instead of “navigation”.

      L436-437 - Specify "global visual cues" (stellar, lunar, etc.) and merge all future directions into one coherent section.

      Thank you for this comment. Addressed.

      LL443-446 - A bit early to plan such studies because migratory direction could well be a complex multigenetic trait, so that you cannot approach it simply with the knock out of a single gene. The genetic basis of magnetic direction needs to be first demonstrated, which leads you to the Future Directions section.

      Thank you for this helpful comment. We fully agree that migratory direction is likely a complex multigenic trait, and our intention was not to imply that knocking out a single gene would be sufficient to explain magnetic or migratory orientation. Our statement aimed only to highlight that identifying candidate genes is an important first step toward understanding the genetic basis of magnetic orientation.

      Line 496 - Clarify whether optic flow was used (unlike previous studies).

      Thank you for pointing this out. Clarified.

      LL499-511 - Clarify the improvements done in Chen's system and their relevance.

      Thank you for pointing this out. We reworded this sentence “The Flash flight simulator system was developed based on the early design of the Mouritsen-Frost flight simulator and adapted for our experiments in Yuanjiang”.

      Line 531 - Report and compare light intensities between indoor and outdoor experiments.

      Thanks for this comment. Unfortunately, due to the sensitivity limits of our current equipment, we were unable to reliably measure outdoor light intensities at night. However, we did not perform any open-top outdoor flight-simulator experiments; instead, we used field-captured moths but conducted all behavioral tests indoors.

      L549 - Add make/model of power supplies.

      Thanks. Addressed.

      LL582-585 - Specify whether R code will be shared; recommend open access (e.g., GitHub, other open repositories). Reiterate the importance of open science and sharing all scripts. Also here, add citations to some studies where MMRT has been used recently.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      Line 592 - Explain how individual r-values were derived from optical encoder data.

      Thank you for this comment. Addressed.

      L842-843 - t-tests are inappropriate for angular data; use circular tests (Watson-Williams, Mardia-Watson-Wheeler, etc.).

      Thank you for this comment. Addressed.
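      Why linear statistics such as t-tests are unsuitable for angular data can be seen with two headings on either side of north. The sketch below (plain Python; the helper names are hypothetical) contrasts the arithmetic mean with the circular mean direction. It does not implement the Watson-Williams or Mardia-Watson-Wheeler tests themselves; it only shows why a linear mean misplaces the average heading.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles (degrees), normalised to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean = math.degrees(math.atan2(s, c)) % 360.0
    # Float rounding can turn a tiny negative angle into exactly 360.0; fold it back to 0.
    return 0.0 if mean == 360.0 else mean

def angular_distance_deg(a, b):
    """Smallest absolute difference between two directions, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

headings = [350.0, 10.0]                          # two moths flying roughly north
arithmetic_mean = sum(headings) / len(headings)   # 180.0, i.e. due south
circular_mean = circular_mean_deg(headings)       # approximately 0.0, i.e. north
```

      The arithmetic mean of 350° and 10° is 180°, the opposite of where both moths flew, whereas the circular mean lies at about 0°. The circular mean is undefined when the resultant vector is close to zero (e.g. perfectly opposed headings), which is one reason circular analyses also report the resultant length r.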

      L865 - Reword to avoid repetition of "fall." Example: "In field captured armyworms during fall migration".

      Thank you for this comment. Addressed.

      LL882-885 - Improve phrasing and language here. Confirming that - no colon after. "Both the acrylic plate and diffusion paper." Confirm relevance of spectra to moth visual sensitivity - add relevant citation to original studies showing that.

      Thank you for this comment. Addressed.

      L886 - Reword "uniform" - does not look uniform to me.

      Thank you for this comment. Addressed.

      Reviewer #2 (Recommendations for the authors):

      The first two sentences of the abstract ("The navigational mechanisms employed by nocturnal insect migrants remain to be elucidated in most species. Nocturnal insect migrants are often considered to use the Earth's geomagnetic field for navigation, yet the underlying mechanisms of magnetoreception in insects remain elusive") are somewhat redundant. The authors may consider rewriting them.

      Thank you for pointing this out. We have rewritten this opening to provide a more concise and non-repetitive introduction.

    1. Author response:

      We would like to thank the reviewers for their supportive comments which largely agree with our main finding that a heterogeneous population of dendritic cells and Th2-skewed macrophages interact with the PDPN+ niche at the cribriform plate during EAE neuroinflammation. Additionally, they have provided several meaningful critiques to our study which we are now working on addressing in a newly revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. The model considers a population subdivided in groups, each group has a single asexually-reproducing breeder, other group members (subordinates) can perform two types of tasks called "work" or "defense", individuals have different ages, individuals can disperse between groups, each individual has a dominance rank that increases with age, and upon death of the breeder a new breeder is chosen among group members depending on their dominance. "Workers" pay a reproduction cost by having their dominance decreased, and "defenders" pay a survival cost. Every group member receives a survival benefit with increasing group size. There are 6 genetic traits, each controlled by a single locus, that control propensities to help and disperse, and how task choice and dispersal relate to dominance. To study the effect of group augmentation without kin selection, the authors cross-foster individuals to eliminate relatedness. The paper allows for the evolution of the 6 genetic traits under some different parameter values to study the conditions under which division of labor evolves, defined as the occurrence of different subordinates performing "work" and "defense" tasks. The authors envision the model as one of vertebrate division of labor.

      The main conclusion of the paper is that group augmentation is the primary factor causing the evolution of vertebrate division of labor, rather than kin selection. This conclusion is drawn because, for the parameter values considered, when the benefit of group augmentation is set to zero, no division of labor evolves and all subordinates perform "work" tasks but no "defense" tasks.

      Strengths:

      The model incorporates various biologically realistic details, including the possibility to evolve age polyethism, where individuals switch from "work" to "defense" tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model and its analysis are limited, which in my view makes the results insufficient to reach the main conclusion that group augmentation and not kin selection is the primary cause of the evolution of vertebrate division of labor. There are several reasons.

      (1) First, although the main claim is that group augmentation drives the evolution of division of labor in vertebrates, the model is rather conceptual in that it doesn't use quantitative empirical data that apply to all or most vertebrates, and to vertebrates only. So, I think the approach has a conceptual reach rather than being able to support such a conclusion about a real taxon.

      We appreciate the reviewer’s point that our model does not incorporate quantitative empirical data across vertebrate taxa. This is indeed a limitation and reflects the current lack of fine-scale datasets on task division, the influence of life-history traits, and the fitness consequences of different cooperative activities in vertebrates. One of our aims, however, is precisely to stimulate such empirical work by highlighting the value of examining division of labor in species inhabiting harsh environments, considering age/size/dominance structure when evaluating variation in cooperative activities, and incorporating defense behaviors more consistently into analyses of helping, especially since defenders are often overlooked relative to the classic helpers-at-the-nest that provision offspring. The model therefore remains directly relevant to vertebrate systems because it departs from insect-inspired approaches that focus on fitness outcomes based solely on maximizing colony productivity. Instead, it incorporates direct fitness benefits to group members, an essential feature of vertebrate cooperative breeding and of other systems with fertile “workers,” as we clarified in the discussion.

      (2) Second, I think that the model strongly restricts the possibility that kin selection is relevant. The two tasks considered essentially differ only by whether they are costly for reproduction or survival. "Work" tasks are those costly for reproduction and "defense" tasks are those costly for survival. The two tasks provide the same benefits for reproduction (eqs. 4, 5) and survival (through group augmentation, eq. 3.1). So, whether one, the other, or both helper types evolve presumably only depends on which task is less costly, not really on which benefits it provides. As the two tasks give the same benefits, there is no possibility that the two tasks act synergistically, where performing one task increases a benefit (e.g., increasing someone's survival) that is going to be compounded by someone else performing the other task (e.g., increasing that someone's reproduction). So, there is very little scope for kin selection to cause the evolution of labor in this model. Note that synergy between tasks is not something unusual in division of labor models, but is in fact a basic element in them, so excluding it from the start in the model and then making general claims about division of labor is unwarranted. In their reply, the authors point out that they only consider fertility benefits as this, according to them, is what happens in cooperative breeders with alloparental care; however, alloparental care entails that workers can increase others' survival *without group augmentation*, such as via workers feeding young or defenders reducing predator-caused mortality, as mentioned in my previous review, but these potentially kin-selected benefits are not allowed here.

      We understand the reviewer’s concern that our model restricts the scope for kin-selected benefits by not including task-specific synergy effects—specifically, help that directly increases the survival of group members (e.g., load-lightening via feeding young, or predator defense that reduces mortality of breeders or offspring independently of group augmentation). We agree that such effects can occur in some cooperative breeders, and that they can, in principle, generate indirect fitness benefits. However, even when helpers increase the survival of breeders or reduce parental investment per offspring, these effects generally translate into higher breeder productivity—either via increased fecundity, increased survival to the next breeding attempt, or increased investment in subsequent broods. Thus, although we treat benefits in terms of enhanced breeder productivity, this formulation implicitly captures a range of help-related effects that ultimately improve the reproductive output of the breeders, including those mediated through increased survival. For this reason, we believe that the model remains relevant for vertebrate systems despite not representing each pathway separately.

      (3) Third, the parameter space is understandably little explored. This is necessarily an issue when trying to make general claims from an individual-based model where only a very narrow parameter region of a necessarily particular model can be feasibly explored. As in this model the two tasks ultimately only differ by their costs, the parameter values specifying their costs should be varied to determine their effects. In the main results, the model sets a very low survival cost for work (yh=0.1) and a very high survival cost for defense (xh=3), the latter of which can be compensated by the benefit of group augmentation (xn=3). Some limited variation of xh and xn is explored, always for very high values, effectively making defense unevolvable except if there is group augmentation. In this revision, additional runs have been included varying yh while keeping xh and xn constant (Fig. S6), which does not address my comment, as xn remains very high. Consequently, the main conclusion that "division of labor" needs group augmentation seems essentially enforced by the limited parameter exploration, in addition to the second reason above.

      As we have explained in previous revisions, the costs associated with work and defense are not directly comparable because they affect different fitness components: work costs reduce dominance, whereas defense costs reduce survival. Whether a particular cost is “high” or “low” can only be evaluated by examining the evolved reaction norms and identifying the ranges over which these norms change. For this reason, we focused on parameter ranges that actually generate shifts in reaction norms rather than presenting large regions of parameter space where nothing changes.

      We also reiterate that we did in fact explore broader parameter ranges than those shown in the main text. Additional analyses, including those specifically designed to identify conditions under which division of labor evolves under kin selection alone, are provided in the Supplementary Material. Specifically, Figure S1 addresses the point raised about the “need” for group augmentation benefits for defense to evolve, by increasing the baseline survival x<sub>0</sub>.

      We now include one additional figure in the Supplementary Material with a lower value for the benefit of group size (x<sub>n</sub> = 1 instead of x<sub>n</sub> = 3), and we extended the range of x<sub>h</sub> to include lower values (x<sub>h</sub> = 1). As we can see in Figure S7 and Table S8, group augmentation benefits are still the primary reason for individuals to group (see dispersal values). For low benefits of group augmentation, defense evolves in harsh environments in the absence of kin selection, and in benign environments when both direct and indirect fitness benefits take place. We have also now expanded the results section to include these last results. Note that we also checked even lower values for x<sub>h</sub> under the only kin selection implementation, with results being qualitatively similar, but chose not to include them in the manuscript since it is already a very long Supplementary Material. Here are the averages for two examples with x<sub>h</sub> = 0.1 and when we promote division of labor:

      Author response table 1.

      In short, the conclusion that division of labor requires group augmentation is not an artifact of limited parameter exploration. It arises because kin selection alone favors division of labor only under highly restrictive parameter combinations, whereas including direct fitness benefits substantially expands the conditions under which division of labor evolves. This pattern is consistent across the full set of parameter combinations we examined.

      (4) Fourth, my view is that what is called "division of labor" here is an overinterpretation. When the two helper types evolve, what exists in the model are some individuals that do reproduction-costly tasks (so-called "work") and others that do survival-costly tasks (so-called "defense"). However, there are really no two tasks that are being completed, in the sense that completing both tasks (e.g., work and defense) is not necessary to achieve a goal (e.g., reproduction). In this model there is only one task (reproduction; equations 4 and 5) to which both helper types contribute equally, and so one task doesn't need to be completed if completing the other task compensates for it; instead, it seems more fitting to say that there are two types of helpers, one that pays a fertility cost and another one a survival cost, for doing the same task. So, this model does not actually consider division of labor but the evolution of different helper types where both helper types are just as good at doing the single task but perhaps do it differently and so pay different types of costs. In this revision, the authors introduced a modified model where "work" and "defense" must be performed to a similar extent. Although I appreciate their effort, this model modification is rather unnatural and forces the evolution of different helper types if any help is to evolve.

      In previous models of division of labor in eusocial insects, the implicit benefit is also colony-level productivity (see Beshers & Fewell, 2001, for a review of division of labor in insects). Even in humans, division of labor functions as a means to increase efficiency toward achieving a shared goal. Our model adopts this same interpretation, as outlined in the Introduction, but extends it by considering that different tasks may impose different fitness costs, an aspect that has been largely overlooked in the existing literature. It is precisely because fitness outcomes are not fully shared among group members in vertebrates that distinguishing these cost structures matters. Unlike eusocial insects with sterile workers, vertebrate helpers can obtain direct fitness benefits, and the model explicitly accounts for these direct benefits—something absent from most insect-inspired approaches even when direct fitness benefits can also arise in some of those systems. Thus, our framework is not simply evolving “two types of helpers doing the same task,” but instead evolving specialization in different cooperative roles that carry different fitness consequences. It is therefore suitable for our model to treat contributions to breeder productivity as a common currency, while allowing individuals to specialize in different cost-distinct forms of help.

      Finally, regarding synergy: with the extension introduced in the previous revision, we now incorporate the requirement that multiple forms of help must be performed for the group to achieve maximal reproductive output. This directly addresses the reviewer’s concern about synergistic dependencies between tasks and aligns our framework with the kinds of complementarity highlighted in other models of division of labor.

      In summary, the structure of the model is consistent with both the theoretical literature on division of labor and the biological realities of vertebrate cooperative systems. We believe it is important for future models to explicitly consider the different fitness benefits and costs associated with distinct cooperative behaviors, and hope that our framework encourages more targeted empirical research on division of labor in vertebrates (e.g. inclusion of data on defense, life-history traits and environmental challenges) to better inform future modelling efforts.

      I should end by saying that these comments don't aim to discourage the authors, who have worked hard to put together a worthwhile model and have patiently attended to my reviews. My hope is that these comments can be helpful to build upon what has been done to address the question posed.

      We appreciate the reviewer’s thoughtful and constructive comments, as well as the time invested in evaluating our work. These insights have greatly helped us improve the clarity and overall quality of the manuscript. We hope that the revisions and additional clarifications we have provided adequately address all remaining concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors aimed to characterize neurocomputational signals underlying interpersonal guilt and responsibility. Across two studies, one behavioral and one fMRI, participants made risky economic decisions for themselves or for themselves and a partner; they also experienced a condition in which the partners made decisions for themselves and the participant. The authors also assessed momentary happiness intermittently between choices in the task. Briefly, results demonstrated that participants' self-reported happiness decreased after disadvantageous outcomes for themselves and when both they and their partner were affected; this effect was exacerbated when participants were responsible for their partner's low outcome, rather than the opposite, reflecting experienced guilt. Consistent with previous work, BOLD signals in the insula correlated with experienced guilt, and insula-right IFG connectivity was enhanced when participants made risky choices for themselves and safe choices for themselves and a partner.

      Strengths:

      This study implements an interesting approach to investigating guilt and responsibility; the paradigm in particular is well-suited to approach this question, offering participants the chance to make risky v. safe choices that affect both themselves and others. I appreciate the assessment of happiness as a metric for assessing guilt across the different task/outcome conditions, as well as the implementation of both computational models and fMRI.

      We thank Reviewer 1 for their positive assessment of our manuscript.

      Weaknesses:

      In spite of the overall strengths of the study, I think there are a few areas in which the paper fell a bit short and could be improved.

      We thank Reviewer 1 for their comments, which we have used to improve our manuscript. We hope that these changes address the issues raised by the Reviewer.

      (1) While the framing and goal of this study was to investigate guilt and felt responsibility, the task implemented (a risky choice task with social conditions) has been used in similar ways in past studies that were not addressed here. The novelty of this study would appear to be the additional happiness assessments, but it would be helpful to consider the changes noted in risk-taking behavior in the context of additional studies that have investigated changes in risky economic choice in social contexts (e.g., Arioli et al., 2023 Cerebral Cortex; Fareri et al., 2022 Scientific Reports).

      We certainly agree that several previously published studies have relied on risky choice tasks with social conditions. In this revised version, we now mention these two studies in the substantially revised Introduction.

      (2) The authors note they assessed changes in risk preferences between social and solo conditions in two ways - by calculating a 'risk premium' and then by estimating rho from an expected utility model. I am curious why the authors took both approaches (this did not seem clearly justified, though I apologize if I missed it). Relatedly, in the expected utility approach, the authors report that since 'the number of these types of trials varied across participants', they 'only obtained reliable estimates for [gain and loss] trials in some participants' - in study 1, 22 participants had unreliable estimates and in study 2, 28 participants had unreliable estimates. Because of this, and because the task itself only had 20 gains, 20 losses, and 20 mixed gambles per condition, I wonder if the authors can comment on how interpretable these findings are in the Discussion. Other work investigating loss aversion has implemented larger numbers of trials to mitigate the potential for unreliable estimates (e.g., Sokol-Hessner et al., 2009).

      We agree that we have not clearly justified why we have taken two approaches to assess risk preferences. In short, while the expected utility approach is a more comprehensive method to model a participant’s choices, we had not sufficiently considered the need for the large number of trials required to fit such models when designing our experiment. Calculating the risk premium was the less comprehensive, simpler alternative that we could calculate for all participants. We have now mentioned this fact in the Results section. As the only difference in risk aversion across conditions was found in Study 1 using the expected utility method, which could only be successfully applied in a minority of participants, we believe that this difference should not be taken as a strong finding. We have now mentioned this fact in the revised Discussion.
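      To make the distinction between the two approaches concrete, here is a minimal Python sketch of one common way to derive a risk premium from an expected utility model. It assumes a power utility u(x) = x^ρ on non-negative gains and defines the risk premium as expected value minus certainty equivalent; these definitions, the function names and the example lottery are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch: risk premium of a gain lottery under an assumed power utility.
# Assumptions (not taken from the manuscript): u(x) = x**rho for x >= 0,
# rho > 0, lotteries given as (outcome, probability) pairs, and
# risk premium = expected value - certainty equivalent.


def expected_value(lottery):
    """Probability-weighted mean outcome."""
    return sum(p * x for x, p in lottery)


def expected_utility(lottery, rho):
    """Expected power utility; rho < 1 is concave (risk averse)."""
    return sum(p * x ** rho for x, p in lottery)


def certainty_equivalent(lottery, rho):
    """Sure amount whose utility equals the lottery's expected utility."""
    return expected_utility(lottery, rho) ** (1.0 / rho)


def risk_premium(lottery, rho):
    """Positive for risk aversion, zero for neutrality, negative for risk seeking."""
    return expected_value(lottery) - certainty_equivalent(lottery, rho)


if __name__ == "__main__":
    gamble = [(20.0, 0.5), (0.0, 0.5)]  # hypothetical 50/50 gain lottery
    for rho in (0.5, 1.0, 1.5):
        print(rho, round(risk_premium(gamble, rho), 3))
```

      With ρ < 1 the premium is positive (risk aversion), with ρ = 1 it is zero, and with ρ > 1 it is negative (risk seeking). Losses and mixed gambles, which the task also includes, would need an extended utility function (for example with a loss-aversion parameter) and are not covered by this sketch.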

      (3) One thing seemingly not addressed in the Discussion is the fact that the behavioral effect did not replicate significantly in study 2.

      We agree that we had not sufficiently discussed the fact that there were (slight but significant) differences in risk preferences between the Solo and Social conditions in Study 1 but not in Study 2. We now do so in the revised Discussion, and write the following:

      “Participants made slightly more risk-seeking choices when deciding for themselves than for both themselves and the partner in Study 1, but this difference disappeared in Study 2. The ρ parameter on which this finding in Study 1 is based could only be estimated in a minority of participants due to a relatively low number of trials, which suggests that this finding may not be very reliable. The simpler and more robust method (evaluation of a risk premium) showed no difference in risk aversion across conditions in either study. Overall, we believe that we do not have strong evidence of differences in risk preferences across conditions.”

      (4) Regarding the computational models, the authors suggest that the Responsibility and Responsibility Redux models provided the best fit, but they are claiming this based on separate metrics (e.g., in study 1, the redux model had the lowest AIC, but the responsibility only model had the highest R^2; additionally, the basic model had the lowest BIC). I am wondering if the authors considered conducting a direct model comparison to statistically compare model fits.

      We agree that we should run formal, direct model comparison tests. We have now run likelihood-ratio tests, which showed that the Responsibility model was the best. We now report this in the Results section, just below Table 1:

      “A likelihood ratio test (Equation 9) revealed that the Responsibility model fitted better than all the other models, including the Responsibility Redux model (Study 1: all LR ≥ 47.36, p < 0.0001; Study 2: all LR ≥ 77.83, p < 0.0001).”
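      Equation 9 of the manuscript is not reproduced in this response. As a hedged illustration, the sketch below implements the textbook likelihood ratio test, LR = 2(log L_full − log L_restricted), referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters in the fuller model. That reference distribution is valid only for nested models, which is an assumption here; the function names are hypothetical, and the chi-square tail is computed with closed-form expressions for integer degrees of freedom so the example needs only the standard library.

```python
import math


def lr_statistic(loglik_restricted, loglik_full):
    """LR = 2 * (log L_full - log L_restricted); assumes nested models."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus a finite series in x**(j - 1/2) / (2j - 1)!!
    series, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * j + 1)
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series)


def likelihood_ratio_test(loglik_restricted, loglik_full, extra_params):
    """Return (LR statistic, p value) for nested models."""
    lr = lr_statistic(loglik_restricted, loglik_full)
    return lr, chi2_sf(lr, extra_params)
```

      For example, `likelihood_ratio_test(-120.0, -110.0, 2)` gives a statistic of 20.0 and a p value of exp(−10), roughly 4.5e-5. For non-nested models, information criteria such as AIC or BIC remain the usual alternative.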

      (5) In the reporting of imaging results, the authors report in a univariate analysis that a small cluster in the left anterior insula showed a stronger response to low outcomes for the partner as a result of participant choice rather than from partner choice. It then seems as though the authors performed small volume correction on this cluster to see whether it survived. If that is accurate, then I would suggest that this result be removed because it is not recommended to perform SVC where the volume is defined based on a result from the same whole-brain analysis (i.e., it should be done a priori).

      As indicated in the manuscript, the small insula cluster centered at [-28 24 -4] and shown in Figure 4F survived corrections for multiple tests within the anatomically-defined anterior insula (based on the anatomical maximum probability map described in Faillenot et al., 2017), which is independent of the result of our analysis. Functionally defining the small volume based on the same data would indeed be circular and misleading “double-dipping”. We have most certainly NOT done this. The reason why we selected the anterior insula is because it is one of the regions most frequently associated with guilt (see the explanations in our Introduction, which refers for example to Bastin et al., 2016; Lamm & Singer, 2010; Piretti et al., 2023). Thus we feel that performing small-volume correction within the anatomically-defined anterior insula is a valid analysis. We fully acknowledge that, independently of any correction, the effect and the cluster are small. We now write:

      “We found a weak response in a small cluster within the left anterior insula (peak T = 3.95, d = 0.59, 22 voxels, peak intensity at [-28 24 -4]; Figure 4F). Given the documented association between anterior insula and guilt (see Introduction), we proceeded to test whether this result survived correction for family-wise errors due to multiple comparisons restricted to the left anterior insula gray matter [defined anatomically and thus independently from our findings, as the anterior short gyrus, middle short gyrus, and anterior inferior cortex in an anatomical maximum probability map (Faillenot et al., 2017)]. This correction resulted in a p value of 0.024. This result, although it is only a small effect in a small cluster, is consistent with the mixed model analysis reported earlier.”

      Reviewer #2 (Public review):

      Summary

      This manuscript focuses on the role of social responsibility and guilt in social decision-making by integrating neuroimaging and computational modeling methods. Across two studies, participants completed a lottery task in which they made decisions for themselves or for a social partner. By measuring momentary happiness throughout the task, the authors show that being responsible for a partner's bad lottery outcome leads to decreased happiness compared to trials in which the participant was not responsible for their partner's bad outcome. At the neural level, this guilt effect was reflected in increased neural activity in the anterior insula, and altered functional connectivity between the insula and the inferior frontal gyrus. Using computational modeling, the authors show that trial-by-trial fluctuations in happiness were successfully captured by a model including participant and partner rewards and prediction errors (a 'responsibility' model), and model-based neuroimaging analyses suggested that prediction errors for the partner were tracked by the superior temporal sulcus. Taken together, these findings suggest that responsibility and interpersonal guilt influence social decision-making.
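      For readers unfamiliar with this class of model, the sketch below shows the general shape of a momentary-happiness model in the spirit of Rutledge et al. (2014), extended with partner terms as described in the summary: predicted happiness is a baseline plus exponentially discounted sums of past self and partner rewards and prediction errors. The exact regressors, weights and forgetting factor of the authors' Responsibility model are not given here, so the `Trial` fields, weight names and `gamma` below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical per-trial quantities, listed from oldest to newest trial.
    self_reward: float
    self_pe: float        # participant's reward prediction error
    partner_reward: float
    partner_pe: float     # partner's reward prediction error


def predicted_happiness(trials, w0, w_self_r, w_self_pe,
                        w_partner_r, w_partner_pe, gamma):
    """Baseline plus exponentially discounted sums of past trial terms.

    gamma in [0, 1] is a forgetting factor: the most recent trial gets
    weight 1, the one before it gamma, then gamma**2, and so on.
    """
    h = w0
    n = len(trials)
    for j, t in enumerate(trials):
        decay = gamma ** (n - 1 - j)
        h += decay * (w_self_r * t.self_reward
                      + w_self_pe * t.self_pe
                      + w_partner_r * t.partner_reward
                      + w_partner_pe * t.partner_pe)
    return h
```

      Fitting such a model would then amount to choosing the weights and `gamma` that minimise the squared error between predicted and rated happiness for each participant.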

      Strengths

      This manuscript investigates the concept of guilt in social decision-making through both statistical and computational modeling. It integrates behavioral and neural data, providing a more comprehensive understanding of the psychological mechanisms. For the behavioral results, data from two different studies is included, and although minor differences are found between the two studies, the main findings remain consistent. The authors share all their code and materials, leading to transparency and reproducibility of their methods.

      The manuscript is well-grounded in prior work. The task design is inspired by a large body of previous work on social decision-making and includes the necessary conditions to support their claims (i.e., Solo, Social, and Partner conditions). The computational models used in this study are inspired by previous work and build on well-established economic theories of decision-making. The research question and hypotheses clearly extend previous findings, and the more traditional univariate results align with prior work.

      The authors conducted extensive analyses, as supported by the inclusion of different linear models and computational models described in the supplemental materials. Psychological concepts like risk preferences are defined and tested in different ways, and different types of analyses (e.g., univariate and multivariate neuroimaging analyses) are used to try to answer the research questions. The inclusion and comparison of different computational models provide compelling support for the claim that partner prediction errors indeed influence task behavior, as illustrated by the multiple model comparison metrics and the good model recovery.

      We thank the reviewer very much for their comprehensive description of our study and the positive assessment of our study and approach.

      Weaknesses

      As the authors already note, they did not directly ask participants to report their feelings of guilt. The decrease in happiness reported after a bad choice for a partner might thus be something other than guilt, for example, empathy or feelings of failure (not necessarily related to guilt towards the other person). Although the patterns of neural activity evoked during the task match previously found patterns of guilt, there is no direct measure of guilt included in the task. This warrants caution in the interpretation of these findings as guilt per se.

      We fully agree that not directly asking participants about feelings of guilt is a clear limitation of our study. While we already mention this in our Discussion, we have expanded our discussion of the consequences on the interpretation of our results along the lines described by the reviewer in the revised manuscript. We would like to thank the reviewer for proposing these lines of thought, and have now made the following changes to the text:

      In the first paragraph of the Discussion, we now write: “Being responsible for choosing a lottery that yielded a low outcome for a partner made our participants feel worse than witnessing the same outcome resulting from their partner’s choice, which we interpret as interpersonal guilt; although we note that we have not asked participants specifically about which emotion they felt in these situations.”

      Later on, in the third paragraph focusing on the anterior insula, we now write: “This replicates a large body of evidence associating aIns with feelings of guilt evoked during social decisions (see Introduction). Because we have neither asked our participants specifically what they felt in these situations, nor specifically whether they experienced guilt, we cannot exclude the possibility that they have instead or in addition felt empathy for their partner, a feeling of failure or bad luck, or some other emotion.”

      As most comparisons contrast the social condition (making the decision for your partner) against either the partner condition (watching your partner make their decision) or the solo condition (making your own decision), an open question remains of how agency influences momentary happiness, independent of potential guilt. Other open questions relate to individual differences in interpersonal guilt, and how those might influence behavior.

      How agency influences momentary happiness or variations thereof during the course of an experiment such as ours is an interesting question in itself. We have now run linear mixed models assessing agency (i.e., we compared happiness in the Solo and Social conditions vs. the Partner condition), which revealed lower happiness in the Solo and Social conditions (i.e., when it was the participant’s turn to decide) in both studies. This is interesting in itself and may reflect the drive behind responsibility aversion reported by Edelson et al.’s 2018 study: being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to “weight of the responsibility”. We now report these findings in the Results section, including this proposed explanation; because we were not specifically interested in responsibility aversion, we do not discuss this further in the Discussion. The edited text is under the new subsection entitled ‘Momentary happiness: effects of agency, responsibility and guilt’, on page 12:

      “Next, we assessed whether happiness varied depending on the participant’s agency (Social + Solo vs. Partner), and found happiness to be lower when the participant chose, independent of the outcome (Study 1: t(3600) = -3.92, p = 0.00009, β = -0.14, 95% CI = [-0.20 -0.07]; Study 2: t(2870) = -6.07, p = 0.000000001, β = -0.24, 95% CI = [-0.31 -0.16]). This is interesting in itself and may reflect the drive behind responsibility aversion reported by Edelson et al.’s 2018 study: being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to “weight of the responsibility”. To specifically search for a sign of interpersonal guilt, [...]”

      Regarding individual differences: this is a very interesting topic that we have not addressed here due to the (relatively) small number of participants in our studies, but we might consider this for future follow-up studies, which we mention in the Discussion paragraph regarding open questions.

      This manuscript is an impressive combination of multiple approaches, but how these different approaches relate to each other and how they can aid in answering slightly different questions is not very clearly described. The authors could improve this by more clearly describing the different methods and their added value in the introduction, and/or by including a paragraph on implications, open questions, and future work in the discussion.

      We thank the reviewer for their appreciation of our complementary approach, and agree that we had not sufficiently explained the reasons why we used several methods. We have now added a paragraph explaining this at the end of the Introduction (page 5):

      “We analysed our behavioural data using several complementary methods: choices were modelled with mixed-effects regressions serving as manipulation checks; risk preferences expressed in choices were assessed using a comprehensive expected utility model as well as with a simpler, more robust “risk premium” approach; and happiness data were fitted, in addition to the computational models, with several linear mixed models to assess the impact of both the participant’s and their partner’s rewards, the impact of agency and their interactions. Inspired by findings reported in previous neuroimaging of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight about both the activity in regions associated with guilt as well as their place in a wider network, providing an in-depth comprehensive analysis of the mechanisms behind guilt evoked by social responsibility.”

      In addition, as suggested we added the following paragraph on open questions and future work in the Discussion:

      “Several open questions remain at the end of this study. As discussed above, asking participants directly about which emotions they have felt during the different stages of this task would allow us to link subjective experience with our analytical measures. Testing more participants would allow us to assess the impact of inter-individual variations in personality traits on the experience as well as the behavioural and neural correlates of guilt and responsibility. Using more trials in the experiment would allow separate modelling of risk preferences in gain and loss trials in each experimental condition using expected utility models, and could allow testing whether changes in momentary happiness affect subsequent choices. Varying partner identities (friends, strangers, artificial agent) could reveal the impact of social discounting on guilt and responsibility. In sum, we believe that this experimental approach lends itself very well to the study of several aspects of social emotions.”

      However, taken together, this study provides useful insights into the neural and behavioral mechanisms of responsibility and guilt in social decision-making and how they influence behavior. 

      We thank the reviewer again for their appreciation of our work and hope that our revisions improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The majority of my suggestions are in the public review, so I will not repeat them here. But in general, I like the paper, and in addition to my other comments, I think that there should be more discussion of the potential limitations of the study and conclusions that can be drawn. I also thought parts of the results were a little hard to follow, particularly in the 'momentary happiness' section. Perhaps an additional subsection here might help with flow.

      We agree that we could have discussed further the limitations of our study and the conclusions that can be drawn from it, which we have now done in the last paragraphs of the Discussion in this revised version.

      To improve the structure of the section on ‘momentary happiness’, we separated this section into two, entitled ‘Momentary happiness: links to reward’ and ‘Momentary happiness: effects of agency, responsibility and guilt’, which should make this long section easier to read. We proceeded in a similar manner for the Choices section, which is now subdivided into ‘Choices: manipulation check’ and ‘Choices: risk preferences’. We believe that these changes have indeed improved the readability of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, I believe this manuscript was well-designed, consists of extensive analyses, and provides interesting new insights into the mechanisms underlying social decision-making. I mostly have some clarifying questions and minor comments, which are described below. 

      (1) Integration of prior findings in the first paragraphs of the Introduction. Although all the previous work described in the 2nd-5th paragraphs of the Introduction is interesting, it felt a bit like an enumeration of findings rather than an integrated introduction leading to the current research question. At the end of paragraph 5, it becomes clear how these findings relate to the current research question, but I believe it will improve the flow and readability of the introduction if this becomes clear earlier on.

      We agree that we could have integrated the cited previous work into the Introduction so that the text builds up to the research question. We have now extensively reworked several paragraphs in the Introduction (pages 3-5) and hope that these changes have made it easier to follow.

      (2) For the risk attitudes (Choices), you describe pooling the gains and losses and then comparing the social and solo conditions. I was wondering whether you also looked at potential differences between gains and losses (delta measure) for the social versus the solo condition (i.e., a comparison of the deltas). Based on prior work, I can imagine that the difference in risk attitudes for gains and losses might differ when making decisions for yourself versus for a partner. In general, I was wondering how you explain these findings, as there is also a lot of work showing differences in risk-taking patterns for gains and losses.

      We agree that we could have compared delta measures between solo and social conditions. However, as we describe in the Results section and comment on in the Discussion, the relatively low number of trials made separate fitting of gain and loss trials across conditions difficult. While this question could thus be addressed in subsequent versions of our experiment with more trials, such a fine-grained analysis of the decisions was not the focus of our current study.

      (3) On page 11, you state: "in particular the partner's reward prediction errors resulting from the participants' decisions, i.e. those pRPE for which participants were responsible." From the results described in the paragraph above, this doesn't become clear (e.g., there's no distinction made between social_pRPE and partner_pRPE in the text), as it only discusses differences in weights between pRPE and sRPE. I would recommend including some more information in the main text on these main modeling findings, so one doesn't have to go to the Supplemental Materials to understand them.

      We did indeed fail to report these findings in the text! We thank the reviewer for pointing this out. We have now edited this passage as follows:

      “Crucially, we find here that the partner’s reward prediction errors (social_pRPE and partner_pRPE) contributed to explaining changes in participants’ momentary happiness: the Responsibility and ResponsibilityRedux models explained the data better than the models without these parameters (see Table 1). In particular, the partner’s reward prediction errors resulting from the participants’ decisions (social_pRPE), i.e. those pRPE for which participants were responsible, contributed to explaining our data (weights for social_pRPE were greater than 0: Responsibility model: Study 1: Z = 2.85, p = 0.004, Study 2: Z = 3.26, p = 0.001; ResponsibilityRedux model: Study 1: Z = 2.93, p = 0.003, Study 2: Z = 3.30, p = 0.001; weights for social_pRPE tended to be higher than weights for partner_pRPE: Responsibility model: Study 1: Z = 2.14, p = 0.033; Study 2: Z = 1.41, p = 0.16).”

      (4) The functional connectivity findings seem to come out of nowhere and are not introduced or described anywhere prior in the manuscript. It is therefore not completely clear why you conducted these analyses, or what they add above and beyond previous analyses. Already introducing this method earlier on would fix that.

      We agree that we could have introduced functional connectivity analyses earlier in the text, particularly given the many previous studies in our field using this technique. We have now done this at the end of a new last paragraph of the Introduction:

      “Inspired by findings reported in previous neuroimaging studies of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight into both the activity in regions associated with guilt and their place in a wider network, providing an in-depth analysis of the mechanisms behind guilt evoked by social responsibility.”

      (5) For the functional connectivity findings: I was wondering why you only looked at the choice phase, and not at the feedback phase. I understand that previous work focused on the choice phase, but for the purpose of this study (focus on guilt), I can imagine it is also interesting to see what happens with feedback. In the discussion, you also state "How we feel when we witness our decisions' consequences on others is an important signal to consider when attempting to make good social decisions." (p. 19), which is more focused on the feedback rather than choice, and also supports the idea that looking at the feedback moment might be relevant.

      We agree that it would also have been interesting to examine functional connectivity during the feedback phase. We originally did not do so mainly because of time constraints. In addition, the manuscript is already very long and contains many analyses of behavioural and fMRI data; adding this analysis would take considerable time and further delay publication, which we would prefer to avoid. However, these effects could be examined in subsequent analyses of the same data or in future versions of this experiment. We now mention this in the Discussion, in the paragraphs on open questions.

      Minor comments:

      (1) For some of the Figures, it would be helpful if the subtitles were more informative. For Figure 2 and Figure 3 for example, it would be nice if Study 1 and Study 2 were not only mentioned in the figure description but also in the actual figure. For Figures 3 and 4, it would be helpful to have significance stars for the bar plots as well.

      We agree that these changes make the figures more easily understandable and have implemented them all, except for adding stars on Figure 4, because all bar plots in panels C and E would have been labeled with two or more stars, which would have made the figure difficult to read. We have now mentioned the fact that all these coefficients were significant in the figure legend.

      (2) For some of the Supplementary Results, it would be very helpful if there was a legend or description. This is already the case for most of the SR, but not for all.

      We have now added a legend to all elements of the Supplementary Results.

      Some questions that came to mind while going through them:

      - Supplementary Table 1: which p-values correspond to the significance stars? This information is included for Supplementary Table 2, but not for ST1. 

      We have now added the missing information in ST1.

      - Supplementary Figure 1: do the colors correspond to different participants? 

      We have now specified that the colors do indeed correspond to different participants.

      - Supplementary Table 5 (final table): what do the - represent? As in, why is there no value for "run" for the MPFC? At first, I thought you only included the significant values, but then I noticed a few non-significant values as well, so it wasn't completely clear to me why some of the values were missing. This also applies to Supplementary Table 6.

      We had indeed forgotten to explain this. The ‘-’ in Supplementary Tables 4 and 6 indicates that the linear mixed model without the factor ‘run’ was the better-fitting one. We have now added the following explanation to the text accompanying Supplementary Table 4:

      “We tested these models both with and without the factor Run and its associated interaction, and we report the best-fitting model in the table below: a dash (‘-’) in the rows displaying parameters for the run and socialVsSolo:run regressors indicates that the model without the factor Run fitted better for this ROI.”

      (3) I came across a few minor typos or sentences that were not completely clear to me.

      - On page 3: "Patients with damage to ventromedial prefrontal cortex (vmPFC) seem insensitive to guilt when playing social economic games (Krajbich et al., 2009)." This sentence felt a bit out of nowhere and doesn't logically follow from the previous sentences. 

      We have now revised the descriptions of this previous study as well as several others and how they fit into the research question.

      - On page 3: "In another study, participant errors in a difficult perception task lead to a partner feeling pain and evoked activations in left aIns and dlPFC (Koban et al., 2013)." This sentence doesn't really flow, and from the wording, it is not completely clear whether it's the errors or the partner pain that led to the aIns and dlPFC activation.

      We have now revised the description of this study as well, as follows:

      “In another study, partners received painful stimuli when participants made errors during a difficult perception task. These errors evoked activations in the left aIns and dlPFC in the participants (Koban et al., 2013).”

      - Supplementary Figure 1: there is a missing period after the sentence "We then compared these new estimated parameters to the actual parameters from which the synthetic data were generated"

      We have now added the missing period after “generated”.

      - On page 5: "We ran two experiments, Study 1 outside fMRI and Study 2 during fMRI, with separate groups of participants." I would change "outside fMRI" to outside the MRI scanner or something like that, as it's not completely correct to say "outside fMRI".

      We have changed the sentence to “outside the MRI scanner”.

      - On page 6: for the first result, there are currently two p-values reported (p < 2.5e-20 and p < 2e-16). I believe this is an error?

      This was indeed an error! We re-ran this analysis, noticed that the degrees of freedom had also been miscalculated, and have updated this result as well as the effect of condition (solo vs social). The results are almost identical to the previous ones, and all conclusions hold. We have also checked the other analyses reported in this paragraph; all results replicate exactly.

      - On page 6: "Supplemental Table 1" should be "Supplementary Table 1" (for consistency).

      Done.

      - On page 8: "participants in both conditions of both studies", I would change "of both studies" to "for both studies".

      Done.

      - On page 8: for the "Momentary Happiness" paragraph, it would be helpful if you could briefly describe the Rutledge method here, for people who are unfamiliar with the approach.

      We now write the following at the beginning of this paragraph:

      “Following Rutledge and colleagues’ methodology, which considers that changes in momentary happiness in response to outcomes of a probabilistic reward task are explained by the combined influence of recent reward expectations and prediction errors arising from those expectations, we fitted computational models to each participant’s happiness data.”
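      For readers unfamiliar with this approach, the core of a Rutledge-style happiness model can be sketched as follows. This is a minimal illustration, not the authors' fitting code: it assumes the general form reported by Rutledge et al. (2014), in which predicted happiness after trial t is a baseline w0 plus weighted, exponentially discounted sums of trial-wise regressors such as certain rewards (CR), expected values of chosen gambles (EV) and reward prediction errors (RPE), with a forgetting factor gamma. The function `predicted_happiness` and the regressor names are hypothetical; terms such as social_pRPE or partner_pRPE would enter as further entries in the same dictionaries, but the exact specification of the Responsibility models is the one given in the manuscript.

```python
import numpy as np


def predicted_happiness(w0, weights, gamma, regressors):
    """Rutledge-style momentary happiness prediction (illustrative sketch).

    w0:         baseline happiness
    weights:    dict mapping regressor name -> weight (e.g. "CR", "EV", "RPE")
    gamma:      forgetting factor in [0, 1]; older trials count less
    regressors: dict mapping the same names -> sequences of per-trial values

    Returns predicted happiness after each trial t:
        H(t) = w0 + sum_k w_k * sum_{j <= t} gamma**(t - j) * x_k[j]
    """
    n_trials = len(next(iter(regressors.values())))
    pred = np.full(n_trials, float(w0))
    for name, w in weights.items():
        x = np.asarray(regressors[name], dtype=float)
        trace = 0.0
        for t in range(n_trials):
            # Recursive form of the exponentially discounted sum up to trial t
            trace = gamma * trace + x[t]
            pred[t] += w * trace
    return pred
```

      In such a model, the weights and gamma would then be estimated for each participant by minimising the squared difference between predicted and reported (typically z-scored) happiness ratings; that fitting step is not shown here.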

      - On page 10: "Wilkoxon sign-rank tests", should be "Wilcoxon".

      Done.

      We thank the reviewer for their careful reading of our manuscript. We believe that these changes have indeed improved our manuscript.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive comments, which greatly helped us to clarify, quantify, and strengthen both our findings and interpretations. Below, we provide a point-by-point response to each comment and describe the corresponding changes made.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, several aspects of the experimental design and data analysis require further clarification and strengthening.

      Major Comments:

      (1) In Figure 1A, the authors showed "stronger binding affinity" based on shifts at lower peptide concentrations, but no quantitative binding parameters (e.g., apparent Kd, fraction bound, or densitometric analysis) are presented. This claim would be better supported by including (i) a binding curve with quantification of free vs bound RNA band intensities and (ii) replicates and error estimates (mean ± SD).

      We thank the reviewer for this suggestion. To quantitatively support the binding differences observed in Figure 1A, we have now performed densitometric analysis of the EMSA data and included the results in Figure S1. The apparent Kd values for PSMα3 binding to polyAU and polyA RNA are of the same order of magnitude, but the Kd is lower for polyAU, indicating stronger binding. A description was added to the Results in lines 137-145 of the revised version.
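      As an illustration of this kind of analysis, an apparent Kd can be estimated from EMSA band intensities as sketched below. This is a hypothetical example under stated assumptions: it assumes a simple single-site binding model, theta = [P] / (Kd + [P]), with the peptide in large excess over RNA so that free and total peptide concentrations are approximately equal. The functions `fraction_bound` and `fit_kd` are illustrative and do not reproduce the analysis behind Figure S1.

```python
import numpy as np


def fraction_bound(bound_intensity, free_intensity):
    """Fraction of RNA shifted, from background-subtracted band intensities."""
    b = np.asarray(bound_intensity, dtype=float)
    f = np.asarray(free_intensity, dtype=float)
    return b / (b + f)


def fit_kd(peptide_conc, frac_bound, kd_min=1e-3, kd_max=1e3, n=2001):
    """Apparent Kd for a single-site model, theta = P / (Kd + P).

    Assumes peptide in large excess over RNA (free ~ total peptide).
    A log-spaced grid search minimises the squared error, which keeps the
    sketch free of any optimisation-library dependency.
    """
    p = np.asarray(peptide_conc, dtype=float)
    y = np.asarray(frac_bound, dtype=float)
    kds = np.logspace(np.log10(kd_min), np.log10(kd_max), n)
    # Shape (n_kd, n_points): predicted fraction bound for every candidate Kd
    pred = p[None, :] / (kds[:, None] + p[None, :])
    sse = ((pred - y[None, :]) ** 2).sum(axis=1)
    return kds[np.argmin(sse)]
```

      Fitting each replicate gel separately and reporting the mean ± SD of the resulting Kd values would provide the kind of error estimate the reviewer requested. In practice, a nonlinear least-squares routine could replace the grid search.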

      (2) The authors report droplet formation at low RNA (50 ng/µL) but protein aggregation at high RNA (400 ng/µL) through fluorescence microscopy. However, no intermediate RNA concentrations (e.g., 100-300 ng/µL) are tested or discussed, leaving a critical gap in understanding the full phase diagram and transition mechanisms.

      Our initial choice of 50 ng/µL (low RNA) and 400 ng/µL (high RNA) was guided by a broader RNA titration performed by turbidity measurements across 0, 10, 20, 50, 100, 200, and 400 ng/µL (Figure S2 in the revised version). In this screen, turbidity increased up to 50 ng/µL and then decreased dose-dependently from 100–400 ng/µL. We interpret this non-monotonic behavior as consistent with a transition from a droplet-rich regime (maximal light scattering at intermediate dense-phase volume) toward conditions where assemblies become larger and/or more compact and sediment out of the optical path. This is described in lines 158-161 of the revised version.

      Of note, additional intermediate RNA conditions (100 and 200 ng/µL) are included in Figure S14 (of the revised version). While these experiments were performed under the heat-shock perturbation, they nevertheless support the central point that RNA tunes assembly state across intermediate concentrations rather than producing a binary low/high outcome.

      Importantly, we agree with the reviewer that a full phase diagram would be the most rigorous way to define the transition mechanism. However, establishing csat and constructing a complete phase diagram would require systematic measurements of dilute-phase concentrations (e.g., centrifugation/quantification or fluorescence calibration), controlled ionic strength titrations, and time-resolved mapping, which is beyond the scope of the present study. We have therefore revised the text to avoid implying that we provide a complete phase diagram. Instead, we frame our results as a qualitative, multi-assay characterization showing that RNA concentration drives a shift from liquid-like condensates (at low RNA) toward solid-like assemblies (at high RNA), with an intermediate regime suggested by the turbidity transition and supported by additional imaging under stress. Finally, to address the “critical gap” concern directly, we have added a sentence (lines 239-241) stating that: “Future work will be required to quantitatively define the phase boundaries and delineate the dominant mechanisms, such as sedimentation, dissolution, or coarsening/aging, across intermediate RNA concentrations.”

      (3) Additionally, the behaviour of PSMα3 in the absence of RNA under LLPS conditions is not shown. Without protein-only data, it is difficult to assess if droplets are RNA-induced or if protein has a weak baseline LLPS that RNA tunes. The saturation concentration (csat) for PSMα3 phase separation, either in the absence or presence of RNA, should be reported.

      In response to the reviewer’s request, we have added Figure 2F, which shows PSMα3 alone in the absence of RNA under the same conditions. PSMα3 does not form droplets in this condition, indicating that condensate formation is RNA-dependent in the tested conditions. This is referred to in the text in lines 190-193 of the revised version. Please see our response about determining the csat in the response to the previous comment.

      (4) For a convincing LLPS claim, it is important to show: Quantitative FRAP curves (mobile fraction and half-time of recovery) rather than only microscopy images and qualitative statements.

      We have included quantitative FRAP analysis in Figure S4 of the revised version, showing normalized recovery curves along with extracted mobile fractions and half-times of recovery (t₁/₂). These quantitative measurements support the dynamic nature of the PSMα3–RNA condensates. This is referred to in the text in lines 179-184 of the revised version.
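      The two FRAP parameters mentioned here can be obtained, for example, by fitting a single-exponential recovery, as in the sketch below. It assumes intensities that are already background-corrected, corrected for acquisition bleaching, and normalised so that the pre-bleach level equals 1. The single-exponential model and the function `frap_fit` are illustrative assumptions; the fitting procedure behind Figure S4 may differ.

```python
import numpy as np


def frap_fit(t, intensity, pre_bleach=1.0, taus=None):
    """Single-exponential FRAP fit (illustrative sketch).

    Model: I(t) = I0 + A * (1 - exp(-t / tau)), with t = 0 at the bleach.
    Returns (mobile_fraction, t_half), where
        mobile_fraction = A / (pre_bleach - I0)   # recovered / bleached depth
        t_half          = tau * ln(2)
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(intensity, dtype=float)
    if taus is None:
        span = t.max() - t.min()
        taus = np.logspace(np.log10(span / 1000), np.log10(span * 10), 4000)
    best = None
    for tau in taus:
        # For a fixed tau the model is linear in I0 and A: solve by least squares
        X = np.column_stack([np.ones_like(t), 1.0 - np.exp(-t / tau)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(((X @ coef - y) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, tau, coef)
    _, tau, (i0, amp) = best
    mobile_fraction = amp / (pre_bleach - i0)
    t_half = tau * np.log(2)
    return mobile_fraction, t_half
```

      Separating the nonlinear time constant (searched on a grid) from the two linear parameters (solved exactly for each candidate) keeps the sketch dependent on NumPy alone.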

      (5) The manuscript highly relies on fluorescence microscopy to show colocalization. However, the colocalization is presented in a qualitative manner only. The manuscript would benefit from the inclusion of quantitative metrics (e.g., Pearson's correlation coefficient, Manders' overlap coefficients, or intensity correlation analysis).

      In response, we have added quantitative colocalization analysis to the revised manuscript. Specifically, we now report Pearson’s correlation coefficients and Manders’ overlap coefficients for the dual-channel fluorescence microscopy datasets in Figure S5 of the revised version. These metrics provide an objective measure of codistribution and complement the qualitative imaging.

      The analysis supports that at low RNA concentrations (droplet/condensate conditions), PSMα3 and RNA show strong colocalization, consistent with RNA being incorporated within, or closely associated with, the peptide-rich phase. In contrast, at high RNA concentrations, where the assemblies are more solid-like/amyloid-positive, the quantitative coefficients decrease, consistent with reduced overlap and an apparent spatial demixing in which RNA becomes partially excluded from the peptide-rich structures. This is referred to in the text in lines 194-203 of the revised version.
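      For reference, the two colocalization metrics named above can be computed from a pair of registered, background-subtracted channel images roughly as follows. This is a minimal sketch: thresholds are supplied manually (automatic Costes thresholding is not implemented), Manders' coefficients are computed in their thresholded form, and the functions `pearson_r` and `manders` are illustrative rather than the pipeline used for Figure S5.

```python
import numpy as np


def pearson_r(ch1, ch2, mask=None):
    """Pearson's correlation coefficient between two channels' pixel intensities."""
    a = np.asarray(ch1, dtype=float)
    b = np.asarray(ch2, dtype=float)
    if mask is not None:
        # Restrict to pixels inside a region of interest (boolean array)
        a, b = a[mask], b[mask]
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Thresholded Manders' overlap coefficients M1 and M2.

    M1: fraction of channel-1 signal (above thr1) found in pixels where
        channel 2 is also above thr2; M2 is the converse.
    """
    a = np.asarray(ch1, dtype=float)
    b = np.asarray(ch2, dtype=float)
    both = (a > thr1) & (b > thr2)
    m1 = a[both].sum() / a[a > thr1].sum()
    m2 = b[both].sum() / b[b > thr2].sum()
    return float(m1), float(m2)
```

      Pearson's coefficient reports how intensities co-vary, whereas M1 and M2 report what fraction of each signal overlaps with the other. This difference is why the two metrics can diverge when RNA is partially excluded from peptide-rich structures.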

      (6) In Figures 3 B and 3C, the contrast between "no AT630 at 30 min, strong at 2 h" (50 ng/μL) and "strong at 30 min" (400 ng/μL) is compelling, but a simple quantification (e.g., mean fluorescence intensity per area) would greatly increase rigor.

      We have included quantitative analysis of AmyTracker630 fluorescence intensity in Figure S6 of the revised version, reporting the mean fluorescence intensity per area for the indicated conditions and time points. This quantification supports the qualitative differences observed in Figures 3B and 3C. This is now referred to in the text in lines 233-236 of the revised version.

      (7) In Figure S3 ssCD data, if possible, indicate whether the α-helical signal increases with RNA concentration or shows a non-linear dependence, which might link to the LLPS vs solid aggregate regimes.

      The ssCD spectra displayed in Figure S7 in the revised version (corresponding to Figure S3 in the original submission) show that the α-helical signature of PSMα3 is markedly enhanced in the presence of RNA compared to peptide alone, as evidenced by increased signal intensity, deeper minima, and more pronounced spectral features characteristic of α-helical structure. Importantly, this enhancement is more pronounced at 400 ng/µL Poly(AU) RNA than at 50 ng/µL, particularly after 2 hours of coincubation, indicating that RNA concentration influences the stabilization of α-helical assemblies. This is now more specifically detailed in the text in lines 258-263 of the revised version.

      We note that solid-state CD does not allow direct quantitative deconvolution of secondary structure content (e.g., % helix) in the same manner as solution CD, due to sample anisotropy, scattering, and orientation effects inherent to dried or aggregated films. Consequently, our interpretation is qualitative rather than strictly quantitative. Within these limits, the ssCD data suggest a non-linear dependence on RNA concentration rather than a simple linear dose–response. This is also expected, given that phase transitions, as suggested by our other findings, are intrinsically non-linear.

      (8) In Figure 5B, FRAP recovery in dying cells may reflect artifactual mobility rather than biological relevance. Additionally, the absence of quantification data limits interpretation; providing recovery curves would clarify relevance.

      We have added quantitative FRAP analysis of PSMα3 within HeLa cells, shown in Figure S8 of the revised version. Compared to PSMα3 assemblies in vitro, nucleolar PSMα3 exhibits slower fluorescence recovery and a reduced mobile fraction. The nucleolus represents a highly crowded, RNA-rich cellular environment, which is expected to impose additional constraints on molecular mobility and likely contributes to the slower recovery kinetics observed in cells. This is now more specifically detailed in the text in lines 324-333 and discussed in lines 597-607 of the revised version.

      (9) The narrative conflates cytotoxicity endpoints (membrane damage, PI staining, aggregates) with localization data (nucleolar foci), creating ambiguity about whether nucleolar targeting drives toxicity or is a consequence of cell death. Separating toxicity assessment from localization analysis, or clearly demonstrating that nucleolar accumulation precedes cytotoxicity, would resolve this ambiguity.

      We thank the reviewer for raising this important point. We agree that, in the current dataset, cytotoxicity readouts (membrane damage, PI staining, aggregate formation) and subcellular localization (nucleolar accumulation) are observed in close temporal proximity, which limits our ability to unambiguously assign causality. In the experiments presented here, PSMα3 was applied at concentrations known to induce rapid membrane disruption and cytotoxicity in HeLa cells. Under these conditions, PSMα3 accumulates on cellular membranes and penetrates into the cell and nucleus on very short timescales (seconds to minutes), likely preceding the temporal resolution accessible by standard live-cell fluorescence microscopy. As a result, nucleolar accumulation and cytotoxic endpoints are detected essentially concurrently, precluding a definitive determination of whether nucleolar association actively drives toxicity or occurs as a downstream consequence of membrane permeabilization and cell damage.

      We therefore emphasize that, in this study, nucleolar localization is presented as a phenomenological observation consistent with RNA-rich compartment association, rather than as a demonstrated causal mechanism of cytotoxicity. We have revised the Discussion (lines 597-607) to clarify this distinction and to avoid implying that nucleolar targeting is the primary driver of cell death.

      We agree that resolving this ambiguity would require systematic time-resolved and concentration-dependent experiments, including analysis at sub-toxic PSMα3 concentrations below the membrane-disruptive threshold, combined with orthogonal imaging approaches. Such experiments are planned for future work but are beyond the scope of the present study.

      (10) In Figure 8, to strengthen the LLPS assignment for LL-37, additional evidence, such as FRAP analysis or observation of droplet fusion events, would be valuable. This is particularly relevant given that the heat shock conditions (65 °C for 15 minutes) could potentially induce partial denaturation or nonspecific coacervation.

      In response to this comment, we have added FRAP analysis of LL-37 assemblies in the revised manuscript (Figure S12), including representative images and corresponding fluorescence recovery curves. The FRAP measurements show minimal fluorescence recovery over the acquisition window, indicating that the LL-37–RNA assemblies formed under these conditions are largely immobile and solid-like, rather than liquid-like droplets. This is now referred to in the text in lines 458-462 of the revised version.

      Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 versus human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and described the associated liquid-liquid phase separation. They also compare the influence of RNA on the aggregation and activity of LL-37, which shows differences from that on PSMalpha3.

      Strengths:

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear.

      Weaknesses:

      I have two major and fundamental problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have, in the meantime, published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study, which show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation, are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      We thank the reviewer for this important critique and agree that direct cytotoxicity is most plausibly mediated by soluble PSM species, while extensive fibrillation generally reduces toxicity by depleting these forms, a conclusion supported by our data and by other studies (e.g., Zheng et al., 2018; Yao et al., 2019). We do not propose mature amyloid fibrils as the primary toxic entities. Rather, we use the term functional amyloid in a regulatory sense, consistent with other biological amyloids whose fibrillar states modulate activity (e.g., hormone storage amyloids or RNA-binding proteins).

      In line with emerging findings, we interpret PSMα3 toxicity as arising from a dynamic assembly process rather than from a single static molecular species. We previously showed that PSMα3 forms cross-α fibrils that are thermodynamically and mechanically less stable than cross-β amyloids and readily disassemble upon heat stress, fully restoring cytotoxic activity (Rayan et al., 2023). This behavior contrasts with PSMα1, which forms highly stable cross-β fibrils that do not recover activity after heat shock, suggesting that the limited thermostability of PSMα3 is an evolved feature enabling reversible switching between inactive (stored) and active states.

      Consistent with this view, both PSMα1 and PSMα3 are cytotoxic in their soluble states, yet mutants unable to fibrillate lose activity, indicating that fibrillation is required but not itself the toxic end state (Tayeb-Fligelman et al., 2017, 2020; Malishev et al., 2018). Our other studies further show that cytotoxicity toward human cells correlates with inherent or lipid-induced α-helical assemblies, rather than with inert β-sheet amyloids (Ragonis-Bachar et al., 2022, 2026; Salinas et al., 2020; Bücker et al., 2022). Together, these findings support a model in which membrane-associated, dynamic α-helical assembly, which requires continuous exchange between soluble species and growing fibrils, drives membrane disruption, potentially through lipid recruitment or extraction, analogous to mechanisms proposed for human amyloids such as islet amyloid polypeptide (Sparr et al., 2004).

      In the present study, we further show that RNA reshapes this dynamic landscape: while PSMα3 alone progressively loses activity upon incubation, co-incubation with RNA preserves cytotoxicity by stabilizing bioactive polymorphs and condensate-like states, whereas high RNA concentrations promote solid aggregation but nevertheless preserve activity. Thus, aggregation is neither inherently functional nor toxic, but context-dependent and environmentally regulated. Taken together, our data support a model in which PSMα3 amyloids act as a dynamic reservoir, enabling S. aureus to tune virulence by reversibly shifting between dormant and active states in response to environmental cues such as heat or RNA.

      This is now discussed in lines 56-76 and 523-553 of the revised version.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      We thank the reviewer for this important point. We agree that PSM–nucleic acid interactions are not unexpected and that our data do not support a direct intracellular role for RNA binding in mediating cytotoxicity. Accordingly, we do not propose nucleolar or nuclear association of PSMα3 as a causal mechanism of cell death. At the concentrations used, PSMα3 induces rapid membrane disruption, and nucleic acid association is observed together with membrane attachment, precluding conclusions about intracellular function. This limitation is now explicitly clarified in the revised manuscript. The biological significance of our findings lies instead in extracellular and environmental contexts, where PSMα3 encounters abundant nucleic acids, such as RNA or DNA released from damaged host cells or present in biofilms, as now addressed in lines 622-631. Our data show that RNA modulates PSMα3 aggregation trajectories, shifting the balance between liquid-like condensates and solid aggregates, and thereby regulates the persistence and timing of cytotoxic activity. In this framework, RNA acts as a context-dependent regulator of virulence rather than as an intracellular cytotoxic cofactor, an aspect that will be studied in depth in future work. This is now addressed in the text in lines 597-607 of the revised version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to investigate the role of RNA in modulating both virulent amyloid and host-defense peptides, with the objective of understanding their self-assembly mechanisms, morphological features, and aggregation pathways.

      Strengths:

      The overall content is well-structured with a logical flow of ideas that effectively conveys the research objectives.

      Weaknesses:

      (1) Figure 2 displays representative FRAP images demonstrating fluorescence recovery within seconds. To gain a more comprehensive understanding of how recovery after photobleaching varies under different conditions, it is recommended to supplement these images with corresponding quantitative fluorescence recovery curves for analysis.

      In response to this comment, we have supplemented the representative FRAP images with quantitative fluorescence recovery curves, reporting normalized recovery kinetics for the indicated conditions. These data are now provided in Figure S4 of the revised manuscript, allowing direct comparison of recovery behavior across conditions (shown by microscopy in Figure 2). In addition, we have included quantitative FRAP analyses for the cellular imaging shown in Figure 5 (presented in Figure S8) and for LL-37 assemblies formed under heat-shock conditions (Figure S12). Together, these additions provide a quantitative framework for interpreting the FRAP results and strengthen the distinction between liquid-like and solid-like assembly states.

      (2) Ostwald ripening typically leads to the shrinkage or even disappearance of smaller droplets, accompanied by the further growth of large droplets. However, the droplet size in Figure 2D decreases significantly after 2 h of incubation. This observation prompts the question, what is the driving force underlying RNA-regulated phase separation and phase transition?

      We thank the reviewer for this observation. Across multiple samples, we consistently observe a coexistence of small droplets and larger aggregates, rather than systematic growth of larger droplets at the expense of smaller ones or a uniform decrease in droplet size. In addition, the timescales examined do not allow us to reliably assess whether diffusion-driven droplet coalescence is fast enough to draw firm conclusions about droplet size evolution. This is now addressed in the text in lines 181-184 of the revised version.

      A decrease in droplet size over time is nevertheless observed in some instances and is more consistent with a time-dependent conversion of initially liquid-like condensates into more solid-like assemblies, which would reduce molecular mobility and suppress droplet coalescence. In parallel, progressive fibril formation may act as a sink for soluble peptide, leading to partial dissolution or shrinkage of less mature condensates. Together, these observations are consistent with a non-equilibrium aging process, in which RNA-regulated assemblies evolve from dynamic condensates toward more solid structures rather than following equilibrium Ostwald ripening.

      (3) The manuscript aims to study the role of RNA in modulating PSMα3 aggregation by using solution-state NMR to obtain residue-specific structural information. The current NMR data, as described in the method and figure captions, were recorded in the absence of RNA. Whether RNA binding induces conformational changes of PSMα3, and how these changes alter the NMR spectra? Also, the sequential NOE walk between neighboring residues can be annotated on the spectrum for clarity.

      The solution-state NMR experiments were performed specifically to characterize the potential binding of EGCG to PSMα3. Because PSMα3 tends to undergo rapid aggregation and line broadening upon RNA addition, solution-state NMR spectra in the presence of RNA could not be obtained at sufficient quality for residue-specific analysis. As suggested, we have annotated the sequential NOE walk between neighboring residues on the relevant NOESY spectra to improve clarity.

      (4) The authors claim that LL-37 shares functional, sequence, and structural similarities with PSMα3. However, no droplet formation was observed of LL-37 in the presence of RNA only. The authors then applied thermal stress to induce phase separation of LL-37. What are the main factors contributing to the different phase behaviors exhibited by LL-37 and PSMα3? What are the differences in the conformation of amyloid aggregates and the kinetics of aggregation between the condensation-induced aggregation in the presence of RNA and the conventional nucleation-elongation process in the absence of RNA for these two proteins?

      We appreciate this important question and have clarified both the basis of the comparison and the origin of the divergent phase behaviors of LL-37 and PSMα3. While PSMα3 and LL-37 share key properties as short, cationic, amphipathic α-helical peptides that self-assemble and interact with nucleic acids, they differ fundamentally in their assembly architectures. PSMα3 is an amyloidogenic peptide that forms cross-α amyloid fibrils, in which α-helices stack perpendicular to the fibril axis. In contrast, LL-37 can form fibrillar or sheet-like assemblies (observed on cryo grids), but, as far as crystal structures have shown to date, these lack canonical amyloid features such as clear cross-α or cross-β order. This is now clarified in several parts of the revised text. Thus, the comparison between the two peptides is functional and physicochemical rather than implying identical amyloid mechanisms. These structural differences likely underlie their distinct phase behaviors.

      Because LL-37 does not follow a classical amyloid nucleation–elongation pathway, and high-resolution structural information (e.g., cryo-EM) is currently lacking, partly due to its sheet-like, non-twisted morphology (unpublished results), it is not possible to directly compare aggregation kinetics or nucleation mechanisms between LL-37 and PSMα3. It is possible that amyloidogenic systems such as PSMα3 exhibit greater flexibility in prefibrillar and fibrillar polymorphism, enabling RNA-regulated phase behavior, whereas non-amyloid assemblies such as LL-37 are more prone to stress-induced solid aggregation. We note that this interpretation is necessarily tentative and does not imply a general rule; rather, it reflects differences evident in the present system.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      This problem is evident in the presentation of the EAK specimens. In their response, the authors state that one EAK specimen shows "overlapping scars" and constitutes a "long bone flake"; however, these features are not clearly identifiable in the figures or captions as currently presented. The authors state that Figures S21-S23 clearly indicate human agency, including a long bone flake with overlapping scars and a view of the medullary surface, but it is unclear which specimens or surfaces these descriptions refer to. Figure S21 does appear to show green fracture and is described only as an "elephant-sized flat bone fragment with green-bone curvilinear break." Figure S22 shows the same bone and cortical surface in a different orientation, providing no additional information. In Figure S23, I cannot clearly identify a medullary surface or evidence of green-bone fracture from this image. None of these images clearly demonstrates overlapping scars, and the figures would be substantially improved by explicitly identifying the features described in the text. Even if both EAK specimens are accepted as green-broken, they do not demonstrate the co-occurrence of multiple diagnostic fracture traits such as multiple green breaks, large step fractures, hackle marks, and overlapping scars that the authors state is required to attribute dynamic percussive activity to hominins and address equifinality.

      We appreciate the reviewer’s careful evaluation of the EAK specimens. We acknowledge that the overlapping scars and medullary surface of the specimen originally shown in Figure S23 were not sufficiently clear. To address this, we have extensively revised Figure S23. In the updated Supplementary File, we provide new annotations and line drawings that explicitly trace the outlines of the overlapping scars and clearly show the green-bone fracture features. These enhancements make the diagnostic traits discussed in the text directly identifiable in the visual record. This demonstrates the co-occurrence of two traits, green-broken outlines and overlapping scars, which meet the criteria for identifying dynamic percussive activity. This holds even under Reviewer 2's partial treatment of our arguments, since we argued in our previous response that clear, simple green-broken elephant long limb bones are an anthropogenic signature per se, given that no extant durophagous predator or scavenger (including spotted hyenas) is able to produce them. Additional secondary features, such as hackle marks, are supportive but not necessary to attribute human agency.

      I appreciate that the authors are careful to state that spatial association between stone tools and fossils alone does not demonstrate hominin behavior, and that they treat the spatial analyses as supportive rather than decisive. While the association is intriguing, the problem is downstream: spatial association is used to strengthen an interpretation of butchery at EAK that still depends on fracture evidence that is not clearly documented at the assemblage level.

      The association is inferred (not demonstrated) from the strong statistical spatial association between lithics and bones. Additional taphonomic evidence (such as cut marks or green-broken bones) further supports the inference but does not demonstrate it, given the highly subjective nature of cut mark identification and the plethora of alternative scenarios: one green-broken bone would not demonstrate complete elephant butchery (it could result from marginal exploitation of just that bone), and one cut-marked bone could equally reflect several alternative types of access to the remains. The reviewer recognized above the presence of green-broken elements at EAK; again, this supports anthropogenic agency better than any alternative scenario, because one of the green-broken bones is a long bone and modern hyenas are not able to produce this kind of specimen.

      The critique concerning Nyayanga is not addressed in the revision. The manuscript proposes alternative explanations for the Nyayanga material but does not demonstrate why these are more plausible than the interpretation advanced by Plummer et al. (2023). I am not arguing that the Nyayanga material should be accepted as butchery; rather, showing that trampling is possible does not establish it as more probable than cut marks. In contrast, the EAK material is treated as evidence of butchery on the basis of evidence that, in my opinion, is more limited and less clearly demonstrated. Even if this is not the authors' intention, the uneven treatment removes an earlier megafaunal case from the comparison and strengthens the case for interpreting EAK as marking a behavioral shift toward megafaunal butchery by excluding other early cases.

      Again, it was never our intention to “demonstrate” anything. The reviewer is misusing this term. These types of arguments are epistemologically impossible to demonstrate; one can only discuss the heuristics of alternative scenarios. The point we tried to make was that the purported cut marks on megafaunal remains from Nyayanga are (as identified and published) impossible to differentiate from natural sedimentary abrasion marks (such as trampling). Therefore, they cannot be argued to represent anthropogenic butchery on a secure basis, especially when they do not occur in conjunction with green-broken elements of a clearly dynamic loading nature.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how UVC-induced DNA damage alters the interaction between the mitochondrial transcription factor TFAM and mtDNA. Using live-cell imaging, qPCR, atomic force microscopy (AFM), fluorescence anisotropy, and high-throughput DNA-chip assays, they show that UVC irradiation reduces TFAM sequence specificity and increases mtDNA compaction without protecting mtDNA from lesion formation. From these findings, the authors suggest that TFAM acts as a "sensor" of damage rather than a protective or repair-promoting factor.

      Strengths:

      (1) The focus on UVC damage offers a clean system to study mtDNA damage sensing independently of more commonly studied repair pathways, such as oxidative DNA damage. The impact of UVC damage is not well understood in the mitochondria, and this study fills that gap in knowledge.

      (2) In particular, the custom mitochondrial genome DNA chip provides high-resolution mapping of TFAM binding and reveals a global loss of sequence specificity following UVC exposure.

      (3) The combination of in vitro TFAM DNA biophysical approaches, combined with cellular responses (gene expression, mtDNA turnover), provides a coherent multi-scale view.

      (4) The authors demonstrate that TFAM-induced compaction does not protect mtDNA from UVC lesions, an important contribution given assumptions about TFAM providing protection.

      Weaknesses:

      (1) The authors show a decrease in mtDNA levels and increased lysosomal colocalization but do not define the pathway responsible for degradation. Distinguishing between replication dilution, mitophagy, or targeted degradation would strengthen the interpretation.

      We thank the reviewer for their careful reading of our manuscript and thoughtful suggestions. We agree that distinguishing between replication dilution, mitophagy, and/or targeted degradation would strengthen our understanding of how UV-induced DNA damage is handled in the mitochondria. We are currently undertaking experiments to tease these possibilities apart, but consider their scope to be beyond this manuscript and expect to publish them in a subsequent paper. We have added text explicitly stating that our results do not distinguish between these possibilities on pages 8-9 of the Discussion, under the subsection ‘Mitochondria respond to UVC-induced mtDNA damage in the absence of apparent mitochondrial dysfunction’.

      (2) The sudden induction of mtDNA replication genes and transcription at 24 h suggests that intermediate timepoints (e.g., 12 hours) could clarify the kinetics of the response and avoid the impression that the sampling coincidentally captured the peak.

      We agree and have added additional timepoints of 12 hours and 18 hours post exposure. We have updated Figure 2 to include the new data and have added text on page 4 to include these results.

      (3) The authors report no loss of mitochondrial membrane potential, but this single measure is limited. Complementary assays such as Seahorse analysis, ATP quantification, or reactive oxygen species measurement could more fully assess functional integrity.

      We focused on membrane potential because its loss is such a well-understood mechanism for triggering mitophagy, but agree that these additional measurements are useful. We have added experiments to assess ATP levels but did not see changes; these data are now included in Figure 2. We have also added text highlighting that we previously assessed mtROS following the same levels of UV exposure and observed no changes (in the Results section on page 5 and in the Discussion section on page 9). Given that we observe no changes in membrane potential or ATP, we have opted not to pursue Seahorse analysis for the purposes of this paper.

      (4) The manuscript briefly notes enrichment of TFAM at certain regions of the mitochondrial genome but provides little interpretation of why these regions are favored. Discussion of whether high-occupancy sites correspond to regulatory or structural elements would add valuable context.

      We agree that a discussion of these findings provides context and insight into the field's current understanding of TFAM sequence specificity. We have updated the Discussion (pages 9-10) to include our thoughts on the drivers of TFAM sequence specificity, with regard to both the discrepancy with the anisotropy data and the lack of overlap with regulatory/structural elements.

      (5) It remains unclear whether the altered DNA topology promotes TFAM compaction or vice versa. Addressing this directionality, perhaps by including UVC-only controls for plasmid conformation, would help disentangle these effects if UVC is causing compaction alone.

      We have added an additional control making this comparison and updated the text on page 7 in the results section. UVC by itself (without TFAM being present) does not alter the plasmid compaction; see new supplemental Figure S16.

      (6) The authors provide a discrepancy between the anisotropy and binding array results. The reason for this is not clear, and one wonders if an orthogonal approach for the binding experiments would elucidate this difference (minor point).

      The discrepancy between anisotropy and the binding array results is certainly unusual and contrary to previous studies that have used these arrays. In addition to the anisotropy experiments, we selected a ‘high occupancy’ and ‘low occupancy’ sequence from the binding array and performed oligomerization experiments using atomic force microscopy, which allowed us to detect small changes in cooperativity (see supplemental Figure S15). We previously only discussed this briefly in the results section on page 6, but we have now updated the discussion section (pages 9-10) to highlight this finding and put forth ideas for the field as to why we think this might be the case. While we do see that the binding array data aligns with oligomerization and cooperativity of TFAM, we still do not know what it is about these sequences that would drive such differences in TFAM binding, but we speculate that it could have something to do with flexibility of the DNA sequences.

      Assessment of conclusions:

      The manuscript successfully meets its primary goal of testing whether TFAM protects mtDNA from UVC damage and the impact this has on the mtDNA. While their data points to an intriguing model that TFAM acts as a sensor of damaged mtDNA, the validation of this model requires further investigation to make the model more convincing. This is likely warranted for a follow-up study. Also, the biological impact of this compaction, such as altering transcription levels, is not clear in this study.

      We have updated wording in the Abstract, Introduction, and elsewhere in the text (as detailed in other portions of our response) to make as explicit and clear as possible which results are supported by the in vitro versus in vivo data, and which parts are conclusions supported by the data versus hypothesized models to be tested in future work.

      Impact and utility of the methods:

      This work advances our understanding of how mitochondria manage UVC genome damage and proposes a structural mechanism for damage "sensing" independent of canonical repair. The methodology, including the custom TFAM DNA chip, will be broadly useful to the scientific community.

      Context:

      The study supports a model in which mitochondrial genome integrity is maintained not only by repair factors, but also by selective sequestration or removal of damaged genomes. The demonstration that TFAM compaction correlates with damage rather than protection reframes an interesting role in mtDNA quality control.

      Reviewer #2 (Public review):

      Summary:

      King et al. present several sets of experiments aimed to address the potential impact of UV irradiation on human mitochondrial DNA as well as the possible role of mitochondrial TFAM protein in handling UV-irradiated mitochondrial genomes. The carefully worded conclusion derived from the results of experiments performed with human HeLa cells, in vitro small plasmid DNA, with PCR-generated human mitochondrial DNA, and with UV-irradiated small oligonucleotides is presented in the title of the manuscript: "UV irradiation alters TFAM binding to mitochondrial DNA". The authors also interpret results of somewhat unconnected experimental approaches to speculate that "TFAM is a potential DNA damage sensing protein in that it promotes UVC-dependent conformational changes in the [mitochondrial] nucleoids, making them more compact." They further propose that such a proposed compaction triggers the removal of UV-damaged mitochondrial genomes as well as facilitates replication of undamaged mitochondrial genomes.

      Strengths:

      (1) The authors presented convincing evidence that a very high dose (1500 J/m²) of UVC applied to oligonucleotides covering the entire mitochondrial DNA genome alleviates sequence specificity of TFAM binding (Figure 3). This high dose was sufficient to cause UV lesions in a large fraction of individual oligonucleotides. The method was developed in the lab of one of the corresponding authors (reference 74) and is technically well-refined. This result can be published as is or in combination with other data.

      (2) The manuscript also presents AFM evidence (Figure 4) that TFAM, which was long known to facilitate compaction of the mitochondrial genome (Alam et al., 2003; PMID 12626705 and follow-up citations), causes in vitro compaction of a small pUC19 plasmid and that approximately 3 UVC lesions per plasmid molecule result in a slight, albeit detectable, increase in TFAM compaction of the plasmid. Both results can be discussed in line with a possible extrapolation to in vivo phenomena, but such a discussion should include a clear statement that no in vivo support was provided within the set of experiments presented in the manuscript.

      We thank this reviewer for their careful reading and interpretation of the manuscript. We agree that discussion of in vivo implications and extrapolations need clear statements indicating where there is not currently in vivo support. We have updated the text throughout the paper to include this.

      Weaknesses:

      Besides the experiments presented in Figures 3 and 4, other results do not either support or contradict the speculation that TFAM can play a protective role, eliminating mitochondrial genomes with bulky lesions by way of excessive compaction and removing damaged genomes from the in vivo pool.

      To specify these weaknesses:

      (1) Figure 1 - presents evidence that UVC causes a reduction in the number of mitochondrial spots in cells. The role of TFAM is not assessed.

      We are working to understand the role of TFAM in vivo following UV irradiation, but believe that work should be included in follow up studies rather than this publication.

      (2) Figure 2 - presents evidence that UVC causes lesions in mitochondrial genomes in vivo, detectable by qPCR. No direct assessment of TFAM roles in damage repair or mitochondrial DNA turnover is assessed despite the statements in the title of Figure 2 or in associated text. Approximately 2-fold change in gene expression of TFAM and of the three other genes does not provide any reasonable support to suggestion about increased mitochondrial DNA turnover over multiple explanations on related to mitochondrial DNA maintenance.

      We agree and have updated the title of Figure 2 to better reflect the findings outlined in the figure as well as the text.

      The new title is, “UVC causes mtDNA damage that decreases over time and is associated with upregulation of mtDNA replication genes, in the absence of apparent mitochondrial dysfunction.”

      We agree that there are numerous mechanistic hypotheses that could explain the decrease in mtDNA damage over time. In Figure 1, we show an overall decrease in mtDNA spots and an increase in mtDNA-lysosome colocalization, suggestive of mtDNA degradation, which could serve to remove damaged genomes. One possibility is that TFAM plays a role in damage removal (but not repair per se, as these lesions are not repaired). Another is a change in mtDNA turnover, in which increased replication machinery synthesizes non-damaged mtDNA molecules that dilute out the damage. These and other possibilities are not mutually exclusive. We have added text (pages 8-9) making explicit that additional work will be required to distinguish them. We have also added an experiment showing that TFAM knockdown affects mtDNA damage at baseline as well as after UVC exposure (Figure 5J).

      (3) Figure 5. Shows that TFAM does not protect either mitochondrial nucleoids formed in vitro or mitochondrial DNA in vivo from UVC lesions as well as has no effect on in vivo repair of UV lesions.

      We agree that Figure 5 shows that TFAM does not protect DNA from UVC-induced lesions, and that a roughly 2-fold increase in TFAM protein does not alter damage reduction over time. We have added new data showing that in vivo, knockdown of TFAM results in an increase in baseline (control conditions) mtDNA damage, and also alters the rate of decrease of mtDNA damage over time after UVC (Figure 5J).

      (4) Figure 6: Based on the above analysis, the model of the role of TFAM in sensing mtDNA damage and elimination of damaged genomes in vivo appears unsupported.

      We have updated the legend for Figure 6 in which we outline our hypothesized role of TFAM in sensing mtDNA damage to ensure that readers know this has yet to be fully tested in vivo. We have also updated the Figure legend title from “proposed model” to “hypothesized model,” and changed the wording in the conclusion section (page 11) to highlight more clearly that this is a working model.

      (5) Additional concern about Figure 3 and relevant discussion: It is not clear if more uniform TFAM binding to UV irradiated oligonucleotides with varying sequence as compared to non-irradiated oligonucleotides can be explained by just overall reduced binding eliminating sequence specific peaks.

      We do not believe this is the case, given the similar KD values for the sequences tested. In our hands and in other publications (reviewed in PMID: 34440420), it has been well established that TFAM binds damaged DNA very well, essentially as well as or better than non-damaged DNA.

      Additionally, a reduction in overall binding on these DNA arrays tends to make sequence specific peaks more apparent. We ran our experiments at both 30 nM and 300 nM TFAM specifically to be able to assess this question. The 300 nM data can be found in supplemental Figure S7. In this figure, we notice that the peaks appear more uniform at the high concentration (comparing Figure 3A to Figure S7A). That is presumably because there is so much more binding happening across the array that the peaks associated with the strongest binders become less pronounced. For the sake of brevity, we have not added this reasoning to the text, but are willing to do so if the Reviewers and Editor feel that it is important to include.
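      As a hedged illustration of the saturation argument above (not the analysis used for the arrays), the sketch below assumes a simple single-site binding model, theta = [TFAM] / ([TFAM] + KD), with free TFAM equal to total TFAM. The two KD values are hypothetical and chosen only to show the trend: as concentration rises toward saturation, the occupancy ratio between a strong and a weak site shrinks, so peaks look more uniform.

```python
# Minimal sketch: single-site (Langmuir) occupancy, assuming free TFAM
# concentration equals total concentration (no ligand depletion).
# The KD values below are hypothetical, chosen only for illustration.


def fraction_bound(tfam_nM: float, kd_nM: float) -> float:
    """Fractional occupancy of a site with dissociation constant kd_nM."""
    return tfam_nM / (tfam_nM + kd_nM)


def peak_contrast(tfam_nM: float, kd_strong_nM: float, kd_weak_nM: float) -> float:
    """Ratio of occupancy at a strong site to occupancy at a weak site."""
    return fraction_bound(tfam_nM, kd_strong_nM) / fraction_bound(tfam_nM, kd_weak_nM)


KD_STRONG_NM = 10.0   # hypothetical high-affinity sequence
KD_WEAK_NM = 100.0    # hypothetical low-affinity sequence

for conc in (30.0, 300.0):
    ratio = peak_contrast(conc, KD_STRONG_NM, KD_WEAK_NM)
    print(f"{conc:>5.0f} nM TFAM: strong/weak occupancy ratio = {ratio:.2f}")
```

      With these assumed values, the ratio falls from 3.25 at 30 nM to about 1.29 at 300 nM, which is the qualitative behavior described for Figure 3A versus Figure S7A. Real array signals also depend on cooperativity and surface effects that this toy model ignores.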

      Reviewer #3 (Public review):

      Summary:

      The study is grounded in the observations that mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress. The manuscript focuses on the effects of UVC-induced DNA damage on TFAM-DNA binding in vitro and in cells. The authors demonstrate increased TFAM-DNA compaction following UVC irradiation in vitro based on high-throughput protein-DNA binding and atomic force microscopy (AFM) experiments. They did not observe a similar trend in fluorescence polarization assays. In cells, the authors found that UVC exposure upregulated TFAM, POLG, and POLRMT mRNA levels without affecting the mitochondrial membrane potential. Overexpressing TFAM in cells or varying TFAM concentration in reconstituted nucleoids did not alter the accumulation or disappearance of mtDNA damage. Based on their data, the authors proposed a plausible model that, following UVC-induced DNA damage, TFAM facilitates nucleoid compaction, which may serve to signal damage in the mitochondrial genome.

      Strengths:

      The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. The proposed model may inspire future follow-up studies to further study the role of TFAM in sensing UVC-induced damage.

      Weaknesses:

      The manuscript could be further improved by refining specific interpretations and ensuring terminology aligns precisely with the data presented.

      (1) In line 322, the claim of increased "nucleoid compaction" in cells should be removed, as there is a lack of direct cellular evidence. Given that non-DNA-bound TFAM is subject to protease digestion, it is uncertain to what extent the overexpressed TFAM actually integrates into and compacts mitochondrial nucleoids in the absence of supporting immunofluorescence data.

      We would like to thank this reviewer for their comments and suggestions. We feel these specific language changes have strengthened the interpretability of the text. The TFAM overexpression cells used in this experiment were given to us by Isaac et al., who demonstrated that when TFAM was overexpressed in this specific cell line, the nucleoids were indeed more compact, as measured by Fiber-seq (Isaac et al., 2024; PMID: 38347148). We have removed the claim “increased compaction” from the section title, the Figure 5 legend title, and line 322 (now on page 8), and have also added a sentence so that readers know that other groups have shown these cells to have presumably increased nucleoid compaction.

      (2) In lines 405 and 406, the authors should avoid equating TFAM overexpression with compaction in the cellular context unless the compaction is directly visualized or measured.

      We have updated the text to ensure that it is clear that this was tested by other groups. We also changed the wording to “inaccessible (presumably compacted) nucleoids.” While we did not demonstrate altered compaction in our study, we think that based on the results from Isaac et al., it is likely that there was increased compaction. In addition, some readers might not have the context to make the connection between compaction and accessibility, so eliminating all reference to compaction could obscure the point.

      (3) In lines 304 and 305 (and several other places throughout the manuscript), the authors use the term "removal rates". A "removal rate" requires a direct comparison of accumulated lesion levels over a time course under different conditions. Given the complexity of UV-induced DNA damage, which involves both damage formation and potential removal via multiple pathways, a more accurate term that reflects the net result of these opposing processes is "accumulated DNA damage levels." This terminology better reflects the final state measured and avoids implying a single, active 'removal' pathway without sufficient kinetic data.

      We agree and have updated the language throughout the text as well as the results heading for this section.

      (4) In line 357, the authors refer to the decrease in the total DNA damage level as "The removal of damaged mtDNA". The decrease may be simply due to the turnover and resynthesis of non-damaged mtDNA molecules. The term "removal" may mislead the casual reader into interpreting the effect as an active repair/removal process.

      We agree and have restructured this sentence for clarity. We do believe that some removal is occurring, given the increase in mtDNA colocalization with lysosomes alongside a decrease in mtDNA spots in our live-cell imaging. We have rewritten the sentence to reflect both removal of damaged mtDNA and resynthesis of non-damaged mtDNA molecules (see pages 8-9).

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers appreciate the quality of the presented data but concur that they do not support the primary claims in the title and abstract. The reviewers also realize that in vivo evidence for the model would require extensive new experimentation that goes beyond a reasonable revision. The recommendation is to change the title and significantly revise text, figure titles and legends for transparency, and conclusions within results and discussion sections.

      We thank the editor and all the reviewers for their feedback. We have added additional experiments, updated text throughout the entire paper to ensure our claims are supported, and revised our title. We feel that the changes we have made have indeed made the paper stronger, more transparent, and that the evidence put forth in this paper provides support for all claims made.

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify mitochondrial response kinetics by adding an intermediate (e.g., 12 hrs) recovery timepoint for transcriptional analysis to resolve when TFAM and replication genes are induced.

      We have added additional timepoints of 12 and 18 hours following exposure in Figure 2. These results strengthen our finding that the nuclear transcriptional program supporting mtDNA replication appears to be activated prior to the nuclear transcriptional program supporting mitochondrial transcription, in that POLG and TFAM come up before POLRMT and ND1.

      (2) Strengthen functional readouts by assessing additional parameters of mitochondrial function to substantiate the claim that UVC does not impair mitochondrial performance.

      We have referenced our previously published data on mtROS and added a measurement of ATP following UVC exposure in Figure 2.

      (3) Consider exploring whether mtDNA degradation occurs via mitophagy, nucleoid-phagy, or another pathway, potentially by using inhibitors or markers of these processes.

      While we agree that this is an important follow up question and are currently working on experiments to address this, those experiments are outside the scope of this manuscript.

      (4) Provide additional details for the high occupancy TFAM sites. Provide brief annotation or discussion of genomic regions showing strong TFAM binding under non-irradiated conditions that are lost during UVC treatment. This would be helpful to the field as a whole.

      We have updated our discussion section to include this.

      (5) Include or discuss a control using UVC-irradiated pUC19 without TFAM to confirm that the observed compaction categories are TFAM-dependent rather than a UVC-induced DNA distortion.

      We have added a supplemental figure (Figure S16) comparing the area analysis of control pUC19 and UV-irradiated pUC19, along with associated text in the Results section of the paper.

      (6) It would be interesting to explore the link between compaction and transcriptional output. In the TFAM overexpression model, the authors could measure expression of mtDNA-encoded transcripts (e.g., ND1, COX1) to connect increased compaction with altered mitochondrial transcription.

      While we agree that understanding how compaction status alters mitochondrial transcription is worthwhile, we believe this is beyond the scope of this paper. Furthermore, this connection has previously been shown by Bruser et al., 2021 (PMID: 34818548), who showed that more compact nucleoids are not undergoing active transcription. It will be interesting to see in future work whether mtDNA damage drives changes in both compaction and transcriptional activity.

      (7) Clarify the quantitative presentation in Figure 2F to explicitly note whether the observed increase in fluorescence intensity was statistically significant, and confirm that the assay sensitivity is sufficient to detect small potential changes. As presented, it is not clear whether there is a change.

      We have changed the presentation of Figure 2F. There is a slight increase in membrane potential at the 24-hour time point, and we have made that clear in the text as well. We included FCCP as a standard positive control and detected the expected decrease in membrane potential with it. While it is always possible that a very small decrease occurred that we were unable to detect, we note that none of the six UVC-exposed groups that we tested even trended towards a decrease in MMP, making it less likely that there was an effect that we simply lacked the power or sensitivity to detect.

      (8) It would be interesting if the authors could comment on whether TFAM-induced compaction after UVC might shield mtDNA from other, repairable lesions (e.g., oxidative or alkylation damage), offering a broader context for this mechanism beyond UVC alone.

      In theory, we believe this is possible. It will also be interesting to see if the increased compaction following UVC also protects or shields the mtDNA from other enzymatic processes, such as repair proteins that may be searching for repairable lesions such as oxidative or alkylation damage. In this case, it seems as though the increased compaction would prevent the repair from happening at genomes harboring damage.

      In this study, we show with our in vitro nucleoids that increased compaction does not protect against UVC, but this is likely because UVC does not need physical access to the DNA in order to damage it: UVC (centered in this case at 254 nm) is not fully absorbed by the bound proteins and can therefore pass through them to reach the DNA. Currently, we know that increased compaction by TFAM makes the DNA inaccessible to the enzymes used to methylate DNA in Fiber-seq (PMID: 38347148), but we do not know whether the compaction is tight enough to prevent ROS or alkylating agents from damaging the DNA. We have updated the text in the Discussion on page 10 to highlight some of these ideas.

      Reviewer #2 (Recommendations for the authors):

      Please go over all display items and text and clarify details that can help readers understand important specifics of the experiments. Examples are provided below:

      (1) Abstract and Introduction - indicate species and cell line

      We have updated the text to include this information.

      (2) Table 1 "TFAM KD measurements": the title and footnotes are entirely cryptic. Please clarify the experimental design, question(s) addressed, and conclusions drawn from the data.

      We have updated the title of Table 1 to “Binding of TFAM to array sequences, measured using fluorescence anisotropy,” and clarified the footnotes to make clear which sequences were selected for AFM oligomerization experiments.

      (3) Figure 3 and Material and Methods - specify UVC dose.

      We have added this information to both the figure legend and the methods section.

      (4) Figure 4 - specify UVC dose.

      We have added this information to the figure legend.

      (5) Figure 5. Panel B: indicate which band is TFAM and which is HA-tagged TFAM. Indicate clearly which panels show in vivo or in vitro results.

      We have updated the figure to label the untagged TFAM and HA-tagged TFAM and changed the panel titles to specify if they are in vivo results.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      Major:

      Over-interpretation of data. There are a few instances of this:

      The authors claim "Our work shows that MgdE interacts with both WDR5 and ASH2L and inhibits the methyltransferase activity of the COMPASS complex" (Line 318). However, they provide no biochemical analysis of methyltransferase activity to support this claim. While they cite Figure 4A-C and Figure 5, these data simply show (slightly) decreased cellular levels of H3K4Me. There are multiple ways H3K4Me could decrease including blocking recruitment of COMPASS to promoters or the enzymatic activity of MgdE itself.

      The data themselves related to H3K4Me changes (Figure 5D) are difficult to interpret in light of the controls they now provide. Examining the blot itself, there seems to be a massive increase in H3K4Me in control cells expressing GFP that is not reflected in the quantification, which shows only a ~2x increase in GFP-expressing cells. In addition, there is very little decrease in H3K4Me in the MgdE-expressing cells relative to controls or the site mutant (no change apparent visually and ~10% change per their quantification). However, the authors interpret this as "revealed that cells expressing WT MgdE exhibited lower levels of H3K4me3". In both these cases I would recommend the authors consider modifying their interpretation of the data.

      We thank the reviewer for the comment.

      (1) We have now revised this interpretation in the manuscript as follows:

      Lines 311-312: “Our work shows that MgdE interacts with both WDR5 and ASH2L, leading to a decrease in H3K4me3 levels.”

      (2) Figure 5D presents the results of three independent biological replicates. The bar graph shows the average signal intensity of H3K4me3 normalized to the corresponding loading controls. Accordingly, we have revised the analysis and description of the experimental results.

      Lines 214-217: “Immunoblot analysis of nuclear extracts showed that cells expressing WT MgdE had ~25% lower H3K4me3 levels than EGFP-expressing cells and ~40% lower levels than those expressing the D244A/H47A mutant (Figure 5D).”
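      To make the normalization explicit, the sketch below uses hypothetical band intensities (not the Figure 5D data): each H3K4me3 band is divided by its own lane's loading control, the ratios are averaged across the three replicates in each group, and the difference between group means is expressed as a percentage reduction. The function names are illustrative only.

```python
import statistics


def normalized_signal(target, loading):
    """Ratio of a target band's intensity to its own lane's loading control."""
    return target / loading


def percent_lower(test_mean, reference_mean):
    """How much lower test_mean is than reference_mean, in percent."""
    return 100.0 * (1.0 - test_mean / reference_mean)


# Hypothetical densitometry values: (H3K4me3, loading control) per replicate.
lanes = {
    "EGFP": [(80.0, 100.0), (90.0, 100.0), (70.0, 100.0)],
    "WT MgdE": [(56.0, 100.0), (63.0, 105.0), (45.0, 90.0)],
}

# Normalize each lane first, then average the ratios within each group.
group_means = {
    group: statistics.mean(normalized_signal(t, l) for t, l in reps)
    for group, reps in lanes.items()
}

reduction = percent_lower(group_means["WT MgdE"], group_means["EGFP"])
print(f"WT MgdE vs EGFP: {reduction:.1f}% lower")
```

      Normalizing each lane before averaging, rather than averaging raw intensities, keeps unequal loading in any single lane from skewing the group mean.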

      Minor

      What is "CK"? Please clarify (Figure 2F).

      We thank the reviewer for the comment. In this context, "CK" refers to the uninfected control group, which serves as the negative control in the experiment. We have revised the label in Figure 2F.

      How many times was the BCG mouse experiment performed? This should be indicated in the figure legend? (Figure 7A).

      We thank the reviewer for the comment. The BCG mouse experiment was performed once, and we have added this information to the figure legend of Figure 7A.

      It is unclear why the secreted protein (after signal peptide removal) migrates at the same size as the full-length protein (Figure S2).

      We thank the reviewer for the comment. After translation in the cytoplasm, the precursors of secreted proteins are translocated into the periplasm almost immediately, and their signal peptides are removed during this process. Therefore, MgdE or Ag85B obtained from the whole-cell lysate (Figure S2A) has mostly already had its signal peptide removed. This has also been shown for Rv0455c secreted by Mtb (Zhang et al., Nature Communications, 2022). This explains why MgdE (or Ag85B) from whole-cell lysates and from supernatants shows the same size on SDS-PAGE gels.

      It is still unclear why the transcripts with very little fold-change in expression (in grey) have the most significant p-values for being different (Figure 6).

      We thank the reviewer for the comment. The p-value calculation takes into account not only the magnitude of expression change but also the consistency of expression levels within each group and the number of biological replicates. When the variation among replicates is minimal, even a small difference in group means can result in a statistically significant p-value. In our RNA-seq analysis, we used DESeq2 with three biological replicates per group. DESeq2 employs a model based on the negative binomial distribution and accounts for multiple factors, including the mean expression level, within-group variance (dispersion), sample size, and normalization accuracy. As a result, it is common to observe that genes with small variability and strong consistency between replicates may show significant p-values even with modest fold changes. Conversely, genes with larger fold changes but greater variability might not reach statistical significance.
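      As a rough illustration of this point, the sketch below uses hypothetical normalized counts for three replicates per group and a plain Welch t statistic. This is a deliberately simpler test than the negative binomial Wald test used by DESeq2, so it shows the principle rather than our actual analysis. A modest but consistent change yields a much larger |t|, and therefore a smaller p-value, than a large but highly variable one.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (b minus a)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # The standard error grows with within-group variance and shrinks with n.
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_b - mean_a) / se


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (b over a)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))


# Hypothetical normalized counts, three replicates per group (illustration only).
consistent_control = [100, 102, 98]
consistent_treated = [120, 122, 118]  # modest change, tight replicates
noisy_control = [100, 20, 180]
noisy_treated = [400, 60, 740]        # large change, scattered replicates

for name, a, b in [("consistent", consistent_control, consistent_treated),
                   ("noisy", noisy_control, noisy_treated)]:
    print(f"{name}: log2FC={log2_fold_change(a, b):.2f}, t={welch_t(a, b):.2f}")
```

      The "consistent" gene has about a 1.2-fold change but a t statistic roughly eight times larger than the "noisy" gene's fourfold change, which is why it would sit low on the fold-change axis yet high on the significance axis of a volcano plot.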

      Reference

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis. Nat Commun 13(1):2255.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive feedback. Addressing these points will strengthen the manuscript and improve its clarity.

      A primary concern involved the justification for using COS7 cell lysates in reconstitution approaches and iPSC-derived neuronal model systems as models for AD. We will clarify the language throughout the manuscript to state the study’s goals more explicitly, to emphasize that these systems were selected as robust, well-controlled platforms to test the mechanisms through which tau hyperphosphorylation affects microtubule interactions and tau’s role in regulating intracellular transport, and to acknowledge the limitations of in vitro and iPSC models.

      Reviewers also raised the possibility that background phosphorylation could contribute to the effects observed in the pseudo-phosphorylation model. We cite two recent preprints that provide insight into this question through quantitatively assessing tau phosphorylation across expression systems. In the revised manuscript, we will elaborate on how their assessment of tau phosphorylation fits within the scope of our approach and clarify how our experimental controls effectively minimize uncertainty related to background phosphorylation.

      Another point concerned the potential influence of other microtubule-associated proteins in lysates and the impact of tau lattice occupancy on motility outcomes. To further strengthen this aspect, we will include additional analyses correlating tau intensity along microtubules with kinesin intensity and motility behavior, and we will more clearly explain how the AP and WT controls provide confidence in the robustness of the system.

      Detailed responses to each reviewer comment are provided below point by point. The planned revisions, which include clearer language, stronger justification of the experimental approaches, and additional supporting analyses, will substantially improve the clarity, rationale, and overall impact of the study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Beaudet and colleagues aims to explore the effect of phosphorylation on the formation of tau envelopes and consequently on axonal transport, both in vitro on reconstituted microtubules and in human excitatory neurons derived from iPSCs.

      The authors found that a relatively widely used construct in which 14 serine or threonine residues, often hyperphosphorylated in Alzheimer's disease, are mutated to alanines (phosphodeficient), increases the density of tau envelopes compared to wildtype tau, whereas a phosphomimetic (same residues mutated to glutamic acid) reduces envelope density both in vitro and in human excitatory neurons derived from iPSCs.

      By analysing the trafficking of different kinesins (KIF1a and KIF5C), they observed different effects of tau phosphorylation status on the movement of these two motors.

      They then analyse transport of lysosomes by employing live imaging of lysotracker in human excitatory neurons derived from iPSCs transfected with wildtype, phosphodeficient or phosphomimetic tau, observing that phosphodeficient tau seems to reduce transport of lysosomes while phosphomimetic tau increases transport compared to wildtype tau.

      Strengths:

      (1) The work aims to study a novel and underexplored topic in the tau field, tau envelopes, and investigate their relevance to Alzheimer's disease pathology.

      (2) Experiments are well conducted and of high quality.

      Weaknesses:

      Relying only on in vitro reconstituted microtubules and human neurons derived from iPSCs leaves some doubts about the relevance of these results for Alzheimer's disease, considering the embryonic state of iPSC-derived neurons.

      We agree with the reviewer that iPSC-derived neurons represent an immature state compared with the neurons affected in Alzheimer’s disease. However, iPSC-derived neurons, together with in vitro reconstitution, provide insight into (1) whether tau hyperphosphorylation influences its association with microtubules and its ability to form envelope-like structures thought to regulate transport, (2) how tau hyperphosphorylation affects the motility of kinesin motors that are strongly inhibited by tau, and (3) how the transport of endogenous degradative organelles such as lysosomes is impacted by tau hyperphosphorylation. We hope that our studies will help to inform future studies examining how tau-related dysfunction evolves in more mature neurons and contributes to the more severe pathological effects observed at later disease stages.

      We will include a paragraph in the Discussion section addressing the limitations of this study to better contextualize our findings within the broader effort to understand tauopathies and Alzheimer’s disease.

      Reviewer #2 (Public review):

      This manuscript examines how disease-associated hyperphosphorylation disrupts tau's role as a cooperative microtubule-binding regulator of intracellular transport. Using in vitro reconstitution assays and live-cell imaging in iPSC-derived neurons, the authors employ phosphomutant tau constructs (E14 to mimic hyperphosphorylation, AP to prevent phosphorylation) at 14 disease-associated residues to isolate phosphorylation effects independent of expression system-dependent PTM heterogeneity. The results show that hyperphosphorylated tau fails to form cooperative envelope-like structures on microtubules, instead binding diffusely and dissociating rapidly. In contrast, wild-type and phospho-resistant tau form cohesive envelopes that regulate motor protein access. At the single-molecule level, hyperphosphorylation reduces KIF5C inhibition while maintaining or enhancing KIF1A inhibition through altered processivity and detachment rates. In live neurons, hyperphosphorylated tau phenocopies tau knockout conditions, weakening tau-mediated inhibition of lysosome transport and increasing processive motility. The authors quantify tau binding using Gaussian mixture model-based image analysis and measure tau kinetics via FRAP, demonstrating that hyperphosphorylation-induced loss of cooperative binding correlates with dysregulated organelle transport. These findings establish a mechanism by which phosphorylation-driven disruption of tau's gatekeeper function on microtubules compromises axonal transport prior to aggregation in tauopathies. The paper provides interesting new knowledge for the field, but there are outstanding concerns that could be further addressed by the authors to strengthen and clarify the current manuscript:

      (1) Lack of Phosphatase-Treated Control and Explicit WT Phosphorylation Quantification

      Wild-type tau expressed in insect and mammalian cells is known to be phosphorylated by endogenous kinases (eg, GSK3, CDK5, MARK). The manuscript acknowledges this in the Discussion but provides no phosphatase-treated lysate control or quantification of endogenous phosphorylation on WT tau via phospho-specific Western blots. This leaves ambiguity about whether observed differences between WT and E14 reflect purely the introduced mutations or confounding baseline differences in phosphostate content.

      Tau contains ~85 putative phosphorylation sites and is modified by several kinases in cells. Studies by Siahaan et al. (2024) and Fan et al. (2025) provide detailed insight into tau phosphorylation, its role in protecting the microtubule lattice from severing enzymes, and the implications of phosphorylation patterns for aggregate formation. Specifically, Fan et al. (2025) show that HEK-expressed tau is phosphorylated by endogenous kinases at 58 residues, with most phospho-occupancy levels below 15%, indicating substantial heterogeneity among individual tau molecules. In the revised manuscript, we will (1) provide justification for the use of the pseudo-phosphorylation model system as an approach to limit heterogeneity among tau molecules, (2) clarify the importance of the WT and AP controls, (3) discuss that E14, WT, and AP tau likely exhibit similar degrees of background phospho-heterogeneity, with WT tau likely exhibiting some overlap between background phosphorylation and the 14 AD-associated sites examined, and (4) expand the discussion to emphasize that although background phosphorylation is present, our results do not suggest that it contributes significantly to the observations reported in this study.

      (2) Limited Normalization of Motor Effects to Measured Tau Lattice Occupancy

      Although kinesin trajectories are classified inside vs. outside tau envelopes (inherently normalizing to local tau density), motor parameters are not systematically reported as functions of tau fluorescence intensity across all constructs. Co-purifying MAPs or microtubule-modifying enzymes in cell lysates are not quantified or excluded, leaving residual uncertainty about the tau-specificity of observed motor inhibition. This should at least be acknowledged in the Results section.

      The reviewer raises a valid point. It is challenging to compare conditions at similar tau occupancy on microtubules, because phosphorylation strongly affects the interaction between tau and microtubules. We will quantify and report tau intensity in single-molecule motility assays. On the second point, while other MAPs or motor proteins in the lysates could potentially affect kinesin motility, we would expect these effects to be similar for all tau phosphomutant constructs, such that the effect of tau phosphostate on kinesin motility can still be assessed.

      (3) Insufficient Citation of Prior Neuronal Tau Envelope Evidence

      In the Introduction, the authors state, "it was an open question if tau forms envelopes in neurons," but this understates existing evidence. Tan et al. (2019) report tau neuronal staining consistent with envelope formation, while Siahaan et al. (2021) provide more direct evidence in non-neuronal cells. The framing should acknowledge and integrate these prior findings.

      We agree with the reviewer that several studies using reconstitution systems, fixed neurons, and live cultured cells provide evidence of tau envelope formation. Specifically, tau envelopes have been observed along taxol-stabilized or GMPCPP-capped GDP microtubules in vitro (e.g., Dixit et al., 2008; Monroy et al., 2018; Tan et al., 2019; Siahaan et al., 2019), in 4% PFA-fixed and Triton X-100–extracted DIV7 mouse hippocampal neurons (Tan et al., 2019), and in live, non-neuronal U-2 OS cells following taxol treatment (Siahaan et al., 2022) or elevated pH (Siahaan et al., 2024). However, to our knowledge, our study is the first to demonstrate tau envelope formation in live neuronal cells under normal cell culture conditions. We will revise this sentence in the manuscript to more precisely position our findings within the context of prior studies.

      (4) Unclear Wording on Expression System-Dependent Phosphorylation

      The sentence "The phosphostate of tau is strongly dependent on the expression system" requires rewording. It is ambiguous whether this refers to the final phosphostate achieved after expression or the inherent phosphorylating capacity of each system. Clearer language would strengthen the methodological justification.

      We agree that the wording here is ambiguous and requires clarification. In the revised manuscript, we will clarify that tau phosphorylation depends on the expression system used; bacterial systems lack the capacity for many post-translational modifications compared with insect and mammalian systems. We will also emphasize that in insect and mammalian expression systems, tau phosphorylation occurs heterogeneously, as demonstrated in previous studies by Siahaan et al. (2024) and Fan et al. (2025).

      (5) Insufficient Quantification of Motor and Lysosome Transport Effect Magnitudes in Results Section

      The data on molecular motor motility and lysosome transport are densely described. The magnitude of effects (fold-changes, percentage differences) should be explicitly stated in the Results section when first presenting findings to orient readers to biological significance. For example, effect magnitudes for lysosome run lengths, velocities, and directional bias should be quantified in text, not left to figure inspection.

      Our initial justification for omitting quantitative data from the results text was to improve readability; however, in doing so, we may have reduced the accessibility and clarity regarding the significance of the findings. In the revised manuscript, we will incorporate the relevant quantifications and statistical significance for the motility data in the text.

      (6) Incomplete Discussion of Projection Domain Necessity for Envelope Formation

      The Discussion states the projection domain is "a critical regulator of both tau-tau and tau-microtubule interactions," but does not engage with prior domain dissection work. Tan et al. (2019) found that the entire projection domain is not necessary for envelope formation in vitro. The authors should discuss which projection domain regions are specifically regulated by phosphorylation vs. required for cooperativity, providing a more nuanced interpretation than implied by their current framing.

      We agree with the reviewer. Tan et al. (2019) demonstrated that the proline-rich region (residues 198–244) within the projection domain of full-length 2N4R tau is the minimal region required to maintain tau’s ability to form envelopes along microtubules. We will incorporate this work on the dissection of the projection domain and discuss how the phosphorylation sites examined in our study are primarily located within this region. Together, these data highlight the proline-rich region as a potential major regulator of tau–tau cooperativity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Review of the manuscript titled "Mycobacterial Metallophosphatase MmpE acts as a nucleomodulin to regulate host gene expression and promotes intracellular survival".

      The study provides an insightful characterization of the mycobacterial secreted effector protein MmpE, which translocates to the host nucleus and exhibits phosphatase activity. The study characterizes the nuclear localization signal sequences and residues critical for the phosphatase activity, both of which are required for intracellular survival.

      Strengths:

      (1) The study addresses the role of nucleomodulins, an understudied aspect in mycobacterial infections.

      (2) The authors employ a combination of biochemical and computational analyses along with in vitro and in vivo validations to characterize the role of MmpE.

      Weaknesses:

      (1) While the study establishes that the phosphatase activity of MmpE operates independently of its NLS, there is a clear gap in understanding how this phosphatase activity supports mycobacterial infection. The investigation lacks experimental data on specific substrates of MmpE or pathways influenced by this virulence factor.

      We thank the reviewer for this insightful comment and agree that identification of the substrates of MmpE is important to fully understand its role in mycobacterial infection. MmpE is a putative purple acid phosphatase (PAP) and a member of the metallophosphoesterase (MPE) superfamily. Enzymes in this family are known for their catalytic promiscuity and broad substrate specificity, acting on phosphomonoesters, phosphodiesters, and phosphotriesters (Matange et al., Biochem J, 2015). In bacteria, several characterized MPEs have been shown to hydrolyze substrates such as cyclic nucleotides (e.g., cAMP) (Keppetipola et al., J Biol Chem, 2008; Shenoy et al., J Mol Biol, 2007), nucleotide derivatives (e.g., AMP, UDP-glucose) (Innokentev et al., mBio, 2025), and pyrophosphate-containing compounds (e.g., Ap4A, UDP-DAGn) (Matange et al., Biochem J., 2015). Although the binding motif of MmpE has been identified, determining its physiological substrates remains challenging due to the low abundance and instability of potential metabolites, as well as the limited sensitivity and coverage of current metabolomic technologies in mycobacteria.

      (2) The study does not explore whether the phosphatase activity of MmpE is dependent on the NLS within macrophages, which would provide critical insights into its biological relevance in host cells. Conducting experiments with double knockout/mutant strains and comparing their intracellular survival with single mutants could elucidate these dependencies and further validate the significance of MmpE's dual functions.

      We thank the reviewer for the comment. Deletion of the NLS motifs did not impair MmpE’s phosphatase activity in vitro (Figure 2F), indicating that MmpE's enzymatic function operates independently of its nuclear localization. Rather, we confirmed that Fe<sup>3+</sup> binding via residues H348 and N359 is required for the enzymatic activity of MmpE. We have expanded on this point in the Discussion section “MmpE is a bifunctional virulence factor in Mtb”.

      (3) The study does not provide direct experimental validation of the MmpE deletion on lysosomal trafficking of the bacteria.

      We thank the reviewer for the comment. To validate the role of MmpE in lysosome maturation during infection, we conducted fluorescence colocalization assays in THP-1 macrophages infected with BCG strains, including WT, ∆MmpE, Comp-MmpE, Comp-MmpE<sup>ΔNLS1</sup>, Comp-MmpE<sup>ΔNLS2</sup>, and Comp-MmpE<sup>ΔNLS1-2</sup>. These strains were stained with the lipophilic membrane dye DiD, while macrophages were treated with the acidotropic probe LysoTracker<sup>TM</sup> Green (Martins et al., Autophagy, 2019). The results indicated that the ΔMmpE and Comp-MmpE<sup>ΔNLS1-2</sup> strains exhibited significantly higher co-localization with LysoTracker than the WT and Comp-MmpE strains (New Figure 5G), suggesting that MmpE deletion leads to enhanced lysosomal maturation during infection.

      (4) The role of MmpE as a mycobacterial effector would be more relevant using virulent mycobacterial strains such as H37Rv.

      We thank the reviewer for the comment. Previously, the role of Rv2577/MmpE as a virulence factor has been demonstrated in M. tuberculosis CDC 1551, where its deletion significantly reduced bacterial replication in mouse lungs at 30 days post-infection (Forrellad et al., Front Microbiol, 2020). However, that study did not explore the underlying mechanism of MmpE function. In our study, we found that MmpE enhances M. bovis BCG survival in macrophages (both THP-1 and RAW264.7) and in mice (Figure 3, Figure 7A), consistent with its proposed role in virulence. To investigate the molecular mechanism by which MmpE promotes intracellular survival, we used M. bovis BCG as a biosafe surrogate, a model widely accepted for studying mycobacterial pathogenesis (Wang et al., Nat Immunol, 2015; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors have characterized Rv2577 as a Fe<sup>3+</sup>/Zn<sup>2+</sup>-dependent metallophosphatase and a nucleomodulin protein. The authors have also identified His348 and Asn359 as critical residues for Fe<sup>3+</sup> coordination. The authors show that the protein encodes two nuclear localization signals. Using C-terminal Flag expression constructs, the authors have shown that the MmpE protein is secreted. The authors have prepared genetic deletion strains and show that MmpE is essential for intracellular survival of M. bovis BCG in THP-1 macrophages, RAW264.7 macrophages, and a mouse model of infection. The authors have also performed RNA-seq analysis to compare the transcriptional profiles of macrophages infected with wild-type and MmpE mutant strains. The relative levels of ~175 transcripts were altered in MmpE mutant-infected macrophages, and the majority of these were associated with various immune and inflammatory signalling pathways. Using these deletion strains, the authors proposed that MmpE inhibits inflammatory gene expression by binding to the promoter region of the vitamin D receptor gene. The authors also showed that MmpE arrests phagosome maturation by regulating the expression of several lysosome-associated genes such as TFEB, LAMP1, LAMP2, etc. These findings reveal a sophisticated mechanism by which a bacterial effector protein manipulates gene transcription and promotes intracellular survival.

      Strength:

      The authors have used a combination of cell biology, microbiology, and transcriptomics to elucidate the mechanisms by which Rv2577 contributes to intracellular survival.

      Weakness:

      The authors should thoroughly check the mice data and show individual replicate values in bar graphs.

      We thank the reviewer for the advice. We have now updated the relevant mouse data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "Mycobacterial Metallophosphatase MmpE Acts as a Nucleomodulin to Regulate Host Gene Expression and Promote Intracellular Survival", Chen et al. describe the biochemical characterisation, localisation and potential functions of the gene using a genetic approach in M. bovis BCG, and perform macrophage and mouse infections to understand the roles of this potentially secreted protein in the host cell nucleus. The findings demonstrate the role of a secreted phosphatase of M. bovis BCG in shaping the transcriptional profile of infected macrophages, potentially through nuclear localisation and direct binding to transcriptional start sites, thereby regulating the inflammatory response to infection.

      Strengths:

      Using a transient transfection method, the authors demonstrate that MmpE, when expressed as a GFP-tagged protein in HEK293T cells, exhibits nuclear localisation. The authors identify two NLS motifs that together are required for nuclear localisation of the protein. A deletion of the gene in M. bovis BCG results in poorer survival compared to the wild-type parent strain, which is also killed by macrophages. Relative to the WT strain-infected macrophages, macrophages infected with the ∆mmpE strain exhibited differential gene expression. Overexpression of the gene in HEK293T cells led to occupancy of the transcription start sites of several genes, including the Vitamin D Receptor. Expression of VDR in THP1 macrophages was lower in the case of ∆mmpE infection compared to WT infection. These data support the utility of the overexpression system in identifying potential target loci of MmpE using the HEK293T transfection model. The authors also demonstrate that the protein is a phosphatase, and that its phosphatase activity is partially required for bacterial survival but not for the regulation of VDR gene expression.

      Weaknesses:

      (1) While the motifs can most certainly behave as NLSs, the overexpression of a mycobacterial protein in HEK293T cells can also result in artefacts of nuclear localisation. This is not unprecedented. Therefore, to prove that the protein is indeed secreted from BCG, and is able to elicit transcriptional changes during infection, I recommend that the authors (i) establish that the protein is indeed secreted into the host cell nucleus, and (ii) the NLS mutation prevents its localisation to the nucleus without disrupting its secretion.

      We thank the reviewer for this insightful comment. To confirm the translocation of MmpE into the host nucleus during BCG infection, we first detected the secretion of MmpE by M. bovis BCG, using Ag85B as a positive control and GlpX as a negative control (Zhang et al., Nat Commun, 2022). Our results showed that MmpE-Flag was present in the culture supernatant, indicating that MmpE is indeed secreted by BCG (new Figure S1C).

      Next, we performed immunoblot analysis of the nuclear fractions from infected THP-1 macrophages expressing FLAG-tagged wild-type MmpE and NLS mutants. The results revealed that only wild-type MmpE was detected in the nucleus, while MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup> and MmpE<sup>ΔNLS1-2</sup> were not detectable in the nucleus (New Figure S1D). Taken together, these findings demonstrated that MmpE is a secreted protein and that its nuclear translocation during infection requires both NLS motifs.

      Demonstration that the protein is secreted: Supplementary Figure 3 - Immunoblotting should be performed for a cytosolic protein, also to rule out detection of proteins from lysis of dead cells. Also, for detecting proteins in the secreted fraction, it would be better to use Sauton's media without detergent, and grow the cultures without agitation or with gentle agitation. The method used by the authors is not a recommended protocol for obtaining the secreted fraction of mycobacteria.

      We thank the reviewer for the advice. To avoid the effects of bacterial lysis, we cultured the BCG strains expressing MmpE-Flag in Middlebrook 7H9 broth with 0.5% glycerol, 0.02% Tyloxapol, and 50 µg/mL kanamycin at 37 °C with gentle agitation (80 rpm) to an OD<sub>600</sub> of approximately 0.6 (Zhang et al., Nat Commun, 2022). Subsequently, we assessed the secretion of MmpE-Flag in the culture supernatant, using Ag85B as a positive control and GlpX as a negative control (New Figure S1C). GlpX was not detected in the supernatant, whereas MmpE and Ag85B were, indicating that MmpE is indeed a secreted protein in BCG.

      Demonstration that the protein localises to the host cell nucleus upon infection: Perform an infection followed by immunofluorescence to demonstrate that the endogenous protein of BCG can translocate to the host cell nucleus. This should be done for an NLS1-2 mutant expressing cell also.

      We thank the reviewer for the suggestion. We agree that this experiment would help further verify the ability of MmpE to enter the nucleus. However, an MmpE-specific antibody suitable for immunofluorescence is not available to us. Instead, we performed nuclear-cytoplasmic fractionation of THP-1 cells infected with M. bovis BCG strains expressing FLAG-tagged wild-type MmpE or NLS deletion mutants (MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup>, and MmpE<sup>ΔNLS1-2</sup>). WT MmpE was detectable in both cytoplasmic and nuclear compartments, whereas MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup> and MmpE<sup>ΔNLS1-2</sup> were almost undetectable in nuclear fractions (New Figure S1D), suggesting that both NLS motifs are necessary for nuclear import.

      (2) In the RNA-seq analysis, the directionality of change of each of the reported pathways is not apparent in the way the data have been presented. For example, are genes in the cytokine-cytokine receptor interaction or TNF signalling pathway expressed more, or less in the ∆mmpE strain?

      We thank the reviewer for the comment. The KEGG pathway enrichment diagrams in our RNA-seq analysis primarily reflect the statistical significance of pathway enrichment based on differentially expressed genes, but do not indicate the direction of gene expression changes. To address this concern, we conducted qRT-PCR on genes associated with the cytokine-cytokine receptor interaction pathway, specifically IL23A, CSF2, and IL12B. Compared to the WT strain, infection with the ΔMmpE strain resulted in significantly increased expression of these genes in THP-1 cells (Figure 4F, Figure S4B), consistent with the RNA-seq data. Furthermore, we have submitted the complete RNA-seq dataset to the NCBI GEO repository [GSE312039], which includes normalized expression values and differential expression results for all detected genes.

      (3) Several of these pathways are affected as a result of infection, while others are not induced by BCG infection. For example, BCG infection does not, on its own, produce changes in IL1β levels. As the authors did not compare the uninfected macrophages as a control, it is difficult to interpret whether ∆mmpE induced higher expression than the WT strain, or simply did not induce a gene while the WT strain suppressed expression of a gene. This is particularly important because the strain is attenuated. Does the attenuation have anything to do with the ability of the protein to induce lysosomal pathway genes? Does induction of this pathway lead to attenuation of the strain? Similarly, for pathways that seem to be downregulated in the ∆mmpE strain compared to the WT strain, these might have been induced upon infection with the WT strain but not sufficiently by the ∆mmpE strain due to its attenuation/lower bacterial burden.

      We thank the reviewer for the comment. Previous studies have shown that wild-type BCG induces relatively low levels of IL-1β, while retaining partial capacity to activate the inflammasome (Qu et al., Sci Adv, 2020). Our data (Figure 3G) show that infection with the ΔMmpE strain results in enhanced IL-1β expression, consistent with findings by Master et al. (Cell Host Microbe, 2008), in which deletion of zmp1 in BCG or M. tuberculosis led to increased IL-1β levels due to reduced inhibition of inflammasome activation.

      In the revised manuscript, we have provided additional qRT-PCR data using uninfected macrophages as a baseline control. These results demonstrate that the WT strain suppresses lysosome-associated gene expression, whereas the ΔMmpE strain upregulates these genes, indicating that MmpE inhibits lysosome-related gene expression (Figure 4G). Furthermore, bacterial burden analysis revealed that ∆mmpE exhibited ~3-fold lower intracellular survival than the WT strain in THP-1 cells. However, when lysosomal maturation was inhibited, the difference in bacterial load between the two strains was reduced to ~1-fold (New Figures S6B and C). These findings indicate that MmpE promotes intracellular survival primarily by inhibiting lysosomal maturation, which is consistent with a previous study (Chandra et al., Sci Rep, 2015).

      (4) CHIP-seq should be performed in THP1 macrophages, and not in HEK293T. Overexpression of a nuclear-localised protein in a non-relevant line is likely to lead to several transcriptional changes that do not inform us of the role of the gene as a transcriptional regulator during infection.

      We thank the reviewer for the comment. We performed ChIP-seq in HEK293T cells based on their high transfection efficiency, robust nuclear protein expression, and well-annotated genome (Lampe et al., Nat Biotechnol, 2024; Marasco et al., Cell, 2022). These characteristics make HEK293T an ideal system for the initial identification of genome-wide chromatin binding profiles by MmpE.

      Further, we performed comprehensive validation of the ChIP-seq findings in THP-1 macrophages. First, CUT&Tag and RNA-seq analyses in THP-1 cells revealed that MmpE modulates genes involved in the PI3K–AKT signaling and lysosomal maturation pathways (Figure 4C; Figure S5A-B). Correspondingly, compared to infection with the WT strain, infection with the ΔMmpE strain led to reduced phosphorylation of AKT (S473), mTOR (S2448), and p70S6K (T389) (New Figure 5E-F), upregulation of lysosomal genes such as TFEB, LAMP1, and LAMP2 (Figure 4G), and more pronounced lysosomal maturation (New Figure 5G). Additionally, CUT&Tag profiling identified MmpE binding at the promoter region of the VDR gene, which was further validated by EMSA and ChIP-qPCR. qRT-PCR also demonstrated that MmpE suppresses VDR transcription, supporting its role as a transcriptional regulator (Figure 6). Collectively, these data confirm the biological relevance and functional significance of the ChIP-seq findings obtained in HEK293T cells.

      (5) I would not expect to see such large inflammatory reactions persisting 56 days post-infection with M. bovis BCG. Is this something peculiar for an intratracheal infection with 1x107 bacilli? For images of animal tissue, the authors should provide images of the entire lung lobe with the zoomed-in image indicated as an inset.

      We thank the reviewer for the comment. The lung inflammation peaked at days 21–28 and had clearly subsided by day 56 across all groups (New Figure 7B), consistent with the expected resolution of immune responses to an attenuated strain like M. bovis BCG. This temporal pattern is in line with previous studies using intravenous or intratracheal BCG vaccination in mice and macaques, which also demonstrated robust early immune activation followed by resolution over time (Smith et al., Nat Microbiol, 2025; Darrah et al., Nature, 2020).

      In this study, the infectious dose (1×10<sup>7</sup> CFU intratracheal) was selected based on previous studies in which intratracheal delivery of 1×10<sup>7</sup> CFU produced consistent and measurable lung immune responses and pathology without causing overt illness or mortality (Xu et al., Sci Rep, 2017; Niroula et al., Sci Rep, 2025). We have provided whole-lung lobe images with zoomed-in insets in the source dataset.

      (6) For the qRT-PCR based validation, infections should be performed with the MmpE-complemented strain in the same experiments as those for the WT and ∆mmpE strain so that they can be on the same graph, in the main manuscript file. Supplementary Figure 4 has three complementary strains. Again, the absence of the uninfected, WT, and ∆mmpE infected condition makes interpretation of these data very difficult.

      We thank the reviewer for the comment. As suggested, we have conducted the qRT-PCR experiment including the uninfected, WT, ∆mmpE, Comp-MmpE, and the three complementary strains infecting THP-1 cells (Figure 4F and G; New Figure S4B–D).

      (7) The abstract mentions that MmpE represses the PI3K-Akt-mTOR pathway, which arrests phagosome maturation. There is not enough data in this manuscript in support of this claim. Supplementary Figure 5 does provide qRT-PCR validation of genes of this pathway, but the data do not indicate that higher expression of these pathways, whether by VDR repression or otherwise, is driving the growth restriction of the ∆mmpE strain.

      We thank the reviewer for the comment. In the updated manuscript, we have provided more evidence. First, the RNA-seq analysis indicated that MmpE affects the PI3K-AKT signaling pathway (Figure 4C). Second, CUT&Tag analysis suggested that MmpE binds to the promoter regions of key pathway components, including PRKCB, PLCG2, and PIK3CB (Figure S5A). Third, confocal microscopy showed that the ΔMmpE strain promotes significantly increased lysosomal maturation compared to the WT, a process downstream of the PI3K-AKT-mTOR axis (New Figure 5G).

      Further, we measured protein phosphorylation for validating activation of the pathway (Zhang et al., Stem Cell Reports, 2017). Our results showed that cells infected with WT strains exhibited significantly higher phosphorylation of Akt, mTOR, and p70S6K compared to those infected with ΔMmpE strains (New Figures 5E and F). Moreover, the dual PI3K/mTOR inhibitor BEZ235 abolished the survival advantage of WT strains over ΔMmpE mutants in THP-1 macrophages (New Figure S6B and C). Collectively, these results support that MmpE activates the PI3K–Akt–mTOR signaling pathway to enhance bacterial survival within the host.

      (8) The relevance of the NLS and the phosphatase activity is not completely clear in the CFU assays and in the gene expression data. Firstly, there needs to be immunoblot data provided for the expression and secretion of the NLS-deficient and phosphatase mutants. Secondly, CFU data in Figure 3A, C, and E must consistently include both the WT and ∆mmpE strain.

      We thank the reviewer for the comment. We have now added immunoblot analysis of the expression and secretion of the MmpE mutants. The results show that the NLS-deficient and phosphatase mutants are detected in the culture supernatant (New Figure S1C). Additionally, we have revised Figures 3A, 3C, and 3E to consistently include both the WT and ΔMmpE strains in the CFU assays.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors should attempt to address the following comments:

      (1) Please perform densitometric analysis for the western blot shown in Figure 1E.

      We sincerely thank the reviewer for the suggestion. In the updated manuscript, we have performed densitometric analysis of the western blot shown in New Figure 1F and G.

      (2) Is it possible to measure the protein levels of MmpE in lysates prepared from infected macrophages?

      We thank the reviewer for the comment. In the revised manuscript, we performed immunoblot analysis to measure MmpE levels in lysates from infected macrophages. The results demonstrated that wild-type MmpE was present in both the cytoplasmic and nuclear fractions during infection in THP-1 cells (New Figure S1D).

      (3) The authors should perform circular dichroism studies to compare the secondary structure of wild-type and mutant proteins (in particular MmpEHis348 and MmpEAsn359).

      We thank the reviewer for this valuable suggestion. We agree that circular dichroism spectroscopy could provide useful information on differences in secondary structure. However, due to technical limitations, we instead compared the AlphaFold-predicted structures of wild-type MmpE and the His348 and Asn359 mutant proteins. These structural models showed almost no differences in secondary structure between the wild-type and mutant proteins (Figure S1B).

      (4) The authors should perform more experiments to determine the binding motif for MmpE in the promoter region of VDR.

      We thank the reviewer for this suggestion. In the current study, we have identified the MmpE-binding motif within the promoter region of VDR using CUT&Tag sequencing. This prediction was further validated by ChIP-qPCR and EMSA (Figure 6). These complementary approaches collectively support the identification of a specific MmpE-binding motif and demonstrate its functional relevance. This approach has been used in many publications (Wen et al., Commun Biol, 2020; Li et al., Nat Commun, 2022).

      (5) Were the transcript levels of VDR also measured in the lung tissues of infected animals?

      We thank the reviewer for this suggestion. In the revised manuscript, we have performed qRT-PCR to assess VDR transcript levels in the lung tissues of infected mice (New Figure S8B).

      (6) How does MmpE regulate the expression of lysosome-associated genes?

      We thank the reviewer for this question. Our experiments suggest that MmpE suppresses lysosomal maturation, probably by activating the host PI3K–AKT–mTOR signaling pathway (New Figure 5E–I). This pathway is well established as a negative regulator of lysosome biogenesis and function (Yang et al., Signal Transduct Target Ther, 2020; Cui et al., Nature, 2023; Cui et al., Nature, 2025). During infection, THP-1 cells infected with the WT strain showed increased phosphorylation of Akt, mTOR, and p70S6K compared to those infected with ΔMmpE (New Figure S5C, New Figure 5E and F), and concurrently downregulated key lysosomal maturation markers, including TFEB, LAMP1, LAMP2, and multiple V-ATPase subunits (Figure 4G). Given that PI3K–AKT–mTOR signaling suppresses TFEB activity and lysosomal gene transcription (Palmieri et al., Nat Commun, 2017), we propose that MmpE modulates lysosome-associated gene expression and lysosomal function, probably through the PI3K–AKT–mTOR signaling pathway.

      (7) Mice experiment:

      (a) The methods section states that mice were infected intranasally, but the legend for Figure 6 states intratracheally. Kindly check?

      (b) Supplementary Figure 7 - this is not clear. The legend says bacterial loads in spleens (CFU/g) instead of DNA expression, as shown in the figure.

      (c) The data in Figure 6 and Figure S7 seem to be derived from the same experiment, but the number of animals is different. In Figure 6, it is n = 6, and in Figure S7, it is n=3.

      We thank the reviewer for the comments.

      (a) The infection was performed intranasally, and the figure legend for New Figure 7 has now been corrected.

      (b) We used a quantitative PCR method to measure bacterial DNA levels in the spleens of infected mice. We have now revised the legend.

      (c) We have conducted new experiments, each of which now includes six mice. The results are shown in Figure 7B and C, as well as in the new Figure S8.

      (8) The authors should show individual values for various replicates in bar graphs (for all figures).

      We thank the reviewer for this helpful suggestion. We have now updated all relevant bar graphs to include individual data points for each biological replicate.

      (9) The authors should validate the relative levels of a few DEGs shown in Figure 3F, Figure 3G, and Figure S4C, in the lung tissues of mice infected with wild-type, mutant, and complemented strains.

      We thank the reviewer for this suggestion. In the revised manuscript, we have performed qRT-PCR to validate the expression levels of selected DEGs, including inflammation-related and lysosome-associated genes, in lung tissues from mice infected with wild-type, mutant, and complemented strains (New Figure S8C-H).

      (10) Did the authors perform an animal experiment using a mutant strain complemented with the phosphatase-deficient MmpE (Comp-MmpE-H348AN359H)?

      We appreciate the reviewer's comment. We agree that an additional animal experiment would be useful to assess the effects of the phosphatase activity. However, our study mainly focused on interpreting the function of the nuclear localization of MmpE during BCG infection. Additionally, we have assessed the role of the phosphatase activity of MmpE during infection in a cell model (Figure 3E).

      Minor comment:

      The mutant strain should be verified by either Southern blot or whole genome sequencing.

      We thank the reviewer for this comment. We verified deletion of the mmpE gene by PCR (Figure S3A-D), an approach used in many publications (Zhang et al., PLoS Pathog, 2020; Zhang et al., Nat Commun, 2022).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 195: cytokine.

      We thank the reviewer for the comments. We have now corrected it.

      (2) Line 225: rewording required.

      Corrected.

      (3) Figure 4A. "No difference" instead of "No different".

      Corrected.

      (4) "KommpE" should be replaced with "∆mmpE strain" (∆=delta symbol).

      Corrected.

      (5) Supplementary Figure 7. The figure legend states CFU assays, but the y-axis and the graph seem to depict IS1081 quantification.

      We thank the reviewer for the comment. The figure is based on IS1081 quantification using qRT-PCR, not CFU assays. We have now revised the legend for New Figure S8A.

      References

      Chandra P, Ghanwat S, Matta SK, Yadav SS, Mehta M, Siddiqui Z, Singh A, Kumar D (2015) Mycobacterium tuberculosis Inhibits RAB7 Recruitment to Selectively Modulate Autophagy Flux in Macrophages Sci Rep 5:16320.

      Darrah PA, Zeppa JJ, Maiello P, Hackney JA, Wadsworth MH 2nd, Hughes TK, Pokkali S, Swanson PA 2nd, Grant NL, Rodgers MA, Kamath M, Causgrove CM, Laddy DJ, Bonavia A, Casimiro D, Lin PL, Klein E, White AG, Scanga CA, Shalek AK, Roederer M, Flynn JL, Seder RA (2020) Prevention of tuberculosis in macaques after intravenous BCG immunization Nature 577:95-102. 

      Forrellad MA, Blanco FC, Marrero Diaz de Villegas R, Vázquez CL, Yaneff A, García EA, Gutierrez MG, Durán R, Villarino A, Bigi F (2020) Rv2577 of Mycobacterium tuberculosis Is a virulence factor with dual phosphatase and phosphodiesterase functions Front Microbiol 11:570794.

      Innokentev A, Sanchez AM, Monetti M, Schwer B, Shuman S (2025) Efn1 and Efn2 are extracellular 5'-nucleotidases induced during the fission yeast response to phosphate starvation mBio 16: e0299224.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity J Biol Chem 283:30942-9.

      Lampe GD, King RT, Halpin-Healy TS, Klompe SE, Hogan MI, Vo PLH, Tang S, Chavez A, Sternberg SH (2024) Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases Nat Biotechnol 42:87-98.

      Li Z, Sheerin DJ, von Roepenack-Lahaye E, Stahl M, Hiltbrunner A (2022) The phytochrome interacting proteins ERF55 and ERF58 repress light-induced seed germination in Arabidopsis thaliana Nat Commun 13:1656.

      Marasco LE, Dujardin G, Sousa-Luís R, Liu YH, Stigliano JN, Nomakuchi T, Proudfoot NJ, Krainer AR, Kornblihtt AR (2022) Counteracting chromatin effects of a splicing-correcting antisense oligonucleotide improves its therapeutic efficacy in spinal muscular atrophy Cell 185:2057-2070.e15.

      Martins WK, Santos NF, Rocha CS, Bacellar IOL, Tsubone TM, Viotto AC, Matsukuma AY, Abrantes ABP, Siani P, Dias LG, Baptista MS (2019) Parallel damage in mitochondria and lysosomes is an efficient way to photoinduce cell death Autophagy 15:259-279.

      Master SS, Rampini SK, Davis AS, Keller C, Ehlers S, Springer B, Timmins GS, Sander P, Deretic V (2008) Mycobacterium tuberculosis prevents inflammasome activation Cell Host Microbe 3:224-32.

      Matange N, Podobnik M, Visweswariah SS (2015) Metallophosphoesterases: structural fidelity with functional promiscuity Biochem J 467:201-16.

      Niroula N, Ghodasara P, Marreros N, Fuller B, Sanderson H, Zriba S, Walker S, Shury TK, Chen JM (2025) Orally administered live BCG and heat-inactivated Mycobacterium bovis protect bison against experimental bovine tuberculosis Sci Rep 15:3764.

      Palmieri M, Pal R, Nelvagal HR, Lotfi P, Stinnett GR, Seymour ML, Chaudhury A, Bajaj L, Bondar VV, Bremner L, Saleem U, Tse DY, Sanagasetti D, Wu SM, Neilson JR, Pereira FA, Pautler RG, Rodney GG, Cooper JD, Sardiello M (2017) mTORC1-independent TFEB activation via Akt inhibition promotes cellular clearance in neurodegenerative storage diseases Nat Commun 8:14338.

      Péan CB, Schiebler M, Tan SW, Sharrock JA, Kierdorf K, Brown KP, Maserumule MC, Menezes S, Pilátová M, Bronda K, Guermonprez P, Stramer BM, Andres Floto R, Dionne MS (2017) Regulation of phagocyte triglyceride by a STAT-ATG2 pathway controls mycobacterial infection Nat Commun 8:14642.

      Qu Z, Zhou J, Zhou Y, Xie Y, Jiang Y, Wu J, Luo Z, Liu G, Yin L, Zhang XL (2020) Mycobacterial EST12 activates a RACK1-NLRP3-gasdermin D pyroptosis-IL-1β immune pathway Sci Adv 6: eaba4733.

      Shenoy AR, Capuder M, Draskovic P, Lamba D, Visweswariah SS, Podobnik M (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis J Mol Biol 365:211-25.

      Smith AA, Su H, Wallach J, Liu Y, Maiello P, Borish HJ, Winchell C, Simonson AW, Lin PL, Rodgers M, Fillmore D, Sakal J, Lin K, Vinette V, Schnappinger D, Ehrt S, Flynn JL (2025) A BCG kill switch strain protects against Mycobacterium tuberculosis in mice and non-human primates with improved safety and immunogenicity Nat Microbiol 10:468-481.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237–245.

      Wen X, Wang J, Zhang D, Ding Y, Ji X, Tan Z, Wang Y (2020) Reverse Chromatin Immunoprecipitation (R-ChIP) enables investigation of the upstream regulators of plant genes Commun Biol 3:770.

      Xu X, Lu X, Dong X, Luo Y, Wang Q, Liu X, Fu J, Zhang Y, Zhu B, Ma X (2017) Effects of hMASP-2 on the formation of BCG infection-induced granuloma in the lungs of BALB/c mice Sci Rep 7:2300.

      Zhang L, Hendrickson RC, Meikle V, Lefkowitz EJ, Ioerger TR, Niederweis M (2020) Comprehensive analysis of iron utilization by Mycobacterium tuberculosis PLoS Pathog 16: e1008337.

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis Nat Commun 13:2255.

      Zhang X, He X, Li Q, Kong X, Ou Z, Zhang L, Gong Z, Long D, Li J, Zhang M, Ji W, Zhang W, Xu L, Xuan A (2017) PI3K/AKT/mTOR Signaling Mediates Valproic Acid-Induced Neuronal Differentiation of Neural Stem Cells through Epigenetic Modifications Stem Cell Reports 8:1256-1269.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Thank you for the authors' responses to my concerns. I do not have any further comments.

      We thank this reviewer for the positive and constructive evaluation of our manuscript.

      Reviewer #2 (Public Review):

      I have no further comments on this amended version, aside from suggesting that the authors add (if known) the time at which biopsies were collected. Time of day is an important yet often overlooked source of variation in gene expression. Along the same lines, the fasting imposed on bariatric surgery patients is also a source of variation in gene expression and metabolite abundance. It is hoped that future investigations will more precisely characterize the role of the newly identified targets in MASLD.

      We agree and are fully aware that hepatic metabolism is controlled by the circadian rhythm, so time of day is an important parameter when liver samples are collected. All liver samples were collected between 8 am and 1 pm, and this information has been added to the Methods section. We are already working on the characterization of the newly identified targets. Thank you for the positive and constructive evaluation of our manuscript.

      Reviewer #3 (Public Review):

      (1) Confounders (such as (pre-)diabetes)

      The patient table shows significant differences between non-MASLD and MASLD individuals, with the latter more often suffering from diabetes or hypertriglyceridemia. Rather than just stating corrections, subgroup analyses should be performed (accompanied by designated statistical power analyses) to infer the degree to which these conditions contribute to the observations. That is, major findings stating MASLD-associated changes should hold true in the subgroup of MASLD patients without diabetes, of female sex, and so forth (testing for each of the significant differences between groups).

      Post-rebuttal update: The authors have performed the requested subgroup analysis and find that the gene signatures hold for the non-diabetic subcohort but not for the diabetic subgroup. They note a likely interaction between fibrosis and diabetes that was not corrected for in the original analysis.

      (2) External validation

      Additionally, to back up the major GTPase signature findings, it would be desirable to analyze an external dataset of (pre)diabetes patients (or other biased groups) for alterations in these genes. It would be important to know whether this signature also appears in non-MASLD diabetic patients compared with healthy individuals, or whether it is a feature specific to MASLD. Also, could the matched metabolic data be used to validate the metabolite alterations that would be expected under dysregulation of GTPase-associated proteins?

      Post-rebuttal update: The authors confirm that, with the present data, insulin resistance cannot be fully ruled out as a confounder of the GTPase-related gene signature. However, they plan future mouse model experiments to study whether the GTPase-fibrosis signature differs between diabetic and non-diabetic conditions.

      (3) 3D liver spheroid MASH model, Fig. 6D/E

      This 3D experiment is technically not an external validation of GTPase-related genes being involved in MASLD, since patient-derived cells may only retain changes that have happened in vivo. To demonstrate that the GTPase expression signature is specifically invoked by fibrosis, the LX-2 setup is more convincing. However, the up-regulation of the GTPase-related genes upon fibrosis induction with TGF-beta, in concordance with the patient data, needs to be shown first (qPCR or RNA-seq). Additionally, the description of the 3D model is too uncritical. The maintenance of functional PHHs is a major challenge (PMID: 38750036, PMID: 21953633, PMID: 40240606, PMID: 31023926). It cannot be ruled out that the findings are largely attributable to either 1) the other mesenchymal cells present (i.e., mesenchyme-derived cells such as hepatic stellate cells, not to be confused with mesenchymal stem cells, MSCs), or 2) potential changes in PHHs in culture. These limitations need to be stated.

      Post-rebuttal update: To address the concern that cells other than hepatocytes contribute to the observed effects in culture, the authors performed TGF-beta treatment in independent mono-cultures of LX-2 cells and hepatocytes, and in the spheroid system (Figure R4). Surprisingly, the important genes highlighted in Figure 6E for the spheroid system (RAB6A, ARL4A, RAB27B, DIRAS2) are all absent from this validation experiment (qPCR?). Instead, the authors evaluate RAC1, RHOU, VAV1, DOCK2 and RAB32. In spheroids, RHOU and RAB32 are down-regulated with TGF-β. In hepatocytes, DOCK2 and RAC1 seemed up-regulated. They find no difference in these genes in LX-2 cells. ACTA2 expression values are also missing for LX-2 cells. Together, it is hard to judge which individual cell type recapitulates the changes observed in patients in this validation experiment, as the major genes called out in Figure 6E are not analyzed.

      All biological experiments show variation, and, especially when analyzing different cell types (lines), it is not surprising that not all results are fully aligned. In other words, some of the GTPases may be upregulated in hepatocytes, while others may be upregulated in hepatic stellate cells, owing to the complex signaling arrangement in each cell type. To address this reviewer’s concerns, we have performed qPCR for RAB6A, ARL4A, RAB27B and DIRAS2 in LX-2 cells. The results are shown in the revised Figure 6–figure supplement 5. To align all three graphs so that they display the same genes, we now depict gene expression for the co-culture (hepatocytes, hepatic stellate cells, and Kupffer cells) and the mono-culture (hepatocytes only) from the RNA-seq analysis.

      Unfortunately, the 3D liver spheroid model used (as presented in PMID: 39605182) lacks important functional validation tests of maintained hepatocyte identity in culture (at the very least, albumin expression and secretion plus a CYP3A4 assay). These functional data (acquired at the time point in culture when the RNA expression analysis in 6E was performed) are indispensable before stating that mature hepatocytes cause the observed effects.

      We agree that the characterization of the liver spheroid model derived from human patient samples is important. The functional characterization has already been published in these papers:

      (1) Bell, C. C. et al. Transcriptional, Functional, and Mechanistic Comparisons of Stem Cell–Derived Hepatocytes, HepaRG Cells, and Three-Dimensional Human Hepatocyte Spheroids as Predictive In Vitro Systems for Drug-Induced Liver Injury. Drug Metab. Dispos. 45, 419–429 (2017).

      (2) Bell, C. C. et al. Characterization of primary human hepatocyte spheroids as a model system for drug-induced liver injury, liver function and disease. Sci. Rep. 6, 25187 (2016).

      (3) Vorrink, S. U. et al. Endogenous and xenobiotic metabolic stability of primary human hepatocytes in long‐term 3D spheroid cultures revealed by a combination of targeted and untargeted metabolomics. FASEB J. 31, 2696–2708 (2017).

      (4) Messner, S. et al. Transcriptomic, Proteomic, and Functional Long-Term Characterization of Multicellular Three-Dimensional Human Liver Microtissues. Appl. In Vitro Toxicol. 4, 1–12 (2018).

      (5) Bell, C. C. et al. Comparison of Hepatic 2D Sandwich Cultures and 3D Spheroids for Long-term Toxicity Applications: A Multicenter Study. Toxicol. Sci. 162, 655–666 (2018).

      We have now mentioned this in the manuscript on page 18 to make this point clear.

      (4) Novelty / references

      Similar studies that also combined liver and blood lipidomics/metabolomics in obese individuals with and without MASLD (e.g. PMID 39731853, 39653777) should be cited. Additionally, it would benefit the quality of the discussion to state how findings in this study add new insights over previous studies, if their findings/insights differ, and if so, why.

      Post-rebuttal update: The authors have included the studies into their discussion.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Add the plots showing diabetes/non-diabetes sub-group analysis and power estimates to the Supplementary Figures (rather than just as a Supplementary table)

      We have added this as Figure 5-figure supplement 2 in the revised manuscript (R2).

      (2) Add a short note on the validity of the results limiting to the non-diabetes subgroup to the limitations section

      We have done this in the revised manuscript (R2).

      (3) Add a short note on the missing adjustment for fibrosis/diabetes interactions in the study to the limitations paragraph

      We appreciate the reviewer’s suggestion to address the lack of adjustment for a potential fibrosis–diabetes interaction. We have added a note on this to the Limitations section. Although diabetes considerably modulates the risk of steatohepatitis, only a small number of participants in our study had diabetes (29 of 109). This limits the statistical power to detect meaningful interaction effects.

      Author response table 1.

      (4) Fig S10/6E: In vitro TGF-b stimulation on spheroids, LX-2 cells, hepatocytes: evaluate expression of RAB6A, ARL4A, RAB27B, DIRAS2 genes from 6E to create consistency between the findings. Confirm ACTA2 up-regulation in LX-2 cells treated with TGF-β as a positive control. Also specify methods for gene expression analysis in spheroids and the cell types in the figure legends (RNA-Seq? qPCR?)

      To address this reviewer’s concerns, we have performed qPCR for RAB6A, ARL4A, RAB27B and DIRAS2 in LX-2 cells stimulated with TGF-β. The results are shown in the revised Figure 6–figure supplement 5. To align all three graphs so that they display the same genes, we now depict gene expression for the co-culture (hepatocytes, hepatic stellate cells, and Kupffer cells) and the mono-culture (hepatocytes only) from the RNA-seq analysis. We have also specified the gene expression methods used in the figure legend.

      (5) Validate the functionality of hepatocytes in the 3D liver spheroid model used (PMID: 39605182) at the time points of which the experiments have been performed (e.g. Albumin secretion, CYP-assays).

      We agree that the characterization of liver spheroids derived from human patients using fully differentiated cells is important. However, this has already been done and is published in these papers:

      (1) Bell, C. C. et al. Transcriptional, Functional, and Mechanistic Comparisons of Stem Cell–Derived Hepatocytes, HepaRG Cells, and Three-Dimensional Human Hepatocyte Spheroids as Predictive In Vitro Systems for Drug-Induced Liver Injury. Drug Metab. Dispos. 45, 419–429 (2017).

      (2) Bell, C. C. et al. Characterization of primary human hepatocyte spheroids as a model system for drug-induced liver injury, liver function and disease. Sci. Rep. 6, 25187 (2016).

      (3) Vorrink, S. U. et al. Endogenous and xenobiotic metabolic stability of primary human hepatocytes in long‐term 3D spheroid cultures revealed by a combination of targeted and untargeted metabolomics. FASEB J. 31, 2696–2708 (2017).

      (4) Messner, S. et al. Transcriptomic, Proteomic, and Functional Long-Term Characterization of Multicellular Three-Dimensional Human Liver Microtissues. Appl. In Vitro Toxicol. 4, 1–12 (2018).

      (5) Bell, C. C. et al. Comparison of Hepatic 2D Sandwich Cultures and 3D Spheroids for Long-term Toxicity Applications: A Multicenter Study. Toxicol. Sci. 162, 655–666 (2018).

      We have now mentioned this in the manuscript on page 18 and in the Limitations section to make this point clear.

      (6) Add a note on limitations of the PHH-spheroid and cell line in vitro models to the limitations section and discuss the need for future experiments to examine the cellular crosstalk and cell types potentially responsible for the proposed GTPase-gene dysregulation.

      We have added this to the Limitations section on page 13 of the revised manuscript (R2).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The hippocampus, especially the ventral subregion, has been related to emotional processing. However, the specific circuitry involved deserves further investigation. Using bidirectional optogenetic modulation, Kambali et al. have investigated the role of different inputs to vCA1 (i.e., from vCA3 and entorhinal cortex) in anxiety- and fear-related responses. The major findings of this work suggest that both inputs to vCA1 control fear-related responses, whereas only the projection from vCA3 to vCA1 controls anxiety-related behavior. Overall, the authors used an advanced methodological approach, which allows them to modulate specific brain circuits, to study specific hippocampal projections, providing some new information regarding hippocampal function in anxiety and fear.

      Strengths:

      (1) The manuscript is well written, clear and has a detailed and specific discussion.

      (2) Results from each optogenetic manipulation are clear in different anxiety- and fear-related tasks, demonstrating the robustness of the findings.

      (3) The overall conclusions are very interesting and might be relevant for the field of mental health disorders accompanied by anxiety- and fear-related alterations.

      Weaknesses:

      (1) The major differences in basal behavioral performance in the different paradigms between the two optogenetic modulations prevent the achievement of strong conclusive results.

      The two projections of ventral CA1 were studied independently in different cohorts of animals tested at different times during the study. This difference in timing may have contributed to variations in the basal behavioral performance between the two projections. Importantly we found that within each cohort – control and optogenetic manipulation, the basal performance within each set of experiments (i.e., corresponding to projections) is highly consistent, e.g., basal cued and contextual freezing responses and responses to OFF conditions in Vogel conflict test. Moreover, the ANOVA statistics conducted across the baseline and ON conditions for each task revealed robust significant effects of bidirectional optogenetic modulation for each cohort. In case of the fear responses, a point to note is that the freezing levels in SHAM controls differ between projections but are consistent between two types of assessments (tone and context) within each projection. We will mention these limitations in the revised manuscript.

      (2) Data presentation and representative figures need a major revision.

      The figures will be rearranged according to the projections. The anxiety-related figures and fear response related figures will be grouped for each projection to improve clarity and readability. The revised manuscript will include representative heat maps for each behavioral task for both projections in addition to population quantification data.

      (3) No analysis has been performed to analyze potential sex differences in behavioral domains where sex is important.

      This assessment was not done in the original submission. We will perform statistical analysis for male and female mice separately and if the results are sex-dependent, we will present separate figures. Otherwise, the combined data presentation will be followed.

      Reviewer #2 (Public review):

      Summary:

      This paper uses an optogenetic approach to either activate or inhibit separate neural pathways projecting to the ventral CA1 hippocampal subregion, from either CA3 or the entorhinal cortex. The authors report that manipulation of the vCA3→vCA1 pathway affected behavioural performance on a number of tasks: elevated plus maze, open field, Vogel conflict test and freezing behaviour to both context and a trace CS cue. In contrast, optogenetic manipulation of neural activity in the EC→vCA1 pathway only affected behaviour on the trace CS/context fear memory test but had no effect on the elevated plus maze, open field or Vogel conflict test. The authors suggest different roles for these two ventral hippocampal pathways in fear versus anxiety.

      Strengths:

      This is an interesting study addressing an important question in a highly topical subject area. The experiments are well conducted and have generated interesting and important data.

      Weaknesses:

      While I am broadly sympathetic to the overall narrative of the paper, I have some questions/comments around the specific interpretation of the results presented. In my view, the authors' claims may not be completely supported by their data, but the data are interesting nonetheless.

      In terms of the framework presented by the authors for interpreting their data, many would argue that freezing (or at least reduced activity/behavioural inhibition) to the context provides a readout of conditioned anxiety rather than fear. In this sense, the context is a signal of potential threat (i.e. the context becomes associated with both shock and with the absence of shock) and thus generates anxiety rather than fear. Likewise, the trace CS cue could be considered as an ambiguous predictor of shock in that the shock doesn't occur straight away.

      In contrast, a punctate CS cue which co-terminates with shock would be a reliable signal of imminent threat and thus generates a fear response. Thus, it might be argued that all of the assays adopted by the authors are readouts of anxiety (albeit comprising tests of both conditioned and unconditioned anxiety).

      We agree with the reviewer that context and trace fear conditioning do not represent an “imminent” threat as severe as would likely be internalized in delay fear conditioning. However, the goal of the study was to probe hippocampal dependent processes (contextual and trace fear conditioning are strongly modulated by the hippocampus while delay conditioning is not). Consistent with several other studies, we believe the conditional nature of the task (context and trace are invariably linked to shock) provides support for a “non-ambiguous” relationship that is conducive for measuring the assessment of fear-based behavior.

      Several studies show clear differences in the involvement of the amygdala and hippocampus in delay versus trace fear conditioning. Inactivating the amygdala led to deficits in contextual and delay conditioning but had no effect on trace conditioning. In contrast, inactivating the hippocampus led to deficits in trace and contextual but not delay fear conditioning. These findings suggest that a temporal gap between the CS and US can generate amygdala-independent but hippocampus-dependent fear conditioning (Raybuck J. D., Lattal K. M 2011, PMID: 21283812). Lesions of the entorhinal cortex impair the acquisition of trace fear conditioning but not the acquisition of delay fear conditioning (Raybuck J. D., Lattal K. M 2011, PMID: 21283812). Further, using single-unit recording during fear retention tests after delay or trace fear conditioning, Kong et al. showed that entorhinal neurons respond specifically after trace but not after delay fear conditioning (Kong et al 2023, PMID: 36919333). Together, these findings demonstrate that trace and delay fear conditioning may involve overlapping but largely different neuronal circuits. Knockdown of the α5-subunit-containing GABAA receptors in the CA1 region (α5CA1KO mice) leads to improved spatial learning and enhances trace fear conditioning memory to the level of delay fear conditioning (Engin et al 2020, PMID: 32934095). This suggests that α5-GABAARs in CA1 pyramidal neurons normally constrain hippocampus-dependent memory processes. It also shows that, in the absence of α5-GABAA receptors in CA1, trace fear conditioning has the same effect size as delay fear conditioning, supporting the view that trace fear conditioning is not “ambiguous”.

      For example, from the authors' perspective, it is not clear a priori why the Vogel conflict test is considered a measure of anxiety, whereas contextual freezing is considered a measure of fear. Indeed, in the Discussion, the authors mention another study in which the data from the Vogel conflict test align with fear assays rather than anxiety tests. Can the authors elaborate on their distinction? I appreciate that, in practice, it might be difficult to distinguish between fear and anxiety at the behavioral level in rodents (although opposing effects of fear and anxiety on pain responses might be one option). At the very least, this issue merits further discussion.

      We will make this distinction clearer in the revision. Briefly, behavioral actions in the Vogel conflict test are generally considered most pertinent to generalized anxiety disorder in humans, and anxiolytics have high predictive validity in animals in this task. In particular, the robust actions of benzodiazepines and 5-HT1A partial agonists parallel their clinical efficacy in patients (McMillan and Brocco, 2003, PMID: 12600703).

      Our previous study (Engin et al 2016, PMID: 26971710) used global diazepam-induced neuronal inhibition. It identified that positive modulation of α2-GABAARs in dentate gyrus granule cells and CA3 pyramidal neurons is required to reduce anxiety-like behaviors, whereas positive modulation of α2-GABAARs in CA1 pyramidal neurons is required to reduce fear-related behaviors. These effects were absent when α2-GABAARs were knocked out in the respective subregions. These results indicate that these intrahippocampal subregions can modulate fear and anxiety-like behaviors independently of the amygdala. In the previous study, we used conditional α2-GABAAR knockouts in hippocampal subregions and treated these mice with systemic diazepam. In these experiments, diazepam still acts on α1-, α3- and α5-GABAARs in the hippocampal subregions and cell types that lack α2-GABAARs. For example, when α2CA1KO mice were administered diazepam, diazepam still inhibited pyramidal neurons in CA3 and DG via α1-, α2-, α3- and α5-GABAARs, and it also acted on α1-, α3- and α5-GABAARs in CA1 itself. Diazepam also acted on GABAARs in the amygdala and other brain regions. These are fundamentally different experimental conditions from the optogenetic experiments described in this paper. Moreover, in contrast to the current paper, the previous work did not examine projections but used global diazepam-induced neuronal inhibition as a baseline. Finally, the previous paper examined whether a specific neuronal cell type was required for anxiolytic-like or fear-related actions. In contrast, the current manuscript examined whether activation or inhibition of neuronal projections is sufficient to modulate anxiety- and fear-related behaviors. Overall, the Vogel conflict test results of the two papers cannot easily be compared.

      Another question is whether rather than representing a qualitative difference between the contributions of the vCA3→vCA1 and EC→vCA1 pathways to different aspects of fear/anxiety behaviours, the different results reflect a quantitative difference between the magnitude of effects in vCA1 that are generated from optogenetic manipulation of the two pathways, coupled with the possibility that behaviour on the trace CS/context fear memory task is more sensitive to manipulation than the "anxiety tests". The possibility that vCA3→vCA1 stimulation is more effective is potentially supported by the c-fos measurements in vCA1. vCA3→vCA1 stimulation produced a much bigger vCA1 c-fos response (approx. 350% c-fos cell activation; see Figure 1E) compared to activation of the EC→vCA1 pathway (approx. 170% c-fos cell activation; see Figure 4E).

      Furthermore, in some studies, there seem to be quite large differences between the laser OFF conditions for the different groups (which presumably one would not expect to be different). For example, compare laser OFF for the Inhibition group for time in open arms of EPM in Figure 5C (> 40%) versus laser OFF for the Inhibition group for time in open arms of EPM in Fig. 2C (< 20%). This could potentially result in ceiling effects, such that it is very hard to see a further increase in time in the open arms from a level already above 40% when the laser is then switched on. This could complicate the interpretation of the laser ON condition.

      The magnitude of activation, as evidenced by c-fos measurements, differs between the two projections. This might reflect different levels of modulation of CA1 neuronal activity. The fact that the two projections were studied at different time points (see response to Reviewer 1) may also have contributed to the difference. The revised manuscript will include a formal discussion of how the magnitude of modulation could contribute to differential sensitivity in the modulation of anxiety-like behaviors. However, the inputs from these two projection systems target different regions of CA1 pyramidal neurons, and each pathway has distinct roles in other processes (sensory versus memory-based completion). Thus, a dissociation may also be present for other types of behavior, including the modulation of anxiety-like behaviors.

      While ceiling effects could affect our interpretation, we believe they would affect only one direction of the optogenetic manipulation. Yet there was no effect of activation on time in the open arms (Fig. 5C). There was also no bidirectional modulation of anxiety-related behavior in the novel open field test (Fig. 5F), where behavioral levels were comparable to those in Figure 2F.

      Likewise, there is a big difference between the behavioral performance of the two SHAM groups in Figure 3 (compare SHAM in 3 B, C and SHAM in 3 D, E). How is this explained? Could this generate a ceiling effect? This may also merit some discussion. More details on the SHAM procedure(s) in the main manuscript may also be helpful.

      With respect to contextual fear, ceiling effects are not a major factor as we still see enhanced freezing in the activation condition. With tone fear, we cannot formally exclude a ceiling effect, and this will be addressed as a potential confound in the manuscript.

      According to Figure 3A, the test of freezing response to the trace Tone CS is conducted in a different context from the conditioning context. The data presented in Figure 3 for tone fear are the levels of freezing during the presentation of this cue in different contexts. It would be important to present both pre-CS and CS freezing levels here to determine how much of the freezing is actually driven by the punctate tone CS. The pre-CS freezing levels in this different context would also provide a nice control for the contextual fear conditioning.

      We agree and will analyze and report the pre-CS freezing data in the revision.

      Reviewer #3 (Public review):

      Summary:

      In their paper entitled "Ventral hippocampal temporoammonic and Schaffer collateral pathways differential control fear- and anxiety-related behaviors", the authors use a bidirectional optogenetic approach to elucidate the role of temporoammonic (TA) and Schaffer collateral (SC) inputs to the ventral hippocampus (CA1) in modulating both fear- and anxiety-related behaviors. Fear and anxiety behaviors are often considered to lie on a continuous spectrum, so identifying neural pathways that are differentially activated represents an important open question in the field. The authors find that optogenetic stimulation or inhibition of the Schaffer collateral pathway in the ventral hippocampus (CA3-CA1) bidirectionally modulates behavior in both fear-related and anxiety-related paradigms. More specifically, optogenetic excitation of the CA3-CA1 pathway using ChR2-expressing viral constructs increases anxiety-like behaviors in numerous behavioral paradigms (elevated plus maze, open field, Vogel conflict test). Conversely, optogenetic inhibition using halorhodopsin reduced anxiety-like behaviours. To assess fear behaviors, the authors used contextual and trace fear conditioning. Similar to their results with anxiety-like behaviors, the authors observed bidirectional fear modulation following optogenetic stimulation of the vCA3-vCA1 pathway. The authors next examined the temporoammonic pathway from the lateral entorhinal cortex to vCA1. Unlike SC stimulation, stimulation of the TA pathway had no effect on anxiety-like behaviors but did bidirectionally modulate contextual fear conditioning. Together, these results differentiate the SC and TA pathways in the ventral hippocampus as distinct regulators of affective behavior.

      Strengths:

      The paper has numerous technical strengths, including dissecting the role of both excitation and inhibition of both pathways and the use of behavioral measures of anxiety and fear. This balanced and internally controlled design allows readers to evaluate the effects of both pathways in a single study, thereby reducing technical complications from experiments being completed across laboratories and experimental conditions.

      Weaknesses:

      There are a few limitations of the study, however, which bear discussion.

      (1) The authors use halorhodopsin to achieve optogenetic inhibition. Halorhodopsin is generally considered a first-generation optogenetic actuator, as it is a Cl- pump rather than an ion channel. This limits the degree of inhibition (i.e. by preventing shunting inhibition) and can result in altered chloride gradients in the period immediately following optogenetic stimulation. This is of particular concern in this paper as the stimulation parameters and behavioral analysis are not temporally correlated, therefore confounds of disrupted chloride cannot be experimentally accounted for or controlled.

      The choice of halorhodopsin was influenced in part by a report that archaerhodopsin activation was paradoxically associated with increased spontaneous neurotransmitter release from presynaptic terminals. In the same study, activation of the chloride-pumping halorhodopsin triggered neurotransmitter release upon light onset (Mahn et al., PMID: 26950004), suggesting that halorhodopsin may be advantageous in studies inhibiting presynaptic nerve terminals. Halorhodopsin has been used in several studies to effectively silence activity. In our studies, it had a substantial influence on behavior that was opposite to the effect of ChR2 stimulation. While perhaps not optimal, we chose it over archaerhodopsin out of an abundance of caution, based on the cited literature.

      (2) The authors use an AAV-CaMKII-eGFP as a control (Sham) throughout the dataset; however, in the trace fear conditioning experiments, there are no AAV-CaMKII-ChR2-eYFP or AAV-CaMKII-eNpHR3.0-eYFP controls without optogenetic stimulation. Therefore, it is unclear the extent to which viral expression of optogenetic actuators impacts behavior. Additionally, the authors only provided optogenetic stimulation during contextual fear recall and tone fear recall. Additional experiments disrupting each pathway during trace conditioning would have provided additional insight into the role of each pathway in the initial encoding of fear memories.

      Thank you for your observation. We used a SHAM control injected with the AAV vector without any opsin. In the fear conditioning experiments, we performed optogenetic manipulations only during the fear response, at either context or cue recall. This aligned with our aim of testing whether the intrahippocampal projections play a role in modulating fear responses. Investigating the role of each pathway during acquisition of trace and/or contextual fear conditioning is also highly relevant; however, evaluating these projections in fear memory formation was beyond the scope of this study. The observation that we can bidirectionally modulate fear responses with light is consistent with (although it does not prove) light-specific modulation. In any case, even if there were baseline effects without light, they would still suggest that the observed effects are mediated by the optogenetic actuators.

      (3) The location and extent of viral expression across animals were not systematically quantified. Overall, however, these weaknesses do not significantly detract from the main conclusions of the paper. The authors' data convincingly demonstrate that disruption of the trisynaptic circuit bidirectionally modulates both fear- and anxiety-like behaviors, while disruption of the temporammonic pathway has no effect on anxiety-like behaviors but disrupts fear-related behaviors. It is interesting to note, however, that TA activation had no effect on tone-related fear conditioning, suggesting a potential specialized role of the temporammonic pathway specifically in contextual fear memory.

      Thank you for your thoughtful description of the present study. The TA pathway is indeed distinct from the vCA3 to vCA1 pathway in several ways. First, the two projections form synapses in different layers of vCA1: the TA pathway synapses in the stratum lacunosum-moleculare (LMol), whereas the vCA3 to vCA1 pathway synapses in the stratum radiatum (Rad), closer to the CA1 pyramidal cell layer. The two projections therefore modulate pyramidal cell activity differently, which may give them different computational properties and is in line with their differential functions. Second, TA projections are modulated by dopamine, whereas projections from vCA3 are not; in turn, vCA3 receives inputs from various sources, including collaterals and entorhinal input via the dentate gyrus. These distinct features of the two projections may contribute to differential modulation of vCA1 activity. We note that cue-related fear is not affected by TA activation; nevertheless, TA pathway activation by channelrhodopsin or inhibition by halorhodopsin results in a decrease or an increase of the contextual fear response, respectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study offers insights into the role of Leiomodin-1 (LMOD1) in muscle stem cell biology, advancing our understanding of myogenic differentiation and indicating LMOD1 as a regulator of muscle regeneration, aging, and exercise adaptation. The integration of in vitro and in vivo approaches, complemented by proteomic and imaging methodologies, is solid. However, certain aspects require further attention to improve the clarity, impact, and overall significance of the work, particularly in substantiating the in vivo relevance. This work will provide a starting point that will be of value to medical biologists and biochemists working on LMOD and its variants in muscle biology.

      Thank you for the positive feedback on our manuscript and the constructive criticism provided by the reviewers that helped us improve our manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript by Ori and colleagues investigates the role of Lmod1 in muscle stem cell activation and differentiation. The study begins with a time-course mass spectrometry analysis of primary muscle stem cells, identifying Lmod1 as a pro-myogenic candidate (Figure 1). While the initial approach is robust, the subsequent characterization lacks depth and clarity. Although the data suggest that Lmod1 promotes myogenesis, the underlying mechanisms remain vague, and key experiments are missing. Please find my comments below.

      We thank the reviewer for the positive feedback on our manuscript and the helpful comments, which helped improve it.

      (1) The authors mainly rely on coarse and less-established readouts such as myotube length and spherical Myh-positive cells. More comprehensive and standard analyses, such as co-staining for Pax7, MyoD, and Myogenin, would allow quantification of quiescent, activated, and differentiating stem cells in knockdown and overexpression experiments. The exact stage at which Lmod1 functions (stem cell, progenitor, or post-fusion) is unclear due to the limited depth of the analysis. Performing similar experiments on cultured single EDL fibers would add valuable insights.

      We thank the reviewer for this comment. In addition to performing standard measurements such as staining for Myogenin and Myosin Heavy Chain (Figure S2H), we focused on morphological readouts, such as myotube formation, because LMOD1 is an actin cytoskeleton-associated protein. We therefore reasoned that its function would be most directly reflected in structural changes during differentiation, rather than solely in early transcriptional markers.

      Regarding the use of standard markers, we have already performed co-staining for Myogenin and Myosin Heavy Chain (MHC), which quantifies early committed (Myogenin+/MHC-) and terminally differentiating (Myogenin+/MHC+) cells (Figure S2H). We did not include Pax7, as our primary culture system consists of already activated myoblasts, in which Pax7 is not a reliable marker of quiescence. Our data also suggest that Lmod1 is important in regulating differentiation, with comparatively mild effects on proliferation (Figure S2D-E); we therefore focused on this stage of myogenesis.

      Our focus on differentiation over activation is further supported by multiple lines of evidence. First, analysis of publicly available transcriptome datasets reveals that Lmod1 mRNA levels actually decrease upon Muscle Stem Cell (MuSC) activation, suggesting that its primary role is not during this initial phase. We added these data to Figure S1B for clarification. This aligns with our in vivo data from cardiotoxin-induced muscle regeneration, where LMOD1 protein abundance peaks at days 4-7 post-injury (a window coinciding with new myofiber formation and maturation) rather than during the initial activation and proliferation phase (days 1-3) (Figure 4I).

      Given this strong evidence pointing to a primary role for LMOD1 during the later stages of differentiation, we believe our current analyses are the most relevant. While single EDL fiber cultures are valuable for studying the quiescence-to-activation transition, they would not provide significant additional insight into the specific differentiation-centric mechanism we are investigating here. We are confident that our chosen readouts appropriately address Lmod1's function in the differentiation of myoblasts and formation of myotubes.

      (2) In supplementary Figure 2E, the distinction between Hoechst-positive cells and total cell counts is unclear. The authors should clarify why Hoechst-positive cells increase and relabel "reserve cells," as the term is confusing without reading the legend.

      We thank the reviewer for pointing out the confusion regarding the naming of the cell populations and the increase in Hoechst-positive cells. We have revised the terminology used in Figure S2E to improve clarity. Specifically, we have relabeled "reserve cells" as "non-proliferating myoblasts (Ki67-/Hoechst+)" so that these cells are described accurately without requiring the legend for interpretation. Regarding the increase in Hoechst-positive cells, we observed a slight (26%) but significant decrease in the number of proliferating myoblasts (Ki67+/Hoechst+) (Figures S2D and S2E). The relative increase in non-proliferating (Ki67-/Hoechst+) cells is a consequence of this reduction in proliferating (Ki67+/Hoechst+) cells. Importantly, the total cell count (the sum of Ki67-/Hoechst+ and Ki67+/Hoechst+ cells) remained stable. This has been clarified in the revised figure legend and main text as follows:

      “This was accompanied by a proportional increase in non-proliferating myoblasts (Ki67-/Hoechst+), while the total Hoechst-positive cell count (Ki67+/Hoechst+ and Ki67-/Hoechst+) remained unchanged (Figure S2E).”

      (3) The specificity of Lmod1 and Sirt1 immunostaining needs validation using siRNA-treated samples, especially as these data form the basis of the mechanistic conclusions.

      We have validated the specificity of the LMOD1 antibody using multiple approaches. Specifically, we performed immunofluorescence and immunoblotting on Lmod1 siRNA-transfected samples, in which we observed a significant reduction in the LMOD1 protein signal compared to control conditions (Figure S2G).

      Additionally, LMOD1 overexpression experiments demonstrated a corresponding increase in the signal for LMOD1 using immunofluorescence analyses, confirming the specificity of the antibody for detecting LMOD1.

      For the reviewers’ interest, we add Author response image 1:

      Author response image 1.

      Specificity of antibodies detecting LMOD1. Representative immunofluorescence images of LMOD1 in primary myoblast cultures following siLmod1 knockdown, LMOD1 overexpression, or controls transfected with a non-targeting siRNA (siCtrl) after one day of differentiation. LMOD1 (purple), SIRT1 (yellow), and nuclei (Hoechst, blue). Scale bar: 10 µm.

      For the SIRT1 antibody used in our immunostaining, specificity was validated by transfecting primary myoblasts with siRNA targeting Sirt1 and performing immunoblot analyses (Figure S5A). These showed a significant reduction in SIRT1 protein levels, confirming both the effectiveness of the siRNA and, critically, the ability of the antibody to specifically detect SIRT1. Furthermore, the same SIRT1 antibody was used in our nuclear-cytoplasmic fractionation experiments (Figure S4C), and its detection of SIRT1 in the expected subcellular compartments further supports its specificity. Although we did not perform immunofluorescence on Sirt1 siRNA-transfected samples, two lines of evidence support the antibody's specificity in immunofluorescence experiments as well: the immunoblots (a band at the correct molecular weight that is significantly reduced by Sirt1 siRNA), and the distribution of SIRT1 in subcellular fractions, which is fully consistent with the localization observed by immunostaining at the same time points (compare Figures S4C and 5A).

      (4) The authors must test the effect of Lmod1 siRNA on Sirt1 localization, as only overexpression experiments are shown.

      We carefully considered performing this experiment. However, knockdown of Lmod1 significantly impairs myogenic differentiation, a process that can itself influence protein localization. Consequently, if SIRT1 localization were altered following knockdown of Lmod1, it would be difficult to disentangle whether this resulted directly from the absence of LMOD1 affecting SIRT1 trafficking or indirectly from the failure of the cells to differentiate properly. Such an experiment would therefore not allow clear conclusions about a direct causal link between LMOD1 and SIRT1 localization. Instead, we focused on overexpression experiments, which demonstrate that altering LMOD1 levels is sufficient to affect SIRT1 localization. Our nuclear-cytoplasmic fractionation experiments clearly show that LMOD1 overexpression leads to changes in SIRT1 distribution (Figure 5H-K). These findings provide evidence that LMOD1 can directly modulate SIRT1 localization, supporting our mechanistic conclusions.

      (5) In Figure S3, the biotin signal in LMOD2 samples appears weak. The authors need to address whether comparing LMOD1 and LMOD2 is valid given the apparent difference in reaction efficiency. It would also help to highlight where Sirt1 falls on the volcano plot in S3B.

      We agree that the overall biotin signal on the streptavidin blot for the LMOD2-BirA* sample appears weaker than for LMOD1-BirA*. To provide a more direct comparison of the bait proteins themselves, we have now added a bar graph to the revised Figure S3D, which quantifies the relative abundance of the LMOD1 and LMOD2 bait proteins in the pull-down experiments. This analysis shows that the levels of LMOD1-BirA* and LMOD2-BirA* were comparable in our BioID samples. Furthermore, the validity of the LMOD2 BioID experiment is strongly supported by the identification of several known LMOD1 and LMOD2 interaction partners. As shown in the dataset, well-established interactors such as TMOD1, TPM3, and TMOD3 were identified, with some even showing stronger enrichment with LMOD2 than with LMOD1. This confirms that the biotinylation reaction was efficient enough to capture proximal proteins for both baits.

      Regarding SIRT1, we have now highlighted its position in yellow on the volcano plot in the revised Figure S3E. As can be seen, SIRT1 was identified in the LMOD1-BirA* sample and showed enrichment. We believe these clarifications, along with the additional expression data and the successful identification of known interactors, confirm the validity of our comparative BioID analysis.

      (6) The immunostaining data suggest that Lmod1 remains cytoplasmic throughout differentiation, whereas Sirt1 shows transient cytoplasmic localization at day 1 of differentiation. The authors should explain why Sirt1 is not constantly sequestered if Lmod1's cytoplasmic localization is consistent. It is also unclear whether day 1 is the key time point for Lmod1 function, as its precise role during myogenesis remains ambiguous.

      We thank the reviewer for this comment. We have no data explaining why SIRT1 is not constantly sequestered while LMOD1 remains consistently cytoplasmic. We can only speculate that the transient cytoplasmic localization of SIRT1 may be linked to the availability and functional role of LMOD1 throughout the differentiation process. While LMOD1 is present at low levels in proliferating primary myoblasts, its expression increases upon the initiation of differentiation (Figure 2A). Initially, during the early stages of differentiation, LMOD1 may not be required for actin nucleation as the major remodeling of the cytoskeleton has not yet begun. During this phase, LMOD1 might have the capacity to sequester SIRT1 in the cytoplasm.

      However, as differentiation progresses and morphological changes take place, LMOD1 may switch its functional role to actin nucleation, thereby releasing SIRT1. This transition could explain why SIRT1 localizes to the cytoplasm only transiently, particularly around day 1, when cytoskeletal remodeling is beginning but not yet fully established.

      Additionally, as LMOD1 and SIRT1 are known to colocalize in the nucleus, they may exit the nucleus together. Once in the cytoplasm, LMOD1 may become engaged in actin nucleation, allowing SIRT1 to function independently, which could explain the transient nature of SIRT1’s cytoplasmic localization.

      We have acknowledged this gap in our understanding in the discussion of the revised manuscript:

      “Our immunostaining data show that while LMOD1 is consistently cytoplasmic, its partner SIRT1 is only transiently localized in the cytoplasm. This suggests that their interaction is dynamically regulated. We hypothesize that the function of LMOD1 is determined by the changing availability of its binding partners during differentiation. During the initial phase, LMOD1 may primarily function to sequester SIRT1, a key regulator of myogenic genes. As differentiation proceeds, the increased expression of cytoskeletal components, such as its canonical partners TMODs and TPMs, likely shifts the function of LMOD1 towards its role in actin nucleation. This molecular switch, potentially driven by a change in the interactome of LMOD1, could then result in the release of SIRT1 from the cytoplasm. Such a mechanism may coordinate transcriptional regulation with cytoskeletal remodeling during myoblast differentiation.”

      (7) The introduction does not sufficiently establish the motivation or knowledge gap this work aims to address. Instead, it reads like a narration of disparate topics in a single paragraph. The authors should clarify the statement in line 150, "since this protein has been...,".

      We thank the reviewer for requesting clarification regarding our focus on LMOD1 (Introduction and Line 150 in the original submission). In the revised manuscript, we shortened the introduction and more clearly emphasized the motivation of our study:

      “Although these mechanisms contribute to remodeling the cellular architecture of MuSCs, a comprehensive understanding of the temporal dynamics of proteome remodeling during differentiation remains lacking. To address this knowledge gap, we performed an unbiased proteomic analysis of the early stages of myogenic differentiation to identify previously unrecognized proteins involved in this process and to examine how they functionally interact with established regulatory pathways.”

      Our decision to focus on LMOD1 was driven by its significant upregulation in our temporal proteome dataset, together with its previously uncharacterized role in primary myoblasts. Furthermore, to strengthen the interpretation of LMOD1’s role, particularly in the context of aging, we have integrated a new analysis of published transcriptomic datasets. This can be found in the main text as follows:

      “Surprisingly, we detected LMOD1 in freshly isolated muscle stem cells (MuSCs), but not LMOD2. Additionally, we observed that the protein levels of LMOD1 increased in MuSCs isolated from older mice (Figure 2C and Figure S1B). We further analyzed published transcriptomic data sets that describe changes between young and old MuSCs in both quiescent and activated states in young and old animals (Liu et al. 2013; Lukjanenko et al. 2016). In these analyzed transcriptomic data sets, Lmod1 was found to be significantly downregulated during the activation of MuSCs in both young and old mice (see Figure S1B).

      To assess the in vivo relevance of our finding, we queried two proteomic datasets of freshly isolated MuSCs and four different skeletal muscles (gastrocnemius, G; soleus, S; tibialis anterior, TA; extensor digitorum longus, EDL) (Schüler et al. 2021). We found LMOD2 to be the most abundant leiomodin protein in whole skeletal muscle, consistent with data from (Tsukada et al. 2010; Nworu et al. 2015; Kiss et al. 2020), while the overall abundance of LMOD1 was lower since this protein has been mainly associated with smooth muscle cells (Nanda and Miano 2012; Conley et al. 2001; Nanda et al. 2018) (Figure 2B).”

      Overall, while the identification of Lmod1 as a pro-myogenic factor is convincing, the mechanistic insights are insufficient, and the manuscript would benefit from addressing these concerns.

      We thank the reviewer for their constructive criticism. In the revised manuscript, we have strengthened our mechanistic insights and the validation of our findings by implementing the suggestions of the reviewers and including new experimental data to address their concerns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors identify Leiomodin-1 (LMOD1) as a key regulator of early myogenic differentiation, demonstrating its interaction with SIRT1 to influence SIRT1's cellular localization and gene expression. The authors propose that LMOD1 translocates SIRT1 from the nucleus to the cytoplasm to permit the expression of myogenic differentiation genes such as MYOD or Myogenin.

      Strengths:

      A major strength of this work lies in the robust temporal resolution achieved through a time-course mass spectrometry analysis of in vitro muscle differentiation. This provides novel insights into the dynamic process of myogenic differentiation, often under-explored in terms of temporal progression. The authors provide a strong mechanistic case for how LMOD1 exerts its role in muscle differentiation which opens avenues to modulate.

      We thank the reviewer for the positive feedback on our manuscript and the insightful comments which helped to improve the manuscript!

      Weaknesses:

      One limitation of the study is the in vivo data. Although the authors do translate their findings in vivo for LMOD1 localization and expression, the cross-sectional imaging is not highly convincing. Longitudinal cuts or isolated fibers could have been more useful specimens to answer these questions. Moreover, the authors do not assess their in vitro SIRT1 findings in vivo. A few key experiments in regenerating or aged mice would strengthen the mechanistic insight of the findings.

      We agree that longitudinal cuts and isolated fibers can provide excellent morphological detail for specific questions. However, for our primary objective in this study, which was to assess the temporal expression and localization of LMOD1 across the tissue during the regeneration process, we decided that cross-sectional analysis provided the most robust and reliable overview. Cross-sectional imaging effectively captures the spatial distribution of LMOD1 across multiple myofibers and their surrounding microenvironment, simultaneously assessing the whole cross-sectional area. This approach allowed us to evaluate the broader tissue architecture and cellular context, which was essential for understanding the dynamic changes occurring during regeneration. It also allowed us to investigate all myofibers of a muscle, rather than only the small proportion captured in longitudinal sections or isolated myofibers. Therefore, we continued using cross-sections for further analyses.

      We fully agree with the reviewer that validating our in vitro SIRT1 findings in an in vivo context is an essential next step. To address this, we performed additional analyses on our existing regenerating muscle samples and incorporated new immunostainings for SIRT1 and PAX7 into the regeneration time-course (now shown in revised Figure 4I), providing further in vivo support for our proposed mechanism. We focused specifically on cross-sections collected at day 5 post-injury, a time point selected based on the peak in LMOD1 expression, to assess whether SIRT1 levels increase in parallel with LMOD1 during regeneration. Notably, SIRT1 abundance is elevated at day 5 post-injury, underscoring its involvement in early myogenic differentiation. This conclusion is further supported by the localization of SIRT1 within mononucleated cells and newly formed myofibers at this stage of regeneration.

      Finally, we agree that further mechanistic studies in vivo would be highly valuable. While we were able to address SIRT1 dynamics in our regeneration model as suggested, an aged mouse cohort was unfortunately not available to us for this kind of study. Furthermore, more extensive in vivo experiments, such as those involving genetic manipulation, were beyond the scope of the current study, partly due to constraints related to animal welfare regulations and our approved experimental protocols.

      Discussion:

      Overall, the study emphasizes the importance of understanding the temporal dynamics of molecular players during myogenic differentiation and provides valuable proteomic data that will benefit the field. Future studies should explore whether LMOD1 modulates the nuclear-cytoplasmic shuttling of other transcription factors during muscle development and how these processes are mechanistically achieved. Investigating whether LMOD1 can be therapeutically targeted to enhance muscle regeneration in contexts such as exercise, aging, and disease will be critical for translational applications. Additionally, elucidating the interplay among LMOD1, LMOD2, and LMOD3 could uncover broader implications for actin cytoskeletal regulation in muscle biology.

      We thank the reviewer for this excellent suggestion for future analyses. We have included these important considerations and future avenues in the Discussion of the revised manuscript:

      “Our immunostaining data show that while LMOD1 is consistently cytoplasmic, its partner SIRT1 is only transiently localized in the cytoplasm. This suggests that their interaction is dynamically regulated. We hypothesize that the function of LMOD1 is determined by the changing availability of its binding partners during differentiation. During the initial phase, LMOD1 may primarily function to sequester SIRT1, a key regulator of myogenic genes. As differentiation proceeds, the increased expression of cytoskeletal components, such as its canonical partners TMODs and TPMs, likely shifts the function of LMOD1 towards its role in actin nucleation. This molecular switch, potentially driven by a change in the interactome of LMOD1, could then result in the release of SIRT1 from the cytoplasm. Such a mechanism may coordinate transcriptional regulation with cytoskeletal remodeling during myoblast differentiation.”

      “Moreover, delineating the functional specialization and potential redundancy among leiomodin proteins represents an important next step. Our data indicate that LMOD1 primarily regulates early myogenic differentiation (Figure 3). In contrast, the lack of an early functional phenotype upon LMOD2 depletion, together with its upregulation at later stages (Figure S2A), suggests a temporal shift in regulatory control. Accordingly, a systematic comparative analysis of LMOD1, LMOD2, and LMOD3 will be required to elucidate their distinct roles in actin cytoskeleton regulation across the myogenic program, particularly with respect to myofibril maturation and maintenance.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major Changes:

      (1) In Vivo Data on SIRT1:

      The inclusion of in vivo data on SIRT1 localization and expression would significantly strengthen the manuscript. Similar staining techniques used for LMOD1 could be applied to SIRT1. Additionally, imaging muscle specimens such as longitudinal sections or isolated myofibers would provide clearer insights into SIRT1's spatial distribution and improve upon the less convincing cross-sectional images currently presented (Figure 2).

      We fully agree that providing in vivo data on SIRT1 localization and expression is a crucial step to support our in vitro findings. Following the reviewer's suggestion, we have performed new experiments on muscle regeneration samples, analyzing cross-sections as we did for LMOD1 localization. Specifically, we performed immunostaining for SIRT1 on cross-sections from muscle samples collected at day 5 post-injury, a time point selected based on the observed peak in LMOD1 expression. These new data (now included in revised Figure 4I) allowed us to assess whether SIRT1 levels increase during regeneration in parallel with an increase in LMOD1 abundance.

      Regarding the suggestion to use longitudinal sections or isolated myofibers, we agree that these preparations are well suited to certain questions. For the primary goal of our study, assessing temporal expression changes across the entire regenerating tissue at different time points, we found that cross-sections provided the most comprehensive and robust overview; we therefore did not use longitudinal sections or isolated myofibers.

      Performing additional animal experiments to obtain these specific preparations was beyond the scope of the current study and subject to constraints from our approved animal welfare protocols.

      (2) Morphology of siLmod1 Cells:

      The morphology of siLmod1-treated cells in vitro (Figure 3) raises concerns. Assessing cell viability or cell death in these experiments would help ensure that differences are not due to dead or unhealthy cells being quantified. There is also a notable discrepancy between the control panels in Figures 3C and 3H compared to the experimental conditions in 3F and 3K, particularly in terms of cell length and morphology. These inconsistencies should be addressed or clarified.

      We acknowledge the visual discrepancies in cell morphology noted by the reviewer (e.g., between Figures 3C/3H and 3F/3K). These differences can be attributed to biological variability between primary myoblast cultures isolated from different mice. Such variability includes differences in myogenic potential and the fact that cells are not synchronized, leading to variations in differentiation efficiency, baseline morphology, and cell length across cultures (Cornelison 2008; Vaughan and Lamia 2019). To account for this, we decided to use n=6 biological replicates, i.e., primary myoblast cultures isolated from 6 different mice, for immunofluorescence analysis, ensuring robust quantitative data. Furthermore, we confirmed that this phenotype was not an artifact of culture conditions, as we consistently observed the same effect of Lmod1 knockdown independently of the passage number of the myoblasts or the donor mouse.

      To address the concern that morphological changes in siLmod1-treated cells might reflect cell death, we performed a TUNEL assay (transfection at day 1, analysis at day 3 of differentiation). This revealed no significant increase in TUNEL-positive (apoptotic) cells in siLmod1- (or siSirt1-) transfected samples compared with siCtrl-transfected cells. These new data have been added to the revised manuscript as Supplementary Figure S2I. The TUNEL data indicate that the observed morphological changes upon knockdown of Lmod1 are not due to induced cell death. Supported by these results, our interpretation is that knockdown of Lmod1 impairs or arrests differentiation rather than causing cell death. Furthermore, our quantification of different cell populations showed shifts indicative of impaired differentiation (e.g., accumulation of cells at earlier stages) without a significant loss in cell numbers. For example, the numbers of Myogenin+/MHC- and Myogenin+/MHC+ cells, and of differentiated myotubes, were not significantly reduced after transfection with siLmod1. A slight, non-significant trend towards fewer non-proliferating myoblasts/reserve cells (Myogenin-/MHC-/Hoechst+) was noted (Figure S2H). Overall, cells appeared to be 'stuck' in differentiation, consistent with Lmod1 knockdown impairing differentiation rather than causing cell death. We have further clarified this aspect in the revised manuscript.

      (3) LMOD1 and SIRT1 Interaction in Myogenic Cells:

      Strengthening the connection between LMOD1 and SIRT1 within the myogenic system would enhance the manuscript. Could proximity ligation assays (PLA) be performed in myogenic cells, as was done in HEK293T cells? Additionally, investigating whether SIRT1 remains in the nucleus upon LMOD1 knockdown using siRNA would provide mechanistic insight into their interaction during myogenic differentiation.

      We would like to clarify that the Proximity Ligation Assays (PLA) shown in Figure 4H were indeed performed in primary myoblasts, confirming the LMOD1-SIRT1 interaction directly in a myogenic context. We have modified the text to clarify that primary myoblasts were used for the PLA assays.

      Minor Points:

      (1) Was Lmod1 knockdown confirmed in vivo?

      To target Lmod1 in Muscle Stem Cells (MuSCs) in vivo, we utilized self-delivering Accell siRNAs. This delivery system has been previously validated and shown to be highly effective for targeting MuSCs in regenerating muscle (Bentzinger et al., Cell Stem Cell, 2013).

      While this is an established method for delivery, confirming knockdown specifically within the rare MuSC population is technically challenging using bulk tissue analysis, as the target signal is diluted by numerous other cell types. 

      Therefore, to confirm the efficacy of our specific siRNAs, we performed in vitro validation. For the reviewer's interest, we include Author response image 2, showing the knockdown efficiency of the respective siRNAs:

      Author response image 2.

      Knockdown efficiency of siRNAs targeting Lmod1 and Lmod2 in proliferating primary myoblasts, using the same self-delivering siRNAs as in the in vivo experiments. Self-delivering Accell siRNA was added to primary myoblasts cultured in low-serum medium for 48 hours. Relative mRNA expression levels of Lmod1 and Lmod2 were measured after transfection with self-delivering Accell siRNA targeting either Lmod1 (siLmod1) or Lmod2 (siLmod2). Expression levels were compared to control siRNA-transfected cells (siCtrl) and normalized to Gapdh expression.

      Based on the documented efficacy of this delivery system from prior literature and our own validation of the specific siRNAs used here, we are confident in the knockdown efficiency of the respective siRNAs. We decided not to perform additional animal experiments due to animal welfare considerations.

      (2) Some of the western blot bands do not appear to match the expected patterns for the tested proteins compared to controls (e.g., Figure S2J, S4C). Ensure that these are accurately labeled and include the entire membrane for transparency and reproducibility.

      Regarding Figure S2J, we agree that the presentation could be confusing to the reader. The blot shows LMOD1 and LMOD2 knockdown, while the bar plot quantifies only the change in LMOD2 levels. We have now revised the figure legend to explicitly state this. We hope this makes the presentation of our data clearer.

      For Figure S4C, we believe the concern about 'patterns' relates to loading variability. In this experiment, we manually counted the nuclei before lysis to ensure that each nuclear fraction started with an equal amount of material. We then loaded the cytoplasmic fractions in proportion to these counts. The purity of the fractions was additionally confirmed using nuclear (H4) and cytoplasmic (ALDOA) markers. As stated in the figure, the nuclear/cytoplasmic ratio of LMOD1 or SIRT1 was normalized across the entire lane of the Ponceau S staining, which we have now clarified in the relevant figure legends.

      Finally, regarding transparency, the presented immunoblot images are representative crops, which is standard practice for clarity. We are committed to reproducibility and will provide full, uncropped scans of all blots in the final version of the manuscript, in line with eLife publishing guidelines. 

      (3) Figure S1B appears to reuse images from Figure 2D (rotated). Verify that this is acceptable for the journal's guidelines, and if necessary, provide additional justification or clarification.

      We acknowledge that the image presented in Figure S1B was accidentally reused as a representative example in Figure 2D. To address this and prevent any potential redundancy or confusion, we have revised Figure S1B by replacing the duplicated image with a different, representative example from our dataset. The updated figure now contains unique image data, and we believe this revision fully resolves the concern.

      (4) Ensure consistent scale bars across images, particularly in Figures 3C and 3H, where discrepancies might affect interpretation.

      We thank the reviewer for pointing this out. We have now standardized all scale bars throughout the manuscript to ensure consistency. All immunofluorescence images of cultured cells (including Fig 3C and 3H) now have a 50 µm scale bar, and all tissue cross-sections have a 100 µm scale bar. This change has been implemented in the revised figures.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the investigators identified LMOD1 as one of a subset of cytoskeletal proteins whose levels increase in the early stages of myogenic differentiation. Lmod1 is understudied in striated muscle and in particular in myogenic differentiation. Thus, this is an important study. It is also a very thorough study - with perhaps even too much data presented. Importantly, the investigators observed that LMOD1 appears to be important for skeletal regeneration, and myogenic differentiation and that it interacts with SIRT1. Both primary myoblast differentiation and skeletal muscle regeneration were studied. Rescue experiments confirmed these observations: SIRT1 can rescue perturbations of myogenic differentiation as a result of LMOD1 knockdown.

      Strengths:

      Particular strengths include: important topic, the use of primary skeletal cultures, the use of both cell culture and in vivo approaches, careful biomarker analysis of primary mouse myoblast differentiation, the use of two methods to probe the function of the Lmod1/SIRT1 pathway via using depletion approaches and inhibitors, and generation of six independent myoblast cultures. Results support their conclusions.

      We thank the reviewer for the positive assessment of our work and the helpful comments for improving our manuscript.

      Weaknesses:

      (1) Figure 1. Images of cells in Figure 1A are too small to be meaningful (especially in comparison to the other data presented in this figure). Perhaps the authors could make graphs smaller?

      We have adjusted the size of the images across all figure panels to ensure better visibility and clarity. We hope these adjustments improve the presentation of the data.

      (2) Line 148 "We found LMOD2 to be the most abundant Lmod in the whole skeletal muscle." This is confusing since most, if not all, prior studies have shown that Lmod3 is the predominant isoform in skeletal muscle. The two papers that are cited are incorrectly cited. Clarification to resolve this discrepancy is needed.

      We acknowledge that LMOD2 and LMOD3 are predominantly expressed in skeletal and cardiac muscles (Tsukada et al. 2010; Nworu et al. 2015; www.proteinatlas.org) and that LMOD3 transcription is directly regulated by MRTF/SRF and MEF2 to coordinate sarcomeric assembly (Cenik et al. 2015). However, our statement refers specifically to the analysis of the proteomic datasets from freshly isolated MuSCs and four distinct skeletal muscles (G, S, TA, EDL) generated by Schüler et al. 2021. Crucially, LMOD3 was not detected in the quantitative mass spectrometry data for the EDL, G, S, or TA muscle samples analyzed in this specific study. In the context of this particular dataset, LMOD2 was the most abundant Leiomodin isoform detected in the whole skeletal muscle samples. This finding suggests differential expression and function of LMOD isoforms depending on the muscle type and/or developmental/regenerative state. We have added this clarification to the manuscript and corrected the initial citations.

      (3) Figure 2. Immunofluorescence (IF) panels are too small to be meaningful. Perhaps the graphs could be made smaller and more space allocated for the IF panels? This issue is apparent for just about all IF panels - they are simply too small to be meaningful. Additionally, in many of the immunofluorescence figures, the colors that were used make it difficult to discern the stained cellular structures. For example, in Figure S1, orange and purple are used - they do not stand out as well as other colors that are more commonly used.

      We agree that the IF panels were too small for optimal interpretation and have adjusted them in Figure 2 and throughout the manuscript. Regarding the color choices, we appreciate the reviewer's comments. Our initial selection (e.g., orange and purple in Figure S1) was intended to enhance accessibility for individuals with common color vision deficiencies, including red-green color blindness. However, we acknowledge the reviewer's point that these combinations provided insufficient contrast for discerning cellular structures. Therefore, we have revised the color schemes to use green, red, and blue, which should offer improved contrast.

      (4) There is huge variability in many experiments presented - as such, more samples appear to be required to allow for meaningful data to be obtained. For example, Figure S2. Many experimental groups, only have 3 samples - this is highly problematic - I would estimate that 5-6 would be the minimum.

      We thank the reviewer for the comment regarding experimental variability and sample size. In our study, n=3 biological replicates, i.e., independent primary cell cultures obtained from different mice, were primarily used for immunoblots. We acknowledge that variability can be observed between distinct primary cell cultures due to factors such as inherent differences in myogenic potential, cell cycle state (as cells were not synchronized), and passage number. Importantly, despite this inter-sample variation, the investigated phenotypes showed consistent trends across biological replicates. Rather than increasing the number of replicates for immunoblots, we opted for validating our key findings using independent approaches with a higher number of replicates. For instance, qRT-PCR analyses (to confirm knockdown efficiency) and immunofluorescence analyses were mostly performed using five to six independent myoblast cultures (biological replicates).

      (5) Ponceau S staining is often used as a loading control in this manuscript for western blots. The area/molecular weight range actually used should be specified. Not clear why in some experiments GAPDH staining is used, in other experiments Ponceau S staining is used, and in some, both are used. In some experiments, the variability of total protein loaded from lane to lane is disconcerting. For example, in Figure S4C there appears to be more than normal variability. Can the protein assay be redone and samples run again?

      We have clarified in the relevant figure legends that Ponceau S normalization, when used, was based on the quantification of the entire lane. Our standard loading control is GAPDH. We used Ponceau S for normalization only when GAPDH was deemed unsuitable, e.g., in nuclear-cytoplasmic fractionation experiments where GAPDH is not present in all fractions.

      Concerning the variability observed in Figure S4C, we manually counted the nuclei before lysis to ensure that each nuclear fraction started with an equal amount of material. We then loaded the cytoplasmic fractions in proportion to these counts. The purity of the fractions was additionally confirmed using nuclear (H4) and cytoplasmic (ALDOA) markers. The nuclear/cytoplasmic ratio of LMOD1 or SIRT1 was normalized across the entire lane of the Ponceau S staining, which we have now clarified in the relevant figure legends.

      (6) Figure S3 - Lmod3 is included in the figure but no mention of it occurs in the title of the figure and/or legend.

      We wish to clarify that the protein identified in Figure S3 is TMOD3 (Tropomodulin 3), not LMOD3. TMOD3 is a known pointed-end capping protein regulating the actin filament nucleation process together with LMODs (Fowler and Dominguez 2017; Boczkowska et al. 2015), so its presence in our dataset was expected and helps validate our results.

      (7) Abstract, line 25. "overexpression accelerates and improves the formation of myotubes". This is a confusing sentence. How is it improving the formation? A little more information about how they are different than developing myotubes in normal/healthy muscles would be helpful.

      We thank the reviewer for the comment. To clarify, we have revised the sentence in line 25 to "improves the initiation of myotube formation." This change reflects our observation that overexpression of LMOD1 leads to a more rapid onset of myotube formation, as evidenced by earlier expression of differentiation markers and accelerated fusion of myoblasts into myotubes compared with a GFP-overexpressing control myoblast line. These findings suggest that LMOD1 overexpression enhances the efficiency of the early stages of differentiation and fusion, thereby contributing to improved initiation of myotube formation.

      (8) It is impossible from the IF figures presented to determine where Lmod1 localizes in the myocytes. Information on its subcellular localization is important. Does it localize with Lmod2 and Lmod3 at thin filament pointed ends?

      Several publications suggest that LMODs are involved in actin nucleation and interact with TMODs at the thin filament pointed ends (Boczkowska et al. 2015; Fowler and Dominguez 2017; Fowler, Greenfield, and Moyer 2003; Tsukada et al. 2010; Rao, Madasu, and Dominguez 2014). We performed F-actin (Phalloidin) staining together with LMOD1 staining and observed possible co-localization (see Author response image 3). Specifically, we noted an accumulation of LMOD1 at the ends of myocytes, indicating that LMOD1 might play a role in the elongation and guidance of myotube differentiation. For the reviewer’s interest, we include Author response image 3 as it was not part of the original manuscript. While performing subcellular localization stainings, we added the F-actin/Phalloidin staining to explore potential interactions, but this aspect was not further investigated in the current study.

      Author response image 3.

      Co-staining of LMOD1 and Phalloidin in differentiating myocytes. Example image showing immunofluorescence staining of LMOD1 (purple) and F-actin (green; Phalloidin) in differentiating primary myocytes. LMOD1 appears to accumulate at the ends of elongated myocytes and co-localizes with actin structures (highlighted in boxes), suggesting a potential role in myotube elongation and guidance during differentiation.

      Our study focused on a distinct role for LMOD1, independent of its function in actin filament nucleation, and we therefore did not pursue further co-localization staining with LMOD2 or LMOD3. We recognize the potential importance of exploring these interactions and their relevance to thin filament organization in skeletal muscle. Although this was beyond the scope of the current work, we plan to investigate this aspect in the future.

      References

      Boczkowska, Malgorzata, Grzegorz Rebowski, Elena Kremneva, Pekka Lappalainen, and Roberto Dominguez. 2015. “How Leiomodin and Tropomodulin Use a Common Fold for Different Actin Assembly Functions.” Nature Communications 6 (1): 8314.

      Cenik, Bercin K., Ankit Garg, John R. McAnally, John M. Shelton, James A. Richardson, Rhonda Bassel-Duby, Eric N. Olson, and Ning Liu. 2015. “Severe Myopathy in Mice Lacking the MEF2/SRF-Dependent Gene Leiomodin-3.” The Journal of Clinical Investigation 125 (4): 1569–78.

      Cornelison, D. D. W. 2008. “Context Matters: In Vivo and in Vitro Influences on Muscle Satellite Cell Activity.” Journal of Cellular Biochemistry 105 (3): 663–69.

      Fowler, Velia M., and Roberto Dominguez. 2017. “Tropomodulins and Leiomodins: Actin Pointed End Caps and Nucleators in Muscles.” Biophysical Journal 112 (9): 1742–60.

      Fowler, Velia M., Norma J. Greenfield, and Jeannette Moyer. 2003. “Tropomodulin Contains Two Actin Filament Pointed End-Capping Domains.” The Journal of Biological Chemistry 278 (41): 40000–9.

      Liu, Ling, Tom H. Cheung, Gregory W. Charville, Bernadette Marie Ceniza Hurgo, Tripp Leavitt, Johnathan Shih, Anne Brunet, and Thomas A. Rando. 2013. “Chromatin Modifications as Determinants of Muscle Stem Cell Quiescence and Chronological Aging.” Cell Reports 4 (1): 189–204.

      Lukjanenko, Laura, M. Juliane Jung, Nagabhooshan Hegde, Claire Perruisseau-Carrier, Eugenia Migliavacca, Michelle Rozo, Sonia Karaz, et al. 2016. “Loss of Fibronectin from the Aged Stem Cell Niche Affects the Regenerative Capacity of Skeletal Muscle in Mice.” Nature Medicine 22 (8): 897–905.

      Nworu, Chinedu U., Robert Kraft, Daniel C. Schnurr, Carol C. Gregorio, and Paul A. Krieg. 2015. “Leiomodin 3 and Tropomodulin 4 Have Overlapping Functions during Skeletal Myofibrillogenesis.” Journal of Cell Science 128 (2): 239–50.

      Rao, Jampani Nageswara, Yadaiah Madasu, and Roberto Dominguez. 2014. “Mechanism of Actin Filament Pointed-End Capping by Tropomodulin.” Science 345 (6195): 463–67.

      Schüler, Svenja C., Joanna M. Kirkpatrick, Manuel Schmidt, Deolinda Santinha, Philipp Koch, Simone Di Sanzo, Emilio Cirri, Martin Hemberg, Alessandro Ori, and Julia von Maltzahn. 2021. “Extensive Remodeling of the Extracellular Matrix during Aging Contributes to Age-Dependent Impairments of Muscle Stem Cell Functionality.” Cell Reports 35 (10): 109223.

      Tsukada, Takehiro, Christopher T. Pappas, Natalia Moroz, Parker B. Antin, Alla S. Kostyukova, and Carol C. Gregorio. 2010. “Leiomodin-2 Is an Antagonist of Tropomodulin-1 at the Pointed End of the Thin Filaments in Cardiac Muscle.” Journal of Cell Science 123 (Pt 18): 3136–45.

      Vaughan, Megan, and Katja A. Lamia. 2019. “Isolation and Differentiation of Primary Myoblasts from Mouse Skeletal Muscle Explants.” Journal of Visualized Experiments: JoVE, no. 152 (October). https://doi.org/10.3791/60310.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes critical intermediate reaction steps of a HA synthase at the molecular level; specifically, it examines the 2nd step, polymerization, adding GlcA to GlcNAc to form the initial disaccharide of the repeating HA structure. Unlike the vast majority of known glycosyltransferases, the viral HAS (a convenient proxy extrapolated to resemble the vertebrate forms) uses a single pocket to catalyze both monosaccharide transfer steps. The authors' work illustrates the interactions needed to bind & proof-read the UDP-GlcA using direct and '2nd layer' amino acid residues. This step also allows the HAS to distinguish the two UDP-sugars; this is very important as the enzymes are not known or observed to make homopolymers of only GlcA or GlcNAc, but only make the HA disaccharide repeats GlcNAc-GlcA.

      Strengths:

      Overall, the strengths of this paper lie in its techniques & analysis.

      The authors make significant leaps forward towards understanding this process using a variety of tools and comparisons of wild-type & mutant enzymes. The work is well presented overall with respect to the text and illustrations (especially the 3D representations), and the robustness of the analyses & statistics is also noteworthy.

      Furthermore, the authors make some strides towards creating novel sugar polymers using alternative primers & work with detergent binding to the HAS. The authors tested a wide variety of monosaccharides and several disaccharides for primer activity and observed that GlcA could be added to cellobiose and chitobiose, which are moderately close structural analogs to HA disaccharides. Did the authors also test the readily available HA tetramer (HA4, [GlcA-GlcNAc]2) as a primer in their system? This is a highly recommended experiment; if it works, then this molecule may also be useful for cryo-EM studies of CvHAS as well.

      The reviewer requested testing whether an HA tetrasaccharide could also serve as a glycosyl transfer acceptor for HAS. The commercially available HA tetrasaccharide (HA4) is terminated at its non-reducing end by GlcA; therefore, we measured its effect on UDP-GlcNAc turnover kinetics. Titration of HA4 failed to elicit any detectable change in the UDP-GlcNAc turnover rate, indicating no priming. This is now mentioned in the main text, and the data are shown in Fig. S9.

      Weaknesses:

      A previous report described a failed attempt to elongate short primers (HA4 and chitin oligosaccharides larger than the cello- or chitobiose that show activity in this report) with a vertebrate HAS, XlHAS1, an enzyme that seems to behave like CvHAS (https://pubmed.ncbi.nlm.nih.gov/10473619/); this work should probably be cited and briefly discussed. It may be that the longer primers in the 1999 paper and/or the different construct or isolation specifics (detergent extract vs crude) were not conducive to the extension reaction, as the authors extracted recombinant enzyme.

      We apologize for the oversight. This reference is now cited (ref. 18) together with the description of the failed elongation of HA4 by CvHAS.

      There are a few areas that should be addressed for clarity and correctness, especially defining the class of HAS studied here (Class I-NR) as the results may (Class I-R) or may not (Class II) align (see comment (a) below), but overall, a very nicely done body of work that will significantly enhance understanding in the field.

      Done as requested

      Reviewer #2 (Public review):

      Summary:

      The paper by Stephens and co-workers provides important mechanistic insight into how hyaluronan synthase (HAS) coordinates alternating GlcNAc and GlcA incorporation using a single Type-I catalytic centre. Through cryo-EM structures capturing both "proofreading" and fully "inserted" binding poses of UDP-GlcA, combined with detailed biochemical analysis, the authors show how the enzyme selectively recognizes the GlcA carboxylate, stabilizes substrates through conformational gating, and requires a priming GlcNAc for productive turnover.

      These findings clarify how one active site can manage two chemically distinct donor sugars while simultaneously coupling catalysis to polymer translocation.

      The work also reports a DDM-bound, detergent-inhibited conformation that possibly illuminates features of the acceptor pocket, although this appears to be a purification artefact (it is indeed inhibitory) rather than a relevant biological state.

      Overall, the study convincingly establishes a unified catalytic mechanism for Type-I HAS enzymes and represents a significant advance in understanding HA biosynthesis at the molecular level.

      Strengths:

      There are many strengths.

      This is a multi-disciplinary study with very high-quality cryo-EM and enzyme kinetics (backed up with orthogonal methods of product analysis) to justify the conclusions discussed above.

      Weaknesses:

      There are few weaknesses.

      The abstract and introduction assume a lot of detailed prior knowledge about hyaluronan synthases, and in doing so, risk lessening the readership pool.

      A lot of discussion focuses on detergents (whose presence is totally inhibitory) and transfer to non-biological acceptors (at high concentrations). This risks weakening the manuscript.

      The abstract and parts of the introduction have been revised to address the reviewer’s concerns.

      Reviewer #1 (Recommendations for the authors):

      (1) As noted above, please state in title, abstract & introduction that this work is focused on a "Class I-NR HAS" (as described in Ref. #4), and NOT all HAS families...this is truly essential to note as someone working with the Pasteurella HAS version (Class II) would be totally misled & at this point, no one knows the Streptococcus HAS (Class-IR) mechanistic details which could be different due to its inverse molecular directionality of elongation compared to the CvHAS Class I-NR enzyme.

      Done as requested.

      (2) Page 6 - for the usefulness of the HAS mutants as being folded correctly, it was stated these mutants are suitable since they all 'purify' similarly...the use of the more proper term should probably be 'chromatograph', similarly suggesting similar hydrodynamic radii without massive folding issues.

      This has been revised to state that they all exhibited comparable size exclusion chromatography profiles.

      “All mutants share similar size exclusion chromatography profiles with the WT enzyme, suggesting that the substitutions do not cause a folding defect (Fig. S3).”

      (3) Page 7 - please check these sentences (& rest of paragraph?) as the meaning is not clear. "First, UDP-GlcNAc was titrated in the presence of excess UDP-GlcA, resulting in a response similar to the acceptor-free condition (Fig. 2C). However, the maximum reaction velocity at 20 mM UDP-GlcNAc was approximately 25% lower than that measured in the presence of UDP-GlcNAc only (Fig. 2C)."

      The paragraph has been revised to avoid confusion.

      (4) In Methods, please use an italicized 'g' for the centrifugation steps globally.

      Changed as requested

      (5) Please note the source/vendor for the HA standards on gels.

      Done

      (6) Page 35 - TLC section.

      (a) 'n-butanol' (with italic n) is the most widespread chemical name (not butan-1-ol).

      Done

      (b) Also, for all of the TLC images, the origin and the solvent front should be marked.

      Changed as suggested.

      Reviewer #2 (Recommendations for the authors):

      A number of minor issues should be addressed.

      (1) Abstract

      Two comments on the Abstract, which I found surprisingly weak given the quality of the work, and lacking a key detail.

      A major conceptual contribution of this work is the demonstration of how a single Type-I catalytic centre discriminates, positions, and transfers two chemically distinct substrates in an alternating pattern. This distinguishes HAS from dual-active-site (Type-II) glycosyltransferases and is important for understanding HA polymerization.

      However, this central point is not clearly articulated in the abstract. I suggest explicitly stating that HAS performs both GlcNAc and GlcA transfer reactions within a single catalytic site, and that the proofreading/inserted poses illuminate how this multifunctionality is achieved.

      The abstract currently ends with the observation of a DDM-bound, detergent-inhibited state. While this is interesting, it absolutely does not represent the central conceptual advance of the study and gives the abstract an artefactual ending.

      I strongly recommend revising the final sentences to emphasize the broader mechanistic insight and not an "artefact" (indeed, the enzyme is inactive in the presence of this detergent; it is thus a very unusual way to conclude an abstract).

      That is, finish with the wider implications of how HAS coordinates alternating substrate use, proofreading, and polymer translocation. Ending on the main mechanistic or biological significance would make the abstract considerably stronger and more aligned with the main message of the paper.

      The abstract has been revised thoroughly to reflect the important insights gained on CvHAS’ catalytic function and HA biogenesis in general.

      (2) Introduction

      The distinction between single active-centre enzymes, which transfer both sugars alternately, and twin catalytic domain enzymes that each perform one addition is surely central to the whole paper. But it is not discussed. Surely this has to be covered. There is a lot of work in this space, including, but not limited to:

      https://doi.org/10.1093/glycob/cwg085

      https://doi.org/10.1093/glycob/10.9.883

      https://doi.org/10.1093/glycob/cwad075 (includes this author team)

      Originally back to https://doi.org/10.1021/bi990270y

      If the authors instead assume such a level of knowledge for the reader, then surely they are writing for a specialist audience, not consistent with the wider readership ambitions of eLife?

      The Introduction has been revised as suggested by the reviewer, providing necessary background to frame our description of the Chlorella virus HAS. We made a deliberate effort to put new insights into a broader context.

      (3) Results and Discussion

      DDM "was observed for >50% of the analysed particles". I struggled with this. I couldn't understand how the authors selected particles that did or did not contain DDM. The main body text states: "To our surprise, careful sorting of the UDP-GlcA supplemented cryo EM dataset revealed a CvHAS subpopulation that was not bound to the substrate, but, instead, a DDM molecule near the active site (Fig 3A and S7). This was observed for >50% of the analyzed particles."

      That reads like there is one sample with two populations. But the figures and the methods section suggest differently: they suggest two samples with different data-collection regimes. That does not match the main text. Could this be clarified?

      Yes, that wasn’t explained well. We clarified the text to stress that the DDM-bound sample came from a dataset that was intended to resolve a UDP-GlcA-bound state, but instead revealed inhibition by DDM.

      Also in this space, in the modern world, "nominal magnification" has no real meaning, and calibrated pixel size would be more appropriate. Can this be given, please?

      The relevant Methods section now states: “imaging of … was performed at a calibrated pixel size of 0.652 Å”.

      The discovery of DDM in the active site is surprising. But it is an inhibitory artefact. Is this section pushed a little too hard? Also, "The coordination of DDM's maltoside moiety, an α-linked glucose disaccharide, is consistent with priming by cellobiose and chitobiose." I'm not sure why an α-linked maltose is consistent with the binding of a β-linked cellobiose. That makes no sense. There will be no other enzymes where starch and cellulose oligos are mutually accepted. Consider rewriting.

      We would like to emphasize the DDM coordination because it could guide the development of compounds that function as genuine inhibitors, either of HAS or of related enzymes. In the observed DDM binding pose, the α-linkage is not recognized. Instead, the reducing-end glucosyl unit stacks against Trp342, while the non-reducing unit extends into the catalytic pocket. Hence, a similar binding pose is conceivable for cellobiose and potentially also for chitobiose. The relevant section has been reworded.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work shows that resistance profiles to a variety of drugs are variable between different mycobacterial species and are not correlated with growth rate or intrabacterial compound concentration (at least for linezolid, bedaquiline, and Rifampicin). Note that intrabacterial compound concentration does not distinguish between cytosolic and periplasmic/cell wall-associated drugs. The susceptibility profiles for a wide range of mycobacteria tested under the same conditions against 15 commonly used antimycobacterial drugs provide the first recorded cross-species comparison, which will be a valuable resource for the scientific community. To understand the reasons for the high Rifampicin resistance seen in many mycobacteria, the authors confirm the presence of the arr gene, known to encode a Rif ribosyltransferase involved in Rif resistance in M. smegmatis, in the resistant mycobacteria after confirming the absence of on-target mutations in the RpoB RRDR. Metabolomic analyses confirm the presence of ribosylated Rif in some of the naturally resistant mycobacteria, which may not be entirely surprising but is an important confirmation. Presumably M. branderi is highly resistant despite lacking the arr homolog due to the rpoB S45N mutation. M. flavescens has an MIC similar to that of M. smegmatis, despite having both Arr-1 and Arr-X. Various Arr-1 and Arr-X proteins are expressed and characterized for catalytic activity, which shows that Arr-X is a faster enzyme, especially with respect to more hydrophobic rifamycins. M. flavescens has MIC values for Rifapentine and Rifabutin similar to those of M. smegmatis. Thus, the Arr-1 versus Arr-X comparison does not provide a complete explanation for the underlying reasons driving natural Rif resistance in mycobacteria. Downregulation of Arr-X expression in M. conceptionense confers increased sensitivity to Rifabutin, confirming its role as a rifamycin-inactivating enzyme.

      Overall, the comparison of cross-species susceptibility profiles is novel; the demonstration that MIC is not correlated with intracellular drug concentration is important but not sufficiently interrogated; the demonstration that Arr-X is also a Rif ADP-ribosyltransferase is a good confirmation, and the finding that it is more efficient than Arr-1 on hydrophobic rifamycins is interesting but maybe not entirely surprising. The manuscript seems to have two related parts, but the rifamycin modification aspect of the work is not strongly linked to the first part, since it interrogates the modification of one drug but not the common cause of natural resistance to other drugs.

      Reviewer #2 (Public review):

      Summary:

      The authors use a variety of methods to investigate the mechanisms of innate drug resistance in mycobacteria. They end up focusing on two primary determinants - drug accumulation, which correlates rather poorly with resistance for many species, and, for the rifamycins, ADP-ribosyltransferases. The latter enzymes do appear to account for a good deal of resistance, though it is difficult to extrapolate quantitatively what their relative contributions are.

      Overall, they make excellent use of biochemical methods to support their conclusions. Though they set out to draw very broad lessons, much of the focus ends up being on rifamycins. This is still a very interesting set of conclusions.

      Strengths:

      (1) A very interesting approach and set of questions.

      (2) Outstanding technical approaches to measuring intracellular drug concentrations and chemical modification of rifamycins.

      (3) Excellent characterization of variant rifamycin ADP-ribosyltransferases.

      Weaknesses:

      (1) Figure 3c/d: These panels show the same experiment done twice, yet they display substantially different results in certain cases. For instance, M. smegmatis appears to show an order of magnitude lower RIF accumulation in panel d compared to M. flavescens, despite them displaying equal accumulation in panel c. The authors should provide justification for this variation, particularly as quantitative intra-species comparisons are central to the conclusions of this figure.

      The data in panels 3c and 3d are from different sets of experiments. The reviewer is correct with regard to M. smegmatis: the data indeed differ by ~1 order of magnitude. However, the data for the other species are very similar. The reviewer may also have noticed that the error bars are larger in 3d than in 3c, indicating greater variation between the independent experiments used in 3d. We do not have a good explanation for this, other than that the experiments shown in 3d were associated with greater biological variability.

      (2) There are several technical concerns with Figure 3 that affect how to interpret the work. According to the methods, the authors did not appear to normalize to an internal standard, only to an external antibiotic standard (which may account for some of the technical variation alluded to above).

      We agree that using a labeled drug as an internal standard (IS) would be ideal. However, the experiment initially followed an untargeted metabolomics approach, which later shifted to relative drug quantification. At that stage, normalizing with IS was impractical because proper implementation would require multiple IS across the chromatographic range. Therefore, we opted for total ion current (TIC) normalization, which accounts for variability in overall metabolite abundance—even though the experimental setup was already adjusted for each bacterial species’ growth rate. Additionally, we prepared external standard curves for each drug to enable quantification, and the amount of drug added to each plate was considered when reporting these values.
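      As an illustration of the normalization and quantification steps described above, the following is a minimal Python sketch, not the authors' pipeline. It assumes a feature table with one row per sample (peak heights or areas), rescales each row to a common total ion current, and converts a normalized response into an amount by inverting a linear external standard curve. The function names are hypothetical, and a linear, unweighted calibration is an assumption.

```python
import numpy as np


def tic_normalize(intensities: np.ndarray) -> np.ndarray:
    """Rescale each sample (row) so that its total ion current (TIC) equals the mean TIC.

    intensities: 2-D array, rows = samples, columns = features (peak heights or areas).
    """
    tic = intensities.sum(axis=1, keepdims=True)  # TIC per sample, shape (n_samples, 1)
    return intensities / tic * tic.mean()  # divide by own TIC, scale to the common TIC


def fit_standard_curve(amounts, responses):
    """Unweighted least-squares line: response = slope * amount + intercept."""
    slope, intercept = np.polyfit(np.asarray(amounts, float), np.asarray(responses, float), deg=1)
    return float(slope), float(intercept)


def quantify(response, slope, intercept):
    """Invert the external standard curve to estimate the amount of drug."""
    return (response - intercept) / slope


# Toy, hypothetical inputs (not data from this study)
table = np.array([[1.0, 3.0], [2.0, 6.0]])
print(tic_normalize(table))  # both rows become [1.5, 4.5]
slope, intercept = fit_standard_curve([0, 1, 2, 4], [1, 3, 5, 9])
print(quantify(9.0, slope, intercept))  # about 4.0
```

      Normalizing by TIC before applying the standard curve means the curve is read with responses that are already comparable across samples of different richness.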

      Second, the authors used different concentrations of drug for each species to try to match the species' MICs. I appreciate the authors' thinking on this, but I think for an uptake experiment it would be more appropriate to treat with the same concentration of drug, since uptake is likely saturable at higher drug concentrations. In the current setup, the species with higher MIC have to take up substantially more antibiotic than the species with low MIC in order to end up with the same normalized uptake value in Figure 3d. It would be helpful to repeat this experiment with a single drug concentration in the media for all species and test whether that gives the same results seen here.

      We respectfully disagree with the reviewer. Experiments such as the one proposed by the reviewer work well when MIC values are a few fold apart, for strains of the same species, but have not been tested when MIC values are 100-1000-fold apart, across different species. Furthermore, how would one interpret compound uptake at 1000-fold the MIC for one species and at the MIC for another? By using antibiotic concentrations at the respective MIC for each species, we are at least under conditions where we know the biological effect of the antibiotic is the same across species, based on its potency.

      (3) Figure 4f: This panel seems to argue against the idea that the efficacy of RIF ribosylation is what's driving drug susceptibility. M. flavescens is similarly resistant to RIF as M. smegmatis, yet M. flavescens has dramatically lower ribosylation of RIF. This is perhaps not surprising, as the authors appropriately highlight the number of different rif-modifying enzymes that have been identified that likely also contribute to drug resistance. However, I do think this means that the authors can't make the claim that the resistance they observe is caused by rifamycin modification, so those claims in the text and figure legend should be altered unless the authors can provide further evidence to support them. This experiment also has results that are inconsistent with what appears to be an identical experiment performed in Supplemental Figure 5b. The authors should provide context for why these results differ.

      In regard to enzyme efficiency, the apparent rates of all Arr-1 enzymes in converting RIF into ADP-ribosyl-RIF are relatively similar across species. However, Arr-X is much more efficient than Arr-1 in both M. flavescens and M. conceptionense, as indicated by the apparent rates measured and displayed in Figure 5c.
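      For readers unfamiliar with the term, one common way to obtain an apparent rate is to take the slope of product formation over the early, approximately linear part of a time course and divide it by the enzyme concentration. The sketch below shows that calculation under those assumptions; it is not necessarily how the rates in Figure 5c were computed, and the function names and the choice of four early points are hypothetical.

```python
import numpy as np


def apparent_initial_rate(times_min, product_uM, n_early_points=4):
    """Slope (uM/min) of product vs. time, fitted over the first n_early_points.

    Assumes product formation is approximately linear over these early points.
    """
    t = np.asarray(times_min[:n_early_points], dtype=float)
    p = np.asarray(product_uM[:n_early_points], dtype=float)
    slope, _intercept = np.polyfit(t, p, deg=1)
    return float(slope)


def rate_per_enzyme(rate_uM_per_min, enzyme_uM):
    """Normalize by enzyme concentration (min^-1) so that different orthologs can be compared."""
    return rate_uM_per_min / enzyme_uM


# Toy, hypothetical time course: linear for 3 min, then slowing down
rate = apparent_initial_rate([0, 1, 2, 3, 10], [0.0, 2.0, 4.0, 6.0, 8.0])
print(rate, rate_per_enzyme(rate, 0.5))  # about 2.0 and 4.0
```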

      Proteomics data show that Arr-1 and Arr-X are upregulated upon rifampicin treatment in M. flavescens and M. conceptionense. However, the same experiment was not performed in the Arr-1 KD strain. Therefore, we cannot verify through this approach whether the activity observed in vivo correlates directly with higher expression of Arr-X alone. Of note, both enzymes likely contribute to rifamycin resistance, as per our results with the Arr-X KD and the sensitization of M. conceptionense to RIF.

      Author response image 1.

      It is also worth mentioning that there are other enzymes in the pathway of RIF ribosylation whose efficiency is unknown (Author response image 2). Therefore, ADP-ribosyl-RIF is not an “end-metabolite” and may not be the sole determinant of RIF resistance via ADP-ribosylation. Downstream enzymes could also account for the difference observed between M. flavescens and M. smegmatis.

      Author response image 2.

      It is correct that the Rifampicin MIC for M. flavescens is the same as M. smegmatis.

      (4) Fig 4f/5c: M. flavescens has both Arr-1 and Arr-X, yet it appears to not have ribosylated RIF. This result seems to undermine the authors' reliance on the enzyme assay shown in Fig 5c - in that assay, M. flavescens Arr-X is very capable of modifying rifampicin, yet that doesn't appear to translate to the in vivo setting. This is of importance because the authors use this enzyme assay to argue that Arr-X is a fundamentally more powerful RIF resistance mechanism than Arr-1 and that it has specificity for rifabutin. However, the result in Figure 4f would argue that the enzyme assay results cannot be directly translated to in vivo contexts. For the authors to claim that Arr-X is most potent at modifying rifabutin, they could test their CRISPRi knockdowns of Arr-X and Arr-1 under treatment with each of the rifamycins they use in the enzyme assay. The authors mentioned that they didn't do this because all the strains are resistant to those compounds; however, if Arr-X is important for drug resistance, it would be reasonable to expect to see sensitization of the bacteria to those compounds upon knockdown.

      The reviewer is reading Fig. 4f incorrectly, probably because it is plotted on a linear rather than a logarithmic scale. Ribosylated RIF is present in M. flavescens, just at lower levels than in M. conceptionense and M. smegmatis. In species with no Arr-1 or Arr-3, ribosylated RIF is not detected at all (e.g., M. tuberculosis), i.e., its concentration is zero. Therefore, any detection of ribosylated RIF can be considered significant. In addition, as mentioned before, ADP-ribosyl-RIF is not the final product of the reaction, and further studies are needed to understand the subsequent reactions.

      (5) Figure 5d: The authors use this CRISPRi experiment to claim that Arr-X from M. conceptionense is more potent at inactivating rifabutin than Arr-1. This claim depends on there being equal degrees of knockdown of Arr-1 and Arr-X, so the authors should validate the degree of knockdown they achieve. This is particularly important because, to my knowledge, nobody has used this system in M. conceptionense before.

      We agree with the reviewer that qPCR should have been performed to define the extent of interference in the strains generated. Unfortunately, qPCR was not performed on the strains tested to confirm the extent of downregulation. Although validating KD strains is best practice, there is no indication that the observed effect is due to nonspecific downregulation. The genetic environment in which Arr-X is positioned is different from that of Arr-1, and the targeting oligonucleotides are specific and would not bind promiscuously to Arr-1. That said, this is indeed a limitation of our setup.

      (6) The authors' arguments about Arr-X and Arr-1 would be strengthened by showing by LC/MS that Arr-X knockdown in M. conceptionense results in more loss of ribosyl-rifabutin than knockdown of Arr-1.

      We agree with the reviewer that performing the LC-MS analysis of the Arr-X knockdown would have strengthened the argument of our paper. Unfortunately, this experiment was not performed.

      Reviewer #3 (Public review):

      This manuscript presents a macroevolutionary approach to the identification of novel high-level antibiotic resistance determinants that takes advantage of the natural genetic diversity within a genus (mycobacteria, in this case) by comparing antibiotic resistance profiles across related bacterial species and then using computational, molecular, and cellular approaches to identify and characterize the distinguishing mechanisms of resistance. The approach is contrasted with "microevolutionary" approaches based on comparing resistant and susceptible strains of the same species and approaches based on ecological sampling that may not include clinically relevant pathogens or related species. The potential for new discoveries with the macroevolution-inspired approach is evident in the diversity of drug susceptibility profiles revealed amongst the selected mycobacterial species and the identification and characterization of a new group of rifamycin-modifying ADP-ribosyltransferase (Arr) orthologs of previously described mycobacterial Arr enzymes. Additional findings that intra-bacterial antibiotic accumulation does not always predict potency within this genus, that M. marinum is a better proxy for M. tuberculosis drug susceptibility than the commonly used saprophyte M. smegmatis, and that susceptibility to semi-synthetic antibiotic classes is generally less variable than susceptibility to antibiotics more directly derived from natural products strengthen the claim that the macroevolutionary lens is valuable for elucidating general principles of susceptibility within a genus.

      There are some limitations to the work. The argument for the novelty of the approach could be better articulated. While the opportunities for new discoveries presented by the identification of discrepant susceptibility results between related species are evident, it is less clear how the macroevolutionary approach is further leveraged for the discovery of truly novel resistance determinants. The example of the discovery of Arr-X enzymes presented here relied upon foundational knowledge of previously characterized Arr orthologs. There is little clarity on what the pipeline for identifying more novel resistance determinants would look like. In other words, what does the macroevolutionary perspective contribute to discovery from the point of finding interspecies differences in susceptibility? Does the framework still remain distinct from other discovery frameworks and approaches? If so, how?

      Thank you for pointing this out, as this is a critical feature of our study and method. Our approach relies on inter-species comparative genomics and phenotypes, and is therefore distinct from inter-strain comparisons. The difference is dramatic, and it becomes clearer when comparing the core genome of M. tuberculosis (a single species), circa 92%, with the core genome of the genus, circa 1%. While we focus on rifamycins in this manuscript, future manuscripts will investigate many of the other dozens of “inconsistencies” observed between the genetic makeup of different mycobacterial species and their actual performance in the presence of different antibiotics.
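      To make the core-genome comparison concrete, the sketch below computes one common definition of the core fraction: gene families present in every genome divided by the pangenome (all gene families found in any genome). The exact definition behind the 92% and 1% figures is not stated here, so treat this definition as an assumption; the function name and toy gene sets are hypothetical.

```python
from functools import reduce


def core_fraction(gene_sets):
    """Core genome (families in every genome) divided by pangenome (families in any genome)."""
    sets = [set(s) for s in gene_sets]
    core = reduce(set.intersection, sets)
    pangenome = reduce(set.union, sets)
    return len(core) / len(pangenome)


# Toy, hypothetical gene-family sets
same_species = [{"a", "b", "c"}, {"a", "b", "c", "d"}]
genus = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
print(core_fraction(same_species))  # 0.75
print(core_fraction(genus))  # 0.2
```

      Adding more divergent genomes can only shrink the intersection and grow the union, which is why the core fraction of a genus is much smaller than that of a single species.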

      While the experimentation and analyses performed appear well-designed and rigorous, there are a few instances in which broad claims are based on inferences from sample sets or data sets that are too limited to provide robust support. For example, the claim that rifampicin modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to rifampicin in mycobacteria may be a bit premature or an over-generalization, as other enzymatic modification mechanisms and other mechanisms such as helR-mediated dissociation of rifampicin-stalled RNA polymerases, efflux, etc were not examined nor were CRISPRi knockdown experiments conducted beyond an experiment to tease out the role of Arr-X and Arr-1 in one strain. The general claim that intra-bacterial antibiotic accumulation does not predict potency in mycobacteria may be another over-generalization based on the limited number of drugs and species studied, but perhaps the intended assertion was that antibiotic accumulation ALONE does not predict potency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The metabolomics is done using mycobacteria grown on filters. Initially, mycobacterial cells are grown on the filters for 5 doublings before being transferred to drug-containing (or free) agar for one doubling. Is this based on calculated doubling time in liquid culture or a true determination of the fact that the biomass increases to what would amount to 5 doublings?

      The doubling time used is the one determined in liquid media. Although the growth kinetics on solid media may differ slightly from those in liquid (±10%), this experimental design is well established for M. tuberculosis (since Proc Natl Acad Sci U S A. 2010 May 25;107(21):9819-24) and M. smegmatis (unpublished). Therefore, we used the growth rate as a proxy for having the same biomass of cells for each species tested. A maximum difference of 10% was observed between M. tuberculosis growth in liquid and on solid media; however, cells grow exponentially for much longer on filters. This makes filter-based experiments more reliable, as few growth phase-derived differences are present.
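      The arithmetic behind the filter schedule can be sketched as follows, assuming incubation times are set as multiples of the liquid-culture doubling time and that the ±10% difference applies to the doubling time itself (an assumption; the text does not specify which growth parameter differs). The function names and the 3 h doubling time are hypothetical.

```python
def incubation_hours(doubling_time_h, n_doublings):
    """Incubation time needed for n nominal doublings at the liquid-culture doubling time."""
    return doubling_time_h * n_doublings


def fold_biomass(n_nominal_doublings, td_ratio_solid_to_liquid=1.0):
    """Fold increase in biomass if the true doubling time on filters is td_ratio x the liquid value."""
    actual_doublings = n_nominal_doublings / td_ratio_solid_to_liquid
    return 2 ** actual_doublings


# Hypothetical species with a 3 h doubling time in liquid
print(incubation_hours(3.0, 5))  # 15.0 h of pre-growth on filters
print(incubation_hours(3.0, 1))  # 3.0 h on drug-containing (or drug-free) agar
for ratio in (0.9, 1.0, 1.1):
    print(ratio, round(fold_biomass(5, ratio), 1))  # about 47.0, 32.0, 23.4
```

      For example, five nominal doublings give a 32-fold increase if doubling times match, but roughly 23-fold if the true doubling time on filters is 10% longer and about 47-fold if it is 10% shorter; differences of this kind are what the TIC normalization described elsewhere in this response is intended to absorb.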

      (2) The demonstration that intrabacterial drug concentrations vary between mycobacterial species in a manner not related to MIC for at least LZD and RIF, is an important finding. However, intrabacterial does not mean cytoplasmic since a considerable fraction could be present in the periplasmic/cell wall layers. Ideally, this would need to be determined but would of course be a massive undertaking since the method needs validation & optimization for each mycobacterial species. Nevertheless, this has to be mentioned. In addition, three drugs are limiting. Measuring additional drug concentrations in these 5 mycobacteria would at least establish some confirmation about the extent of this lack of correlation. Thus, could the authors measure concentrations of additional drugs with intracellular targets?

      Testing additional drugs would be beneficial and would expand our paper, and it is definitely among our plans for further studies focusing on the other antibiotics described here. It would also provide new insights into other possible mechanisms of resistance in mycobacterial species. However, in this study we aimed first to determine the antibiotic response profiles of different mycobacterial species; once we identified interesting resistance phenotypes that could not be readily explained by known mechanisms of resistance, we narrowed the focus to certain drugs and species that could provide insights into new mechanisms of antibiotic resistance. Finally, exploring drug concentration across multiple bacterial compartments is a daunting task; it has not been done extensively with any single species, let alone with multiple species, many of which still lack any study of their cell envelope.

      (3) CRISPRi was used to reduce transcription in M. conceptionense. What was the level of gene downregulation?

      As mentioned previously, a limitation of our setup is that the level of KD was not measured in this instance.

      Minor comments:

      (1) The introduction mentions the fast and slow-growing mycobacteria which are classified based on the time that it takes to observe colonies on solid agar. However, in liquid medium, there is less correlation between the reported growth on agar and doubling time in liquid (Figure 1b, Figure 2d). This could be mentioned in the results section. In Figure 2d, the filled circles represent fast-growers but this does not hold well for liquid culture and it might make more sense to not distinguish between fast- and slow-growers in these graphs. A small complication would also be the fact that the doubling time represents growth in a liquid medium with Tyloxapol as a detergent whereas the MIC and metabolomics are done on solid agar with no detergent. The metabolomics is done after a doubling but for those where agar growth and liquid growth have large discrepancies in growth rate, there could be some differences.

      Apologies for this misunderstanding. Fast- and slow-growth phenotypes are determined on Löwenstein-Jensen (LJ) agar, not on 7H10 agar (used in our study and in most studies of mycobacteria). Furthermore, this is a qualitative definition, not a quantitative one. Therefore, our measurements need not correlate with fast- and slow-growth phenotypes, unless we had used that specific medium. In addition, in liquid medium we determined growth rate directly, which is never done with LJ medium.

      In addition to adding the same number of cells to each filter, we also perform TIC normalization, which should account for how rich the samples were, and therefore how much material we had. Therefore, we do not observe discrepancies due to differences in growth rate or the presence/absence of detergent in the media.

      It is also worth mentioning that this experimental set up has been well established in many M. tuberculosis labs that study metabolism. Importantly, the use of detergent drastically affects mass spectrometry, and therefore cannot be used.

      (2) Figure 1g in the text should be Figure 1f.

      Apologies, it has been fixed.

      (3) Figure S1 would be ideal to have in (supplementary) table format.

      This data is now being provided in a table format.

      (4) Table S1 - ethambutol misspelt.

      Spelling has been corrected.

      (5) MIC for species such as M. abscessus could depend on medium (7H9-based medium can give different MIC values than CAMH).

      Indeed, different media can significantly change MIC values, and this is true for many bacterial species, if not all. For this study we used only species that could be grown in 7H9 broth containing 10% ADC, 0.05% glycerol, and 0.05% tyloxapol, and on 7H10 plates containing 10% OADC and 0.05% glycerol. MIC99 was determined on the latter, as we found testing on solid media more efficient and robust. The goal of our experiment was not to determine the “true” MIC of the antibiotics tested, as this value does not exist. It was to find inconsistencies between relative MIC values and the presence of genes that could account for them.
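      One common way to read an MIC99 from agar plates is as the lowest drug concentration at which colony counts fall to 1% or less of the drug-free control. The sketch below applies that rule; the exact scoring criteria used in this study are not given here, so the threshold and the function name are assumptions.

```python
def mic99(concentrations, cfu_counts, cfu_no_drug):
    """Lowest tested concentration whose colony count is <= 1% of the drug-free control.

    Returns None if no tested concentration reaches 99% inhibition.
    """
    threshold = 0.01 * cfu_no_drug
    for conc, cfu in sorted(zip(concentrations, cfu_counts)):  # ascending concentration
        if cfu <= threshold:
            return conc
    return None


# Toy, hypothetical plate counts (ug/mL, CFU); drug-free control = 1000 CFU
print(mic99([0.8, 0.1, 0.4, 0.2], [0, 500, 8, 120], 1000))  # 0.4
```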

      (6) The statement "the experiment was performed at a concentration of antibiotic equal to its MIC" initially seems confusing. It was not equal to the MIC but performed at 6-fold the respective MIC of the species in question. Maybe re-phrasing this would help.

      Apologies for this oversight. It has been corrected.

      (7) Note that some mutations outside the RRDR (eg. V170F and I491F) can also cause Rif resistance.

      Author response image 3.

      (A) Rainbow diagram of the RpoB X-ray structure coloured according to sequence conservation. Dark purple indicates high conservation, whereas dark orange indicates low conservation. RIF (shown in magenta) is bound to RpoB. The zoomed view shows that the RIF-binding pocket is considerably conserved. (B) RpoB contains the Rifampicin Resistance-Determining Region (RRDR), encoded by an 81-bp region of rpoB, which is known to be important for RIF binding and is where most mutations occur in drug-resistant TB. The sequence alignment shows that the RRDR is conserved, with the exception of M. branderi, which has an Asn instead of a Ser residue at position 456 (numbering relative to the M. tuberculosis sequence), highlighted in bold.

      Attached is a structural alignment of RpoB from the species highlighted in this paper. Although there is variability within the sequences, which is also displayed in Author response image 3 with the conservation analysis, the residues that have been implicated in resistance (including V170 and I491) are conserved. The alignment is provided as a .fasta file that can be opened in Jalview.

      (8) Discuss how the RpoB S450N mutation in M. branderi confers the observed level of resistance.

      That’s a great point, thank you. Now it reads as:

      “The rifampicin (RIF) binding pocket is generally conserved, but Mycobacterium branderi has an S450N mutation in the RRDR. While this specific mutation has not been found in clinical isolates, it is located at the binding site and may confer resistance (273). Although serine (S) and asparagine (N) have similar side chains, related mutations such as S450Q have been linked to resistance (156). Thus, M. branderi may be RIF-resistant due to this mutation. In contrast, M. conceptionense, M. flavescens, and M. smegmatis show no target sequence differences that explain their resistance.”

      (9) The statement that the three tested NTM are sensitive to rifabutin ("resistant to all rifamycins except for rifabutin") needs to be interpreted considering what sensitivity means. The MIC is still high (1.6-3.1 ug/mL) when compared to that of Mtb. The 2-fold differences in MIC between M. smegmatis and M. conceptionense do not really prove or disprove the role of Arr-X in rifabutin resistance.

      We have revised the sentence to use more careful language. We agree, but it is worth mentioning that, for many bacteria, susceptibility breakpoints are set by the CLSI. Each bacterial species has a range that is considered sensitive or resistant, but these ranges are not available for the species used in this study. In general, bacteria with MIC values above 8 µg/mL are considered resistant to rifampin (J Antibiot 2014 67:625).
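      Because MICs are measured in twofold dilution steps, it can help to express the difference between two MICs as a number of dilution steps and to compare a value with a breakpoint. The sketch below does both, using the 8 µg/mL rifampin threshold cited above; the function names are hypothetical, and a difference of about one dilution step is conventionally regarded as within the reproducibility of MIC assays.

```python
import math


def dilution_steps(mic_a, mic_b):
    """Difference between two MICs expressed in twofold dilution steps."""
    return math.log2(mic_a / mic_b)


def above_breakpoint(mic_ug_ml, breakpoint_ug_ml=8.0):
    """True if the MIC exceeds the resistance threshold (8 ug/mL for rifampin, as cited above)."""
    return mic_ug_ml > breakpoint_ug_ml


print(round(dilution_steps(3.1, 1.6), 2))  # 0.95, i.e. about one dilution step
print(above_breakpoint(3.1), above_breakpoint(16.0))  # False True
```

      For example, 3.1 and 1.6 µg/mL differ by about one step, and both fall below the 8 µg/mL threshold.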

      (10) Figure 1d: It's hard to quantify the sensitivity of the plates. Can this be done by MIC? Was only rifabutin tested or also rifampicin?

      The initial experiments described in the paper were all performed using rifampicin only. The MICs of the remaining rifamycins were then determined for M. smegmatis, M. flavescens, and M. conceptionense and can be found in Supplementary Table 4. Figure 5d illustrates the effect of the KD on M. conceptionense sensitivity to rifabutin.

      (11) Is there data to show the ADP-ribosylation of rifabutin in M. conceptionense and the CRISPRi strains?

      Unfortunately, we did not perform LC-MS analysis on M. conceptionense CRISPRi strains exposed to rifabutin to measure potential ADP-ribosylation.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be useful if the authors would complete Figure 1A by determining growth rates for the remaining 18 strains that they currently omitted.

      These growth rates were obtained using roller bottles in at least 3 independent experiments; unfortunately, the throughput is far from ideal. The goal of the experiment was to highlight differences in growth rate beyond fast and slow growth, which we did. Adding the remaining values would not change this conclusion. Growth rate variation in 7H9 is significant, and the point is made in our figure.
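      For context, a doubling time is commonly estimated by fitting a straight line to log2-transformed culture density against time over the exponential phase; the reciprocal of the slope is the doubling time. The sketch below assumes OD600 readings restricted to exponential-phase points; the readout and the function name are assumptions, not a description of the roller-bottle protocol.

```python
import numpy as np


def doubling_time_h(times_h, od600):
    """Doubling time from a linear fit of log2(OD600) vs. time (exponential-phase points only)."""
    t = np.asarray(times_h, dtype=float)
    log2_od = np.log2(np.asarray(od600, dtype=float))
    slope, _intercept = np.polyfit(t, log2_od, deg=1)  # doublings per hour
    return 1.0 / slope


# Toy, hypothetical readings that double every 3 h
print(round(doubling_time_h([0, 3, 6, 9], [0.05, 0.1, 0.2, 0.4]), 2))  # 3.0
```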

      (2) The authors should justify their choice of species used in Figures 3-4. It would be useful to know, for instance, if the authors chose these species in an unbiased fashion, or if they were chosen because the authors had already determined that they possess rifamycin-modifying enzymes of interest. In that case, they wouldn't necessarily be a representative sample to use for the correlation analysis of antibiotic uptake and potency in Figure 3.

      They were chosen because of their resistance profile for BDQ, LZD and RIF. This has been addressed in the text, which now reads “Given the antibiotic response profiles observed, we selected BDQ, LZD and RIF to explore the molecular causes of these dramatic changes in antibiotic potency observed across the Mycobacterium genus.”

      (3) Figure 4b: The data in this panel appear inconsistent - for instance, M. houstonense appears to grow at 10X Mtb MIC, but fails to grow at 1X Mtb MIC. Repeating this experiment would better establish the validity of the authors' claims about the relative susceptibility of these strains to RIF.

      The figure was rotated when exported from Illustrator. The corrected figure has been uploaded, and the original plate photos are also provided for clarity.

      (4) Figure 4e: Does Arr-X get upregulated in these proteomic datasets? The authors' argument that proteomic upregulation correlates with important drug resistance genes would imply that it might be, so that would be useful information to provide.

      Arr-X is slightly upregulated, but not significantly; this could be due to the native expression of Arr-1. These data are shown in a previous response.

      (5) I wasn't able to find the supplementary tables that the authors allude to - not sure if that was a file mixup, but those tables would be useful for interpreting the manuscript.

      We are sorry that you could not access the tables. This was likely due to file corruption, as the other reviewers were able to access them. We will make sure that all tables are available and accessible.

      (6) For LC/MS, the authors use peak height instead of peak area, which they argue correlates better with the amount of drug in cells because of the poor peak shape they observed for linezolid. This is not standard practice, so the authors should provide evidence to support this claim by running an LC/MS standard curve, then showing the correlation between peak height and amount of compound added as well as the correlation between peak area and compound.

      Thank you for pointing this out. We have calculated the accuracy and now display it. Both peak area and peak height can be used, but area is indeed standard practice.
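      The comparison the reviewer requested can be sketched as follows: fit a linear standard curve separately for peak height and for peak area, then compare R² and the accuracy of back-calculated amounts (back-calculated / nominal × 100). This is a minimal illustration assuming a linear, unweighted calibration; the function name and toy values are hypothetical, and blanks (amount 0) must be excluded before computing accuracy.

```python
import numpy as np


def calibration_report(amounts, responses):
    """R^2 of a linear standard curve and % accuracy of back-calculated amounts.

    amounts must be non-zero (exclude blanks); responses are peak heights or peak areas.
    """
    x = np.asarray(amounts, dtype=float)
    y = np.asarray(responses, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    r2 = 1.0 - np.sum((y - predicted) ** 2) / np.sum((y - y.mean()) ** 2)
    back_calculated = (y - intercept) / slope  # invert the fitted curve
    accuracy_pct = 100.0 * back_calculated / x  # 100% = perfect recovery
    return float(r2), accuracy_pct


# Toy, hypothetical standards: same amounts, two kinds of response
amounts = [1, 2, 4, 8]
r2_height, acc_height = calibration_report(amounts, [10, 21, 39, 81])
r2_area, acc_area = calibration_report(amounts, [50, 98, 205, 398])
print(round(r2_height, 4), round(r2_area, 4))
```

      Running it once with peak heights and once with peak areas for the same standards gives directly comparable R² and accuracy values.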

      (7) The authors should provide methods information about the LC column and the gradient settings used for LC-MS, as well as the settings of the MS.

      The full method has been added to the paper.

      Reviewer #3 (Recommendations for the authors):

      I have only minor comments aside from the information in the Public Review:

      (1) Results, section on Intra-bacterial antibiotic accumulation, line 8: "experiment was performed at a concentration of antibiotic PROPORTIONAL to its MIC" would be more accurate?

      Agreed and adjusted according to Reviewer’s suggestion.

      (2) Results, section on A minor role for pre-existing target modification, last sentence: the mere presence of RIF-ribosylating enzymes does not, in and of itself indicate that "RIF modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to RIF in mycobacteria", as other mechanisms and other forms of modifying enzymes are known to confer rifamycin resistance, with redundancy (e.g., other rifampicin-modifying enzymes, or helR-mediated dissociation of rifampicin-stalled RNA polymerases from DNA). It would be more appropriate to suggest the results presented to this point indicate RIF modification is common among mycobacteria. The evidence from the CRISPRi knockdown of Arrs shown in Fig 5d is the kind of evidence that suggests ribosylation as a dominant mechanism, at least against rifabutin in this particular species.

      Absolutely, there are other possible modifying enzymes that could be encoded by these mycobacterial species. M. flavescens and M. smegmatis may encode a putative HelR (attached alignment), but further experiments would be needed to confirm its ability to displace RIF from RNAP. Interestingly, the presence of both Arr and HelR has been studied in M. abscessus, and these resistance mechanisms are independent of each other (Molecular Cell 2022 82(17):3166-3177.e5).

      (3) Discussion, 2nd sentence needs grammatical editing.

      Rephrased and it reads “Using our mycobacterial library, we identified for the first time high- and ultra-high-level intrinsic resistance (3) to many of the antibiotics tested. Of note, the resistant phenotype is naturally occurring and not a result of mutations due to exposure to the antibiotic in the clinic – which is the more traditional approach for probing mechanisms of antibiotic resistance. Our observations revealed that resistance profiles are highly variable across the genus and do not follow phylogeny, implicating HGT as the key mechanism for acquisition of resistance determinants and evolution of antibiotic resistance in mycobacteria (42).”

      (4) Discussion, page 7, first line: the inclusion of LZD and BDQ in this statement seems at odds with Figure 2c and the statements in the first paragraph of page 5 highlighting these as examples of drugs to which most mycobacteria are susceptible.

      Indeed, many of the species are susceptible; however, the MIC99 levels observed have never been reported before, and we therefore found them an interesting finding to highlight. From a treatment perspective, knowing which species are sensitive to which drugs is of course the most useful outcome of our study.

      (5) The next sentence..."We found that resistance to these antibiotics in mycobacteria cannot be explained by uptake/efflux mechanisms..." is a bit of an over-generalization and conflicts with the evidence presented earlier that efflux could be playing a role in BDQ resistance and the published evidence establishing a clinically significant role for efflux-mediated BDQ resistance in M. tuberculosis, M. avium complex and M. abscessus complex.

      We rephrased it to make it more specific to our findings. It now reads: “We found that resistance to these antibiotics in mycobacteria does not correlate with uptake/efflux mechanisms in the species tested, nor does it correlate with growth rate. Identification of mycobacterial species highly resistant to BDQ and LZD is worrisome, as most of these species, if not all, have never been exposed to these drugs.”

      (6) Methods, section on In vitro activity assay of Arr enzymes, line 1: reference(s) should be provided for previously reported methods.

      Reference now added.

      (7) Figure 2d: the low end of the susceptibility range is not well defined.

      In this figure, susceptibility is not defined by the lowest region of the graph, although the lower concentrations are indeed harder to define. We hope that Supplementary Figure 1 and the additional table listing the MIC values address this comment.

      (8) Figures 3c,d: the presentation of the relative antibiotic concentrations could be harmonized between the graphs in 3c and those in 3d to enable a more ready comparison.

      We disagree. These panels are intended to illustrate two distinct points. Panel C gives the relative concentration of antibiotic, while panel D correlates relative concentration with MIC99. The use of a log scale in D further clarifies that there is no correlation between intracellular antibiotic concentration and potency (MIC). This information is not present in C.

      (9) Figure 4f and Supplementary Figure 5b: it is difficult to understand the limited amount of ribosyl-RIF in M. flavescens in Fig 4f relative to Supplementary Figure 5b (esp. when considering M. smeg as a common comparator); and, further, to understand the seeming lack of correlation between RIF susceptibility, ribosylation and Arr number and catalytic efficiency for these two strains without considering additional resistance mechanisms.

      In fact, the difference between Figure 4f and Supplementary Figure 5b is mainly due to M. smegmatis, which shows an apparently lower production of ribosyl-RIF in the experiment described in the supplementary figure. The values for M. flavescens are relatively similar. In addition, ADP-ribosyl-RIF is not the final metabolite of the pathway.

      Regarding the complete picture, we acknowledge that we were unable to fully disentangle and correlate the MIC values, the expression of Arr-1 and Arr-3, the efficiency of each enzyme, the production of ADP-ribosyl-RIF and the presence of other possible resistance mechanisms. This is indeed a limitation of our study, as it is of most published studies, which usually focus on a single resistance determinant.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Many thanks for your helpful and constructive comments on our work examining the effect of inhibiting both the insulin receptor (IR) and IGF1 receptor (IGF1R) in the podocyte. We are pleased to submit an updated manuscript addressing your concerns.

      (1) A major concern was a lack of mechanistic insight into how deletion (or knock-down) of both receptors caused the spliceosomal phenotype (Reviewer 1 and Reviewer 3).

      We now think this is due to the lack of a network of insulin/IGF phospho-signalling events affecting a variety of spliceosomal proteins and kinases. The reasons for this are as follows:

      A. Since we submitted our paper, Turewicz et al. have published a comprehensive phospho-proteomic study examining the effects of 100nM insulin on human primary myotubes (DOI: 10.1038/s41467-025-56335-6). They found that multiple post-translational phosphorylation events occur in a variety of spliceosomal proteins at differing time points (1 minute to 60 minutes). Furthermore, they show that mRNA splicing is rapidly modified in response to insulin stimulation in their cells. This follows elegant work from Batista et al., who studied diabetic and non-diabetic iPSC-derived human myocytes and also detected a spliceosome phosphorylation signature (DOI: 10.1016/j.cmet.2020.08.007).

      B. Using phospho-proteomics, we have examined phospho-proteome changes in wild-type podocytes (expressing both the IR and IGF1R) compared with double (IR and IGF1R) knockout cells. We did this 3 days after inducing receptor knockdown, before major cell loss, and stimulated the cells with either 10nM insulin or 100mg IGF1.

      Interestingly, we detected several post-translational modifications (PTM) in our data set that are also present in Turewicz’s studies. Of note, 100nM insulin (as used by Turewicz) will signal through both the insulin and IGF1 receptor (and hybrid Insulin/IGF1 receptors) which is relevant to our studies.

      Our work shows a cascade of phospho-signalling events affecting multiple components of the spliceosomal complex, together with evidence of kinase modulation (phosphorylation) (New Figure 7 and Supplementary Figure 5). We have also added a new Results section (lines 391-425 in the tracked-changes version). We acknowledge that we only studied a single time point after stimulation (10 minutes) and could have missed other PTMs in the spliceosomal complex and other kinases. This is mentioned in our new limitations of the study section (lines 595-606) and will be a focus of future work. We did not find major PTM differences between insulin and IGF1 stimulation, and suspect that the doses of insulin (10nM) and IGF1 (100mg) used are still able to signal through both cognate receptors.

      Furthermore, we have examined the relative contributions of the insulin and IGF1 receptor in detail in the model (addressed in point 13 below).

      (2) The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness. (Reviewer 1)

      We apologise for not making this clear, but we did assess the level of receptor knockdown in both the animal and cell models. The in vivo model showed variable and non-complete levels of insulin receptor and IGF1 receptor podocyte knock down (shown in supplementary Figure 1C). This is why we made the in vitro floxed podocyte cell lines in which we could robustly knockdown both the IR and IGF1R. We show this using Western blotting (shown in Figure 2A). We agree that calling the models knockout is misleading and have changed all to knock down (KD) now.

      (3) The mouse experiments would be improved if serum creatinine levels were measured to provide some idea of how severe the kidney injury is. (Reviewer 1)

      There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variability in receptor knockdown levels with the Cre-lox system). This is part of the rationale for developing the double receptor knockout cell models, in which we robustly knocked out both receptors by >80%. We have added creatinine levels measured in a subset of mice to the supplementary data (New Supplementary Figure 1E) and mention this in the text (lines 285-286). As some mice died, we expect they may have developed acute kidney injury, but we did not serially measure creatinine levels in every mouse over time. We could have assessed the GFR with a more sensitive method to look for differences. However, we consider that the highly significant levels of albuminuria and histological damage observed in our models demonstrate a significant kidney phenotype.

      (4) An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice. (Reviewer 1).

      We did consider doing this, but on reflection we think it is very unlikely to rescue the phenotype, as an array of different spliceosomal proteins changed quantitatively and were differentially phosphorylated / dephosphorylated throughout the complex (as we hope our revised work now illustrates). We think a single-protein rescue is highly unlikely to work. We hope this is an appropriate explanation for this decision. We now mention this in the Discussion (lines 601-602).

      (5) As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on. (Reviewer 1).

      Thank you for this suggestion. We did not extensively examine the metabolism of the mice; however, we did measure blood glucose and body weight, which are included in the paper (Figure 1A and Figure 1B).

      (6) The authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died. (Reviewer 1).

      We appreciate this point. It was the rationale for performing total and phospho-proteomics after 3 days of differentiation, before significant cell loss, so as to avoid studying only the 50% of cells that survive (the loss having occurred by 7 days). We have made this clearer in the manuscript. We have also added data showing less cell death at 3 days in the cell model (New Supp Figure 2B).

      (7) It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity. (Reviewer 2)

      We did this and have added it to the manuscript (line 113).

      (8) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable. (Reviewer 2)

      All graphs have now been changed to SD rather than SEM.

      (9) It would be useful for the reader to be told the number of over-lapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knock-out podocytes. This seems low and I am curious to know whether this is typical for this method; a reference could be helpful. (Reviewer 2)

      This is an excellent question. We obtained 30% overlap because the parameters used for the analysis were very stringent. We suspect that less stringent parameters would yield more than 30% overlap, with the additional events still considered similar; we can provide this analysis if requested. Our methods were based on FLAIR analysis (PMID: 32188845). We have added this reference to the manuscript (lines 242 and 680).

      (10) With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism, the major limitations are the lack of information regarding the completeness of the KO's. If, for example, they can determine that in the mice, the KO is complete, that the GFR is relatively normal, then the phenotype they describe is relatively mild. (Reviewer 1)

      Thank you. The receptor knock-out (KO) in the mice is highly unlikely to be complete (please see comments above and Supplementary Figure 1C). There are many examples of “KO” animal models targeting other tissues showing that complete KO of these receptors is difficult to achieve, particularly for the IGF1 receptor. In the brain, which also contains terminally differentiated cells, barely 50% IGF1R knockdown was achieved in the target cells (PMID:28595357). In ovarian granulosa cells (PMID:28407051), several tissue-specific drivers were tried, but none achieved better than 80% knockdown. That paper states that 10% of IGF1R is sufficient for function in these cells, and the authors conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model we found that Cre levels were important for excision of a single homozygous floxed gene (PMID: 38706850), so we were not surprised that excising two homozygous floxed genes (insulin receptor and IGF1 receptor) was challenging. This was the rationale for making the double receptor knockout cell lines to understand the processes and biology in more detail. As stated earlier, we have changed our description of the mice and cell lines from knock-out to knock-down throughout the revised manuscript, as this is more accurate.

      (11) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full-time course of when the albuminuria was first seen and the rate of development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or post-natal development or number of glomeruli? (Reviewer 3)

      We have added further urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks to the manuscript. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age (new Supp Figure 1B). We did consider using a doxycycline-inducible model, but we know its excision efficiency is much lower than that of the constitutive podocin-Cre-driven model (Author response image 1). This would likely give a very mild (if any) phenotype when attempting to knock out both receptors and would not reveal the biology adequately. We acknowledge the weaknesses of the animal model, and this was the rationale for generating the cell models.

      (12) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo. (Reviewer 3)

      This is a good point. We have now immunostained the DKD and WT mice for Sf3b4 (a spliceosomal protein altered in our in vitro proteomics) and also find a significant reduction in this protein in podocytes of the DKD mice (New Figure 3F).

      (13) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions. (Reviewer 3)

      We have now performed proteomics and phospho-proteomics in all 4 cell types (wild-type, insulin receptor knockdown, IGF1R knockdown and double knockdown) at 3 days (New Figure 8 and Supplementary Figure 6; new Results section, lines 425 to 450). This shows that both receptors contribute to the pathways (and hence there is a high level of compensation built into the system). For total proteins, the spliceosomal tri-snRNP was reduced only when both receptors were lacking, whereas other proteins and pathways were affected incrementally by loss of the insulin or IGF1 receptor. Likewise, the spliceosomal phospho-signalling events can be mediated predominantly through either the insulin or the IGF1 receptor, or through both. We think this reflects the complexity of this system and how it has evolved in mammals to protect against its loss.

      Finally, in this revision we have rewritten the Discussion to include a “limitations of the study” section and, we hope, to make it easier to read.

      Author response image 1.

      (A) mT/mG reporter mouse crossed to a constitutive podocin-Cre heterozygous mouse, illustrating podocyte specificity of the Cre driver and excision of the reporter. GFP is expressed in Cre-producing cells, because Cre expression switches GFP on (top panel scale bar = 250 µm; bottom panel scale bar = 50 µm). (B) mT/mG reporter mouse crossed to a podocin-rtTA / tet-O-Cre heterozygous mouse, showing podocyte specificity of the driver and approximately 60% excision (top and bottom panels scale bar = 250 µm; middle panel scale bar = 50 µm). Doxycycline is required for expression, showing that the system is not leaky.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that genetic deletion of the orphan tumor necrosis factor receptor DR6 in mice does not protect peripheral axons against degeneration after axotomy. Similarly, Schwann cells in DR6 mutant mice react to axotomy similarly to wild-type controls. These negative results are important because previous work has indicated that loss or inhibition of DR6 is protective in disease models and also against Wallerian degeneration of axons following injury. This carefully executed counterexample is important for the field to consider.

      Strengths:

      A strength of the paper is the use of two independent mouse strains that knock out DR6 in slightly different ways. The authors confirm that DR6 mRNA is absent in these models (western blots for DR6 protein are less convincingly null, but given the absence of mRNA, this is likely an issue of antibody specificity). One of the DR6 knockout strains used is the same strain used in a previous paper examining the effects of DR6 on Wallerian degeneration.

      The authors use a series of established assays to evaluate axon degeneration, including light and electron microscopy on nerve histological samples and cultured dorsal root ganglion neurons in which axons are mechanically severed and degeneration is scored in time-lapse microscopy. These assays consistently show a lack of effect of loss of DR6 on Wallerian degeneration in both mouse strains examined.

      Therefore, in the specific context of these experiments, the authors' data support their conclusion that loss of DR6 does not protect against Wallerian degeneration.

      Weaknesses:

      (1) The major weaknesses of this paper include the tone of correcting previously erroneous results and the lack of reporting on important details around animal experiments that would help determine whether the results here really are discordant with previous studies, and if so, why.

      The authors do not report the genetic strain background of the mice used, the sex distributions of their experimental cohorts, or the age of the mice at the time the experiments were performed. All of these are important variables.

      (Response 1) We thank the reviewer for emphasizing the importance of reporting the sex, age, and genetic background of the experimental animals used in our axon protection analyses. We have incorporated this information into the revised manuscript wherever available. The sole exception concerns the genetic background of the conditional DR6 mice generated by Genentech, which remains unknown. The original publication describing these mice (Tam et al., 2012, Dev Cell, PMID 22340501) did not report this information, and we were unable to obtain it directly from Genentech. Details regarding the genetic background of the WldS and aPhr1 mutant mice are provided in their respective original publications, which are cited in our manuscript. Because the Gamage et al. study from the Deppmann laboratory did not report the sex or age of the animals used, we were unable to assess whether these variables might contribute to the differences observed between the two studies. Moreover, we are not aware of published evidence identifying sex or age as modifiers of structural axon preservation in axotomized peripheral nerve stumps in mouse models of delayed Wallerian degeneration. Furthermore, in the original publications describing the phenotypes of transgenic Nmnat2 and WldS mice, as well as Sarm1 or Phr1 knockout mice, sex and age of the animals used in the Wallerian degeneration assays were not reported (PMIDs 23995269, 12106171, 22678360, 23665224). Although, to our knowledge, no large-scale systematic studies have been conducted, over the last 15 years we have never observed any sex-based differences in Wallerian degeneration phenotypes in these mutants exhibiting prominent axon protection. This topic was discussed informally at conferences, and we are also not aware of other investigators having observed such effects.

      In response to the reviewer’s comment regarding “tone”, we have ensured that our data and interpretations are presented in a professional, balanced, and objective manner, including a detailed discussion of potential alternative explanations for the discrepant findings.

      (2) The DR6 knockout strain reported in Gamage et al. (2017) was on a C57BL/6.129S segregating background. Gamage et al. reported that loss of DR6 protected axons from Wallerian degeneration for up to 4 weeks, but importantly, only in 38.5% (5 out of 13) of the mice they examined. In the present paper, the authors speculate on possible causes for differences between the lack of effect seen here and the effects reported in Gamage et al., including possible spontaneous background mutations, epigenetic changes, genetic modifiers, neuroinflammation, and environmental differences. A likely explanation of the incomplete penetrance reported by Gamage et al. is the segregating genetic background and the presence of modifier loci between C57BL/6 and 129S. The authors do not report the genetic background of the mice used in this study, other than to note that the knockout strain was provided by the group in Gamage et al. However, if, for example, that mutation has been made congenic on C57BL/6 in the intervening years, this would be important to know. One could also argue that the results presented here are consistent with 8 out of 13 mice presented in Gamage et al.

      (Response 2) As noted above, we now provide information on the genetic background of the mice in the revised manuscript, where available. We have not backcrossed the constitutive DR6 knockout mice obtained from the Deppmann laboratory (Gamage et al.) to a C57BL/6 background; our colony was maintained primarily through intercrosses of heterozygous animals. Similarly, the conditional DR6 mutant mice used in this study were also not backcrossed to C57BL/6 mice.

      We respectfully hold a different view regarding the reviewer’s final point. In our view, it is not appropriate to infer consistency between two datasets by disregarding the subset of results that do not align. By the same logic, it would be flawed to draw conclusions from the Gamage et al. study based solely on the single WldS mouse out of five that did not show axon preservation after nerve injury. Selectively omitting conflicting data does not provide a valid basis for establishing phenotype concordance across studies.

      To further strengthen our study, we analyzed three additional nerve samples from constitutive DR6 null mice during the revision process and have incorporated the resulting data in Fig. 1.

      (3) Age is also an important variable. The protective effects of the spontaneous WldS mutation decrease with age, for example. It is unclear whether the possible protective effects of DR6 also change with age; perhaps this could explain the variable response seen in Gamage et al. and the lack of response seen here.

      (Response 3) As discussed above, we now provide the age information for the mice used for the Wallerian degeneration assessments in the respective figure legends. To our knowledge, there are no prior reports suggesting that age is a significant determinant of structural axon preservation in the indicated mutants. Electrophysiological function and neuromuscular junction preservation decrease with age in axotomized WldS mice (e.g., PMIDs 12231635, 19158292, 15654865), but these parameters were not the subject of our study, and we have not examined them. Unfortunately, a direct comparison of ages between our DR6 mutant mice and those used in Gamage et al. (2017) is not possible, as the earlier study from the Deppmann laboratory did not report this information.

      (4) It is unclear if sex is a factor, but this is part of why it should be reported.

      (Response 4) We now report the requested sex information for our axon preservation analyses during nerve injury-induced Wallerian degeneration in the DR6 mouse models in Figs. 1 and 2.

      (5) The authors also state that they do not see differences in the Schwann cell response to injury in the absence of DR6 that were reported in Gamage et al., but this is not an accurate comparison. In Gamage et al., they examined Schwann cells around axons that were protected from degeneration 2 and 4 weeks post-injury. Those axons had much thinner myelin, in contrast to axons protected by WldS or loss of Sarm1, where the myelin thickness remained relatively normal. Thus, Gamage et al. concluded that the protection of axons from degeneration and the preservation of Schwann cell myelin thickness are separate processes. Here, since no axon protection was seen, the same analysis cannot be done, and we can only say that when axons degenerate, the Schwann cells respond the same whether DR6 is expressed or not.

      (Response 5) We appreciate the reviewer’s detailed comments. Our intention was not to directly compare our findings with those of Gamage et al. regarding the myelin behavior at these time points (because we never observed axon protection), but rather to note that we did not observe any DR6-dependent alterations in Schwann cell responses under conditions where axons undergo normal Wallerian degeneration. As the reviewer correctly points out, Gamage et al. analyzed Schwann cell myelin surrounding axons that were protected from degeneration for extended periods, a context fundamentally different from the complete lack of axon protection observed in our DR6-deficient models. Therefore, the specific dissociation between axon preservation and myelin maintenance claimed by Gamage et al. cannot be evaluated in our study. A statement to make this point clearer has been incorporated in the revised manuscript.

      We fully agree with the reviewer’s concluding point: in our experiments, once axons degenerate, Schwann cell responses proceed similarly regardless of DR6 expression. This agreement reinforces one of the central conclusions of our work.

      (6) The authors also take issue with Colombo et al. (2018), where it was reported that there is an increase in axon diameter and a change in the g-ratio (axon diameter to fiber diameter - the axon + myelin) in peripheral nerves in DR6 knockout mice. This change resulted in a small population of abnormally large axons that had thinner myelin than one would expect for their size. The change in g-ratio was specific to these axons and driven by the increased axon diameter, not decreased myelin thickness, although those two factors are normally loosely correlated. Here, the authors report no changes in axon size or g-ratio, but this could also be due to how the distribution of axon sizes was binned for analysis, and looking at individual data points in supplemental figure 3A, there are axons in the DR6 knockout mice that are larger than any axons in wild type. Thus, this discrepancy may be down to specifics and how statistics were performed or how histograms were binned, but it is unclear if the results presented here are dramatically at odds with the results in Colombo et al. (2018).

      (Response 6) Several points raised by the reviewer appear to reflect differences in interpretation of the findings reported in Colombo et al. (2018). That study did not report altered myelination in DR6 null mice at stages when myelination is largely complete (P21). Instead, modest changes were observed at P1, which were reduced by P7, and P21 mutants were reported to be indistinguishable from controls. No analyses of peripheral nerves in older animals were presented, and the authors concluded in the discussion that myelination in young adult DR6 null mice appears normal. In contrast, our analysis of constitutive DR6 null mice at P1 does not reproduce the increase in the number of myelinated fibers per unit area reported by Colombo et al. We obtained similar results in the independent conditional DR6 knockout mouse line. Differences in nerve tissue processing, embedding, staining, or in the microscopic imaging and quantification of thinly myelinated axons in P1 sciatic nerve cross-sections may have contributed to the observed discrepancy. However, because the relevant methodological details were not described in Colombo et al., the underlying reasons for these differences cannot be determined and remain speculative.

      (7) Finally, it is important to note that previously reported effects of DR6 inhibition, such as protection of cultured cortical neurons from beta-amyloid toxicity, are not necessarily the same as Wallerian degeneration of axons distal to an injury studied here. The negative results presented here, showing that loss of DR6 is not protective against Wallerian degeneration induced by injury, are important given the interest in DR6 as a therapeutic target, but they are specific to these mice and this mechanism of induced axon degeneration. The extent to which these findings contradict previous work is difficult to assess due to the lack of detail in describing the mouse experiments, and care should be taken in attempting to extrapolate these results to other disease contexts, such as ALS or Alzheimer's disease.

      (Response 7) We agree with the reviewer’s point and emphasize that our manuscript carefully differentiates our data regarding the function of DR6 in Wallerian degeneration from the potential involvement of DR6 in other forms of axon degeneration. Our findings do not conflict with previous work on DR6 in the context of in vitro beta-amyloid and prion toxicity as well as in vitro models of ALS and multiple sclerosis. We believe these distinctions are explicitly and appropriately articulated throughout the entire manuscript and in more detail in the discussion section.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should include additional information about the mice used, including strain background for both the DR6 mice and the Cre transgenes crossed into the DR6 conditional knockout, the age of the mice when the nerve crush experiments were performed, and the sex distributions of the experimental cohorts. This information is critical for reproducibility in animal experiments, and that point is compounded here, where the major focus of this paper is taking issue with the reproducibility of previous work.

      (Response 8) This information has been included in the revision. See above responses.

      (2) In the abstract, reference 5 is cited as a study on the response of Schwann cells to injury in a DR6 background, but this probably should be reference 10.

      (Response 9) This typo has been corrected.

      (3) "Site-by-site comparison" in line 201 should be side-by-side?

      (Response 10) This typo has been corrected.

      (4) The paper contains a lot of self-evaluative wording, "surprising contrast," "compelling evidence," "robust results." Whether those adjectives apply should be for the reader to decide, and a drier, more objective tone in the presentation would improve the paper.

      (Response 11) We agree that excessive self-evaluative wording can weaken objectivity. In the manuscript, such phrasing is used sparingly and intentionally to highlight differences from previously published studies, guide the reader, and convey scholarly judgment. We do not consider this limited use to be counterproductive. The adjectives “surprising,” “compelling,” and “robust” each appear only one to three times across the entire manuscript, and the specific phrase “robust results” does not appear at all.

      (5) In Figure 2A, DR6-/-, there is no significant difference, but there is also a lot of variability, and one could argue the authors are seeing axon protection comparable to WldS in 40% of their samples (2/5), which is very similar to Gamage et al.

      (Response 12) We respectfully disagree with this reasoning as it relies on selectively emphasizing only a subset of the data. Please also see our response #2 for more detailed discussion.

      (6) Overall, the data presented here are convincing and support the conclusions drawn, but the paper needs to focus more on the negative results at hand and less on bashing previous studies, particularly when the results presented here do not definitively show that the previous studies were incorrect and plausible explanations for differences in outcome exist.

      (Response 13) We have carefully revisited the wording of the manuscript and are confident that our emphasis remains on the central negative finding that DR6 does not regulate axon degeneration and Schwann cell injury responses during Wallerian degeneration. We do not believe the manuscript “bashes” previous studies; nonetheless, we thoroughly re-examined all relevant sections to ensure that our language is neutral, accurate, and non-inflammatory. We believe the current phrasing presents our interpretations in an appropriately balanced, objective, and professional manner.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Beirowski, Huang, and Babetto revisits the proposed role of Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). A prior study (Gamage et al., 2017) suggested that DR6 deletion delays axon degeneration and alters Schwann cell responses following peripheral nerve injury. Here, the authors comprehensively test this claim using two DR6 knockout mouse models (the line used in the earlier report plus a CMV-Cre-derived floxed knockout line) and multiple WD assays in vivo and in vitro, alongside three positive controls: Sarm1, WldS, and Phr1/Mycbp2 mutants. Contrary to the prior findings, they find no evidence that DR6 deletion affects axon degeneration kinetics or Schwann cell dynamics (assessed by cJun expression or [intact+degenerating] myelin abundance after injury) during WD. Importantly, in DRG explant assays, neurites from DR6-deficient mice degenerated at rates indistinguishable from controls. The authors conclude that DR6 is dispensable for WD, and that previously reported protective effects may have been due to confounding factors such as genetic background or spontaneous mutations.

      Strengths:

      The authors employ two independently generated DR6 knockout models, one overlapping with the previously published study, and confirm loss of DR6 expression by qPCR and Western blotting. Multiple complementary readouts of WD are applied (structural, ultrastructural, molecular, and functional), providing a robust test of the hypothesis.

      Comparisons are drawn with established positive controls (WldS, SARM1, Phr1/Mycbp2 mutants), reinforcing the validity of the assays.

      By directly addressing an influential but inconsistent prior report, the manuscript clarifies the role of DR6 and prevents potential misdirection of therapeutic strategies aimed at modulating WD in the PNS. The discussion thoughtfully considers possible explanations for the earlier results, including colony-specific second-site mutations that could explain the incomplete penetrance (only 36%) of the previously reported phenotype.

      Weaknesses:

      (1) The study focuses on peripheral nerves. The manuscript frequently refers to CNS studies to argue for consistency with their findings. It would be more accurate to frame PNS/CNS similarities as parallels rather than as evidence of consistency (e.g., line 205ff in the Discussion).

      (Response 14) Axon protection in all key genetic models of delayed axon degeneration, including WldS, SARM1, and Phr1/Mycbp2 mutants, has been demonstrated in both the peripheral and central nervous systems. This observation supports the view that core molecular mechanisms regulating axon degeneration are conserved across neuronal populations throughout the entire nervous system. We have scrutinized the wording in our manuscript and are not aware that we frequently refer to CNS studies with regard to axon degeneration. Nevertheless, we have replaced the term “consistent” to avoid potential ambiguity when we discuss the earlier study showing normal Wallerian degeneration in the optic nerves from DR6 knockout mice.

      (2) The DRG explant assays are convincing, though the slight acceleration of degeneration in the DR6 floxed/Cre condition is intriguing (Figure 4E). Could the authors clarify whether this is statistically robust or biologically meaningful?

      (Response 15) We thank the reviewer for noting this aspect of our in vitro data in Fig. 4. The difference observed in the DR6 floxed/Cre condition is statistically significant at the 6 h time point following disconnection, as indicated by the p value shown in Fig. 4E. However, a similarly statistically significant acceleration of axon degeneration was not observed in DRG axotomy experiments using constitutive DR6 knockout preparations, although a trend toward more rapid axon breakdown is apparent at 6 h post-axotomy (Fig. 4B). These observations may suggest reduced stability of DR6-deficient axons in this specific neuron-only in vitro context. Further investigation would be required to determine the biological significance of this effect. In contrast, our in vivo quantitative analyses of the initiation and early phases of Wallerian degeneration (Fig. 2) revealed no evidence of accelerated axon disintegration in the DR6 mutant mouse models, highlighting potential differences between in vitro and in vivo systems.

      (3) In the summary (line 43), the authors refer to Hu et al. (2013) (reference 5) as the study that previously reported AxD delay and SC response alteration after injury. However, this study did not investigate the PNS, and I believe the authors intended to reference Gamage et al. (2017) (reference 10) at this point.

      (Response 16) Thanks for pointing this out. We have corrected this typo in the revised manuscript.

      (4) In line 74ff of the results section, the authors claim that developmental myelination is not altered in DR6 mutants at postnatal day 1. However, the variability in Figure S2 appears substantial, and the group size seems underpowered to support this claim. Colombo et al. (2018) (reference 11) reported accelerated myelination at P1, but this study likewise appears underpowered. Possible reasons for these discrepancies and the large variability could be that only a defined cross-sectional area was quantified, rather than the entire nerve cross-section.

      (Response 17) We confirm that the quantification of thinly myelinated axons was performed on entire sciatic nerves from P1 mouse pups, as described in the methods section in our original manuscript. The data shown in Fig. S2 were obtained from 5-9 pups per experimental group. Sample sizes were determined based on a priori power analyses using pilot data, which indicated that a minimum of five biological replicates was sufficient to detect statistically significant differences with acceptable confidence. Comparable sample sizes have been used in our previous studies and by other groups to assess early postnatal myelination (e.g., PMIDs 21949390, 28484008). Several published studies have reported analyses using 3-4 animals per group (e.g., PMIDs 28484008, 25310982, 29367382). For comparison, the study by Colombo et al. used 3-8 pups for the analysis presented in their Fig. 3. We note that the apparent variability in Fig. S2 may be accentuated by the scaling of the y-axis, which was chosen to ensure that individual data points are clearly resolved and visible.

      (5) The authors stress the data of Gamage et al. (2017) on altered SC responses in DR6 mutants after injury. They employed cJun quantification to show that SC reprogramming after injury is not altered in DR6 mutants. This approach is valid and the conclusion trustworthy. Here, the addition of data showing the combined abundance of intact and degenerated myelin does not add much insight. However, Gamage et al. (2017) reported altered myelin thickness in a subset of axons at 14 days after injury, which is considerably later than the time points analyzed in the present study. While, in the Reviewer's view, the thin myelin observed by Gamage et al. in fact resembles remyelination, the authors may wish to highlight the difference in the time points analyzed.

      (Response 18) We consider the additional quantification of the area occupied by intact myelin and myelin debris to provide complementary information that supports the c-Jun-based conclusion that Schwann cell injury responses are normal in DR6-deficient nerves following lesion. We agree with this reviewer that the thin myelin observed by Gamage et al. resembles remyelination, raising the possibility that axon regeneration occurred into the distal nerve stump at the studied 14 d post-injury time point (see their Fig. 3). This may have been interpreted as axon protection in that study. In our study, it was impossible to examine such myelin effects since axon protection was never observed in any of the DR6 mutant models at any of the time points we investigated. We have incorporated appropriate additional text to highlight this difference. See also response #5 above.

      Reviewer #3 (Public review):

      Summary:

      The authors revisit the role of DR6 in axon degeneration following physical injury (Wallerian degeneration), examining both its effects on axons and its role in regulating the Schwann cell response to injury. Surprisingly, and in contrast to previous studies, they find that DR6 deletion does not delay the rate of axon degeneration after injury, suggesting that DR6 is not a mediator of this process.

      Overall, this is a valuable study. As the authors note, the current literature on DR6 is inconsistent, and these results provide useful new data and clarification. This work will help other researchers interpret their own data and re-evaluate studies related to DR6 and axon degeneration.

      Strengths:

      (1) The use of two independent DR6 knockout mouse models strengthens the conclusions, particularly when reporting the absence of a phenotype.

      (2) The focus on early time points after injury addresses a key limitation of previous studies. This approach reduces the risk of missing subtle protective phenotypes and avoids confounding results with regenerating axons at later time points after axotomy.

      Weaknesses:

      (1) The study would benefit from including an additional experimental paradigm in which DR6 deficiency is expected to have a protective effect, to increase confidence in the experimental models, and to better contextualize the findings within different pathways of axon degeneration. For example, DR6 deletion has been shown in more than one study to be partially axon protective in the NGF deprivation model in DRGs in vitro. Incorporating such an experiment could be straightforward and would strengthen the paper, especially if some of the neuroprotective effects previously reported are confirmed.

      (Response 19) We thank the reviewer for these suggestions. We would like to highlight that our study addresses the role of DR6 in Wallerian degeneration, whereas in vitro NGF deprivation has been used to model developmental axon pruning. Previous work indicates fundamental biological differences between these regressive pathways regulating the stereotyped removal of axon segments. We feel that studying this alternative form of axon degeneration is beyond the scope of the current work and could be addressed in a separate manuscript. Although additional tests will be needed, we note that our preliminary data using samples from both DR6 knockout mouse models suggest no axon protection after NGF-deprivation in DRG neuron preparations in our hands (deprivation of the growth factor and administration of anti-NGF antibody).

      (2) The quality of some figures could be improved, particularly the EM images in Figure 2. As presented, they make it difficult to discern subtle differences.

      (Response 20) We have pseudocolored intact (turquoise) and degenerated (magenta) myelinated fibers on the high-resolution semithin micrographs (not electron micrographs) in the new Fig. 2 to make the distinction between the two fiber categories clearer.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 121: The authors mention toluidine blue staining, but it does not appear to be shown in Figure S5.

      (Response 21) This appears to be a misunderstanding. Fig. S5A shows the ultrastructure of dedifferentiated Schwann cells in transmission electron micrographs, while Figs. S5B and C show quantification of the area occupied by myelin sheaths and myelin debris profiles on osmium tetroxide and toluidine blue stained nerve sections from the two DR6 mutant models, based on semithin light microscopy. These are two different aspects of the analysis. The text has been modified in the revised manuscript to make the distinction clearer.

      (2) Line 175: The authors should add NMNAT2 to the list of enzymes implicated in the regulation of Wallerian degeneration in mammals.

      (Response 22) Nmnat2 and a literature reference (Milde et al., 2013) have been incorporated in the discussion of the revised manuscript to address this point.

      (3) Line 201: Please correct the typo "site-by-site" to "side-by-side."

      (Response 23) This typo has been corrected.

    1. Author response:

      We appreciate that the reviewers provided an overall positive assessment of our manuscript and offered constructive suggestions for improvement. All three reviewers noted that a key strength of our study is the implementation of a gut microbiome model that approaches natural complexity for the characterization of interbacterial antagonism pathways such as the type VI secretion system (T6SS). They note our work represents a significant advance in microbiome research and generates resources that will be of use to many researchers in the field. Two of the reviewers point out that the complexity of our model limits the nature of measurements we can make, and suggest we temper the strength of some of the conclusions we draw. As noted in more detail below, in our revised manuscript, we will be more precise in the wording we use to characterize our findings, and we will be more explicit about what the measurements we are able to make allow us to conclude about the physiological role of the T6SS in the gut microbiome.

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate the physiological role of the Type VI secretion system (T6SS) in a naturally evolved gut microbiome derived from wild mice (the WildR microbiome). Focusing on Bacteroides acidifaciens, the authors use newly developed genetic tools and strain-replacement strategies to test how T6SS-mediated antagonism influences colonization, persistence, and fitness within a complex gut community. They further show that the T6SS resides on an integrative and conjugative element (ICE), is distributed among select community members, and can be horizontally transferred, with context-dependent effects on colonization and persistence. The authors conclude that the T6SS stabilizes strain presence in the gut microbiome while imposing ecological and physiological constraints that shape its value across contexts.

      This study is likely to have a significant impact on the microbiome field by moving experimental tests of T6SS function out of simplified systems and into a naturally co-evolved gut community. The WildR system, together with the strain replacement strategy, ICE-seq approach, and genetic toolkit, represents a powerful and reusable platform for future mechanistic studies of microbial antagonism and mobile genetic elements in vivo.

      The datasets, including isolate genomes, metagenomes, and ICE distribution maps, will be a valuable community resource, particularly for researchers interested in strain-resolved dynamics, horizontal gene transfer, and ecological context dependence. Even where mechanistic resolution is incomplete, the work provides a strong experimental foundation upon which such questions can be directly addressed.

      Overall, this study occupies a space between system building and mechanistic dissection. The authors demonstrate that the T6SS influences persistence and community structure in vivo, but the physiological basis of these effects remains unresolved. Interpreting the results as evidence of fitness costs or selective advantage, therefore, requires caution, as multiple ecological and host-mediated processes could produce similar abundance trajectories.

      Placing the findings within the broader literature on microbial antagonism, particularly work emphasizing measurable costs, benefits, and tradeoffs, would help readers better contextualize what is directly demonstrated here versus what remains an open question. Viewed in this light, the principal contribution of the study is to show that such questions can now be addressed experimentally in a realistic gut ecosystem.

      We thank the reviewer for this thoughtful summary of our study. We were glad to read they conclude our work will have a significant impact on the microbiome field and that the resources we have developed will be of value to the community.

      Strengths:

      A major strength of this study is that it directly interrogates the physiological role of the T6SS in a naturally evolved gut microbiome, rather than relying on simplified pairwise or in vitro systems. By working within the WildR community, the authors advance beyond descriptive surveys of T6SS prevalence and address function in an ecologically relevant context.

      The authors provide clear genetic evidence that Bacteroides acidifaciens uses a T6SS to antagonize co-resident Bacteroidales, and that loss of T6SS function specifically compromises long-term persistence without affecting initial colonization. This temporal separation is well designed and supports the conclusion that the T6SS contributes to maintenance rather than establishment within the community.

      Another strength is the identification of the T6SS on an integrative and conjugative element (ICE) and the demonstration that this element is distributed among, and exchanged between, community members. The use of ICE-seq to track distribution and transfer provides strong support for horizontal mobility and adds mechanistic depth to the study.

      Finally, the transfer of the T6SS-ICE into Phocaeicola vulgatus and the observation of context-dependent colonization benefits followed by decline is a compelling result that moves the study beyond simple "T6SS is beneficial" narratives and highlights ecological contingency.

      We appreciate this detailed and nuanced characterization of the strengths of our study.

      Weaknesses:

      Despite these strengths, there is a mismatch between the precision of the claims and the precision of the measurements, particularly regarding fitness costs, physiological burden, and the mechanistic role of the T6SS.

      We acknowledge that in some places, our manuscript could benefit from greater precision in the language we use when linking the outcomes we observe in our study to their potential underlying causes. Specific revisions we propose to address this concern are described below.

      First, while the authors conclude that the T6SS "stabilizes strain presence" and that its value is constrained by fitness costs, these costs are not directly measured. Persistence, abundance trajectories, and eventual loss are informative outcomes, but they do not uniquely identify fitness tradeoffs. Decline could arise from multiple non-exclusive mechanisms, including community restructuring, host-mediated effects, incompatibilities of the ICE in new hosts, or ecological retaliation, none of which are disentangled here.

      We agree that multiple mechanisms could explain why P. vulgatus carrying a T6SS-encoding ICE declines over time. Our use of the term “fitness cost” to describe this trend was not meant to imply any particular underlying mechanism, but was rather our attempt to characterize the phenotypic outcome we observed in simplified terms. We note that ecological context is an important determinant of the fitness cost or benefit of any given trait, and our study sheds light on the importance of the presence of the WildR community and the mouse intestinal environment to the fitness contribution of the ICE to P. vulgatus. Nonetheless, to avoid implying an overly simplistic interpretation of our results, we propose to modify the language used in the manuscript when describing the contribution of the T6SS to species persistence in WildR-colonized mice.

      Second, the manuscript frames the T6SS as having a defined physiological role, yet the data do not resolve which physiological processes are under selection. The experiments demonstrate that T6SS activity affects persistence, but they do not distinguish whether this occurs via direct killing, resource release, niche modification, or higher-order community effects. As a result, "physiological role" remains underspecified and risks being conflated with ecological outcome.

      We acknowledge that our study does not fully resolve the physiological processes under selection that mediate the role of the T6SS in maintaining B. acidifaciens populations in WildR-colonized mice. Indeed, several of the outcomes of T6SS activity the reviewer lists, such as target cell killing and nutrient release, are inextricably linked and thus inherently difficult to disentangle. We note that we did attempt to measure higher-order community effects of T6SS activity with metagenomic sequencing, but acknowledge that this approach may not have been sufficiently sensitive to detect small community shifts mediated by a relatively low-abundance species. To address the concern that our current framing implies more of a mechanistic understanding than our study achieves, we propose to substitute “ecological” for “physiological” where appropriate when summarizing our key findings.

      Third, although the authors emphasize context dependence, the study offers limited quantitative insight into what aspects of context matter. Differences between native and recipient hosts, or between early and late colonization phases, are described but not mechanistically interrogated, making it difficult to generalize beyond the specific cases examined.

      We are not entirely clear what the reviewer means by “differences between native and recipient hosts”, but we agree that additional quantitative studies will be needed to address the generalizability of our findings. Future studies are also needed to address the mechanistic basis for the difference in the benefit conferred by the T6SS that we observed between P. vulgatus and B. acidifaciens.

      Fourth is the lack of engagement with recent experimental literature demonstrating functional roles of the T6SS beyond simple interference competition. While the authors focus on persistence and competitive outcomes, they do not adequately situate their findings within recent work demonstrating that T6SS-mediated antagonism can serve additional physiological functions, including resource acquisition and DNA uptake, thereby linking killing to measurable benefits and tradeoffs. The absence of this literature makes it difficult to place the authors' conclusions about physiological role and fitness cost within the current conceptual framework of the field. Without this context, the physiological interpretation of the results remains incomplete, and alternative functional explanations for the observed dynamics are underexplored.

      We thank the reviewer for specifically highlighting the potential pertinence of this literature to our study. Indeed, we did not cite studies indicating a link between T6SS activity and the uptake of DNA and other resources released by targeted cells. As we note above, the release of intracellular contents from target cells is an inevitable consequence of the delivery of lytic effectors. Thus, distinguishing between fitness benefits conferred from the elimination of competitor species and those arising from scavenging the nutrients released during this process is not straightforward. Measuring the benefits deriving from the uptake of certain released molecules, such as DNA, was not immediately feasible in the system employed in this study and instead we focused on the direct lytic consequences of the effectors delivered via the T6SS. We will revise our Discussion to include reference to how downstream consequences of T6SS activity on target cells could impact the community, and thus the adaptive role of the T6SS in the microbiome.

      A further limitation concerns the taxonomic scope of the functional analysis. The authors state that the role of the T6SS in the murine environment is functionally investigated using genetically tractable Bacteroides species, citing the lack of genetic tools for Mucispirillum schaedleri. While this is a reasonable, practical choice, it means that a substantial fraction of T6SS-encoding species in the WildR community are not experimentally interrogated. Consequently, conclusions about the role of the T6SS in the murine gut necessarily reflect the subset of taxa that are genetically accessible and may not fully capture community-level or niche-specific functions of T6SS activity. Given that M. schaedleri is represented as a metagenome-assembled genome, its isolation and genetic manipulation would be technically challenging. Nonetheless, explicitly acknowledging this limitation and slightly tempering claims of generality would strengthen the manuscript.

      The reviewer points out that studying T6SS activity in M. schaedleri would potentially expand the generality of our claims. We agree that having an isolate of this species along with genetic tools for its manipulation would allow us to probe the importance of the T6SS in the gut microbiome more broadly. At the suggestion of the reviewer, we will explicitly mention the need to develop such tools, an endeavor that lies outside of the scope of the current study.

      Finally, several interpretations would benefit from more cautious language. In particular, claims invoking fitness costs, selective advantage, or physiological burden should be explicitly framed as inferences from persistence dynamics, rather than as direct measurements, unless supported by additional quantitative fitness or growth assays.

      We agree with the reviewer that invoking fitness costs, selective advantages, or physiological burdens should be done cautiously, and in our revised manuscript we will carefully re-evaluate our usage of those terms. However, we would also argue that invoking fitness costs and benefits when describing strain persistence dynamics in mice has substantial precedent in the literature (e.g., Feng et al. 2020; Brown et al. 2021; Park et al. 2022; Segura Munoz et al. 2022, a handful of representative examples published by different groups). It is unclear to us what additional in vivo growth measurements could be taken to substantiate our claim that the T6SS provides a fitness benefit to B. acidifaciens during prolonged gut colonization, or that carrying the ICE imposes a fitness cost on P. vulgatus during long-term colonization. Our in vitro experiments evaluating the competitiveness conferred by T6SS activity provide a measure of insight into its fitness benefits, but as our in vivo strain persistence data and the work of many others show, in vitro measurements do not necessarily capture in vivo parameters.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors set out to determine how a contact-dependent bacterial antagonistic system contributes to the ability of specific bacterial strains to persist within a complex, native gut community derived from wild animals. Rather than focusing on simplified or artificial models, the authors aimed to examine this system in a biologically realistic setting that captures the ecological complexity of the gut environment. To achieve this, they combined controlled laboratory experiments with animal colonization studies and sequencing-based tracking approaches that allow individual strains and mobile genetic elements to be followed over time.

      Strengths:

      A major strength of the work is the integration of multiple complementary approaches to address the same biological question. The use of defined but complex communities, together with in vivo experiments, provides a strong ecological context for interpreting the results. The data consistently show that the antagonistic system is not required for initial establishment but plays a critical role in long-term strain persistence. This insight moves beyond traditional invasion-based views of microbial competition. The observation that transferable genetic elements can confer only temporary advantages, and may impose longer-term costs depending on community context, adds important nuance to current understanding of microbial fitness.

      We thank the reviewer for the positive feedback and are glad they agree our study provides new insight into the role of interbacterial antagonism in natural communities.

      Weaknesses:

      Overall, there is not a lack of evidence, but a deliberate trade-off between ecological realism and mechanistic resolution, which leaves some causal pathways open to interpretation.

      The reviewer makes a good point that the complexity of the experimental system we employ precludes some lines of experimentation that would yield more mechanistic information. As the reviewer notes, we were aware of the tradeoff between mechanistic resolution and ecological realism when selecting our experimental system. Our deliberate choice to favor biological complexity over mechanistic clarity in this study stemmed from our perception that a major gap in understanding of the T6SS and other antagonism pathways lies in defining their ecological function in complex microbial communities.

      Reviewer #3 (Public review):

      Summary:

      Shen et al. investigate the contribution of the type VI secretion system of Bacteroidales in the gut microbiome assembly and targeting of closely related species. They demonstrate that B. acidifaciens relies on T6SS-mediated antagonism to prevent displacement by co-resident Bacteroidales and other members of the microbiome, allowing B. acidifaciens to persist in the gut.

      Strengths:

      Using a gnotobiotic model colonized with a wild-mouse microbiome is a significant strength of this study. This approach allows tracking of microbiome changes over time and directly examining targeting by Bacteroidales carrying T6SS in a more natural setting. The development of ICE-seq for mapping the distribution of the T6SS in the microbiome is remarkable, enabling the study of how this bacterial weapon is transferred between microbiome members without requiring long-read metagenomics methods.

      We thank the reviewer for their enthusiasm toward our study.

      Weaknesses:

      Some conclusions are based on only four mice per condition. The authors should consider increasing the sample size.

      We agree that in some experiments it would be beneficial to increase the sample size from four mice. However, the experiments we performed for this study are time and resource intensive. Additionally, the experiments on which we base our primary conclusions were all independently replicated with similar results. Given these factors, we determined that the extra confidence that might be afforded by increasing our sample size did not merit the delay in publication and investment in resources that would be required.

      Overall, the authors successfully achieved their objectives, and their experimental design and results support their findings. As mentioned in the discussion, it would be important to investigate the role of the T6SS in resilience to disturbances in the microbiome, such as antibiotics, diet, or pathogen invasion. This work represents a step forward in understanding how contact-dependent competition influences the gut microbiome in relevant ecological contexts.

      We agree that investigating the role of the T6SS during perturbations of the microbiome is a key next step for this work and thank the reviewer for highlighting this important future direction.

      References

      Brown, E. M., H. Arellano-Santoyo, E. R. Temple, Z. A. Costliow, M. Pichaud, A. B. Hall, K. Liu, M. A. Durney, X. Gu, D. R. Plichta, C. A. Clish, J. A. Porter, H. Vlamakis and R. J. Xavier (2021). "Gut microbiome ADP-ribosyltransferases are widespread phage-encoded fitness factors." Cell Host Microbe 29(9): 1351-1365 e1311.

      Feng, L., A. S. Raman, M. C. Hibberd, J. Cheng, N. W. Griffin, Y. Peng, S. A. Leyn, D. A. Rodionov, A. L. Osterman and J. I. Gordon (2020). "Identifying determinants of bacterial fitness in a model of human gut microbial succession." Proc Natl Acad Sci U S A 117(5): 2622-2633.

      Park, S. Y., C. Rao, K. Z. Coyte, G. A. Kuziel, Y. Zhang, W. Huang, E. A. Franzosa, J. K. Weng, C. Huttenhower and S. Rakoff-Nahoum (2022). "Strain-level fitness in the gut microbiome is an emergent property of glycans and a single metabolite." Cell 185(3): 513-529 e521.

      Segura Munoz, R. R., S. Mantz, I. Martinez, F. Li, R. J. Schmaltz, N. A. Pudlo, K. Urs, E. C. Martens, J. Walter and A. E. Ramer-Tait (2022). "Experimental evaluation of ecological principles to understand and modulate the outcome of bacterial strain competition in gut microbiomes." ISME J 16(6): 1594-1604.

    1. Author response:

      We thank the editors and reviewers for their careful and constructive evaluation of our manuscript. We appreciate the recognition of the conceptual novelty and in vivo relevance of our findings. We have carefully considered all comments and outline below the major revisions and additional analyses we will undertake. For clarity, we address the reviewers’ comments in thematic sections.

      Cell-autonomous contribution of Tent5a to phenotype

      We agree that the use of a complete knockout model raises the possibility of indirect or non-cell-autonomous effects on tooth development, particularly given the observed dentin alterations. To address this point directly, we are generating and analyzing experimental cohorts of an ameloblast-specific conditional model that is already available in our laboratory (Ambn-Cre; Tent5a<sup>flox/flox</sup>) to determine whether the enamel phenotype arises from cell-autonomous loss of TENT5A in the secretory epithelium. This approach will allow us to distinguish epithelial-intrinsic effects from potential secondary contributions of odontoblasts or mesenchymal tissues. Results from this model will be incorporated into the revised manuscript.

      Mechanistic basis and substrate specificity

      We agree that the mechanism underlying substrate selectivity of TENT5A requires further clarification. We have performed multiple classical RNA–protein interaction assays, including CLIP-based approaches, without identifying a clear sequence-specific recognition motif. In the revised manuscript, we will present substrate specificity as an open mechanistic question rather than implying a defined recognition mechanism.

      To strengthen this aspect, we will extend our analysis to include combined immunoprecipitation strategies and investigation of potential ribosome-associated or co-translational interactions of TENT5A.

      In addition, we will further validate selected high-confidence TENT5A interactors identified in our dataset in the context of putative changes in AmelX poly(A) tail length.

      Poly(A) tail length and functional causality

      We acknowledge that shortening of the poly(A) tail alone does not formally establish causality. However, our data consistently show that TENT5A-dependent shortening of poly(A) tails correlates with reduced mRNA and protein levels of key enamel matrix components. In the revised manuscript, we will clarify this mechanistic framework more explicitly, integrating poly(A) length, transcript abundance, and protein-level data in a structured manner, while clearly distinguishing correlation from formal proof of causality.

      We will also perform additional functional assays, including mRNA stability measurements in vitro in cells with genetic ablation of Tent5a, to further test the link between poly(A) shortening and reduced AmelX protein levels.
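      One common way to quantify mRNA stability is a transcriptional shutoff time course. We assume that design here because the planned assay is not specified. If relative transcript abundance decays as N(t) = N0·exp(-kt), the half-life is t½ = ln 2 / k. The minimal Python sketch below fits that log-linear model. The function name and the values are illustrative only.

```python
import math
from typing import Sequence


def mrna_half_life(times_h: Sequence[float], abundance: Sequence[float]) -> float:
    """Estimate mRNA half-life (hours) from a decay time course.

    Assumes first-order decay, N(t) = N0 * exp(-k * t), so ln(N) is linear in t.
    The decay constant k is minus the least-squares slope of ln(N) against t.
    """
    if len(times_h) != len(abundance) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(a) for a in abundance]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay rate constant (1/h)
    if k <= 0:
        raise ValueError("abundance does not decay over the time course")
    return math.log(2) / k


# Hypothetical relative AmelX levels after transcriptional shutoff (not real data).
half_life = mrna_half_life([0, 2, 4, 8], [1.0, 0.5, 0.25, 0.0625])
print(f"Half-life: {half_life:.2f} h")  # exact halving every 2 h -> 2.00 h
```

      Comparing half-lives fitted this way for wild-type and Tent5a-deficient cells would be one way to connect poly(A) shortening to transcript stability.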

      Quantitative microCT and enamel morphology

      We will include quantitative microCT analyses of enamel thickness and mineral density from multiple biological replicates per genotype (n ≥ 3). Sample numbers will be explicitly stated throughout. Additional high-resolution scans of isolated incisors will be provided. We will also quantify occlusal angle and include whole-skull reconstructions to document malocclusion. Maxillary enamel will be analyzed and quantified alongside mandibular enamel.
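      MicroCT mineral density is commonly reported in mg HA/cm³. It is obtained by calibrating reconstructed grey values against hydroxyapatite (HA) phantoms of known density. Whether the authors will calibrate this way is an assumption. The sketch below fits a linear phantom calibration to hypothetical phantom readings.

```python
from typing import Callable, Sequence


def make_calibration(gray_values: Sequence[float],
                     ha_densities: Sequence[float]) -> Callable[[float], float]:
    """Fit density = a * gray + b by least squares through phantom readings.

    gray_values: mean grey value measured in each HA phantom (scanner units)
    ha_densities: nominal phantom densities in mg HA/cm^3
    Returns a function that converts a grey value to mg HA/cm^3.
    """
    n = len(gray_values)
    if n != len(ha_densities) or n < 2:
        raise ValueError("need at least two phantoms")
    g_mean = sum(gray_values) / n
    d_mean = sum(ha_densities) / n
    sxx = sum((g - g_mean) ** 2 for g in gray_values)
    sxy = sum((g - g_mean) * (d - d_mean) for g, d in zip(gray_values, ha_densities))
    slope = sxy / sxx
    intercept = d_mean - slope * g_mean

    def to_density(gray: float) -> float:
        return slope * gray + intercept

    return to_density


# Hypothetical phantoms: 250 and 750 mg HA/cm^3 read as grey values 50 and 150.
to_density = make_calibration([50.0, 150.0], [250.0, 750.0])
print(to_density(100.0))  # midway between the phantoms -> 500.0
```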

      SEM terminology will be corrected (e.g., replacing “crystal structure” with “rod/interrod organization”), and structural parameters such as rod diameter and interprismatic matrix proportion will be quantitatively assessed.

      We agree that ultrastructural analysis of ameloblast secretory morphology is important. We have experience with TEM analysis of demineralized incisors and will perform additional ultrastructural examination to assess the integrity of Tomes’ processes and the secretory apparatus in Tent5a-deficient ameloblasts. These data will allow us to distinguish between primary alterations in secretory morphology and downstream effects on matrix organization.

      Amelx splice variants

      We will re-analyze our RNA-seq data with specific attention to exon 4-containing isoforms and clarify the distribution of splice variants in WT and KO samples. These findings will be explicitly discussed in the context of prior literature.
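      One standard summary of exon-level splicing is percent spliced in (PSI), the share of junction reads that support inclusion of a cassette exon. We use it here only as an illustration; the response does not name the authors' pipeline. Inclusion reads span the two junctions flanking the exon, and exclusion reads span the junction that skips it. The sketch below uses hypothetical read counts.

```python
def percent_spliced_in(up_incl: int, incl_down: int, skip: int) -> float:
    """Percent spliced in (PSI) for a cassette exon from junction read counts.

    up_incl:   reads on the upstream-exon / cassette-exon junction
    incl_down: reads on the cassette-exon / downstream-exon junction
    skip:      reads on the upstream-exon / downstream-exon (skipping) junction
    Inclusion reads are averaged over the two inclusion junctions so that
    both isoforms are counted on a per-junction basis.
    """
    inclusion = (up_incl + incl_down) / 2
    total = inclusion + skip
    if total == 0:
        raise ValueError("no junction reads")
    return 100.0 * inclusion / total


# Hypothetical counts for an exon 4-style cassette exon (not real data).
print(percent_spliced_in(up_incl=12, incl_down=8, skip=90))  # -> 10.0
```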

      Co-localization and self-assembly claims

      We agree that conventional light microscopy cannot directly resolve nanoscale self-assembly events. In Figure 3, our intention was to demonstrate differential subcellular distribution and partial segregation of AMELX and AMBN within secretory compartments, rather than to claim direct visualization of molecular self-assembly. In the revised manuscript, we will clarify this distinction, moderate the terminology accordingly, and provide explicit quantitative co-localization analyses across multiple biological replicates.
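      Quantitative co-localization is often reported as Pearson's correlation coefficient together with Manders' overlap coefficients (M1, M2). The response does not say which metrics will be used, so this choice is an assumption. The minimal pure-Python sketch below works on paired pixel intensities from two channels with hypothetical thresholds. In practice these values would be computed on background-corrected images with a dedicated imaging tool.

```python
import math
from typing import Sequence, Tuple


def pearson(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Pearson correlation of paired pixel intensities from two channels."""
    n = len(ch1)
    mean1, mean2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel has constant intensity")
    return cov / math.sqrt(var1 * var2)


def manders(ch1: Sequence[float], ch2: Sequence[float],
            t1: float, t2: float) -> Tuple[float, float]:
    """Manders' M1 and M2 (simple thresholded variant).

    M1: fraction of total channel-1 intensity in pixels where channel 2 > t2.
    M2: fraction of total channel-2 intensity in pixels where channel 1 > t1.
    """
    m1 = sum(a for a, b in zip(ch1, ch2) if b > t2) / sum(ch1)
    m2 = sum(b for a, b in zip(ch1, ch2) if a > t1) / sum(ch2)
    return m1, m2


# Hypothetical flattened pixel intensities for AMELX and AMBN channels.
amelx = [10.0, 20.0, 0.0, 5.0]
ambn = [8.0, 25.0, 3.0, 0.0]
m1, m2 = manders(amelx, ambn, t1=1.0, t2=1.0)
print(f"r = {pearson(amelx, ambn):.3f}, M1 = {m1:.3f}, M2 = {m2:.3f}")
```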

      TENT5 family paralogs

      To address potential redundancy within the TENT5 family, we will analyze published single-cell RNA-seq datasets (Sharir et al., 2019; Krivanek et al., 2020) to assess expression of TENT5 paralogs in ameloblasts. These findings will be validated using targeted transcriptional analyses.

      Human clinical relevance

      We appreciate the suggestion to examine potential human enamel phenotypes. We will pursue retrospective analysis of clinical and imaging data from patients carrying TENT5A variants through our collaborations with rare disease networks and specialized centers in Europe and the United States. Any relevant findings will be incorporated into the revised manuscript.

      Tissue sampling clarification

      We apologize for imprecise terminology regarding transcriptomic sampling. The analyzed tissue corresponds to the proximal incisor region up to the mineralization stage. We will include a schematic and clarify nomenclature throughout the manuscript.

      Language and data clarity

      The manuscript will be thoroughly revised for clarity, consistency of terminology, figure referencing, and accuracy of citations. We will explicitly clarify the methodology used for protein quantification, including normalization strategy and densitometric analysis, to address inconsistencies noted in the supplementary data. We will also expand the discussion to address the biological relevance of moderate poly(A) shortening, referencing established literature demonstrating that even subtle changes in tail length can significantly influence translational efficiency.

      Although AMELX is the most abundant enamel matrix protein and exhibits a consistent TENT5A-dependent poly(A) shortening phenotype, our data demonstrate that multiple secreted proteins are similarly affected. We will revise the text to clearly articulate that the enamel phenotype likely reflects the combined contribution of multiple TENT5A-regulated secretory factors rather than a single-gene effect.

      We believe these revisions will substantially strengthen the mechanistic, quantitative, and conceptual framework of the study and provide a clearer foundation for interpreting TENT5A-dependent regulation of enamel biomineralization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The data in Figure 1 are not novel; similar data have been reported elsewhere.

      We are grateful for the critical evaluation of our findings. Although a few studies have reported the prevalence of FGFR2 amplification in gastric cancer (GC), our study provides a novel next-generation sequencing (NGS) dataset of 161 GC patients from China, further emphasizing the high frequency of FGFR2 amplification in GC. Moreover, the proportion of FGFR2-amplified patients at our center (6.2%) is somewhat higher than in the TCGA cohort (5%).

      We have moved the original Figures 1C and 1D to the supplementary figures and constructed a new pie chart for the Nanjing Drum Tower Hospital cohort for comparison with the TCGA cohort.

      It is unclear why the two panels in Figs. 2a and 2b cannot be integrated into one panel, which would make it easier to compare the activities.

      Thank you for pointing this out. In the first panel of Figures 2a and 2b, we used CCK-8 assays across a concentration gradient to measure the cytotoxicity of SHP099 against tumor cells. In the second panel, we fixed SHP099 at 10 μM (its IC50) and tested it in combination with a concentration gradient of AZD4547. Moreover, the horizontal axes of the two panels cannot be expressed in the same units. Therefore, we believe that the two panels within each of Figures 2a and 2b are not suitable for merging into one.
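      An IC50 such as the one used to fix the SHP099 concentration is normally read off a dose-response curve. The response does not say how it was estimated. As an assumption, the simple sketch below interpolates linearly in log10(concentration) between the two doses that bracket 50% viability. A four-parameter logistic fit is the more usual choice. Doses and viabilities are hypothetical.

```python
import math
from typing import Sequence


def ic50_log_interp(conc_um: Sequence[float], viability: Sequence[float]) -> float:
    """Estimate IC50 by log-linear interpolation across the 50% viability point.

    conc_um: increasing concentrations in uM (all > 0)
    viability: fraction of untreated control (1.0 = no effect), same order
    """
    points = list(zip(conc_um, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("viability never crosses 50% in the tested range")


# Hypothetical CCK-8 viabilities (not the manuscript's data).
print(round(ic50_log_interp([1, 3, 10, 30], [0.9, 0.7, 0.4, 0.2]), 2))  # -> 6.69
```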

      For easier comparison, we merged the first panels of Figures 2a and 2b into a single panel and merged the second panels in the same way.

      The synergistic effects of AZD4547 and SHP099 are not significant in Figs. 2e and 2f, or in Figs. 3g and 4f.

      In Figs. 2e and 2f, we analyzed two combinations with 10 μM SHP099: one with 3 nM AZD4547 (a relatively low dose) and one with 10 nM AZD4547 (a relatively high dose). The effects should therefore be compared between matching dose combinations. In our view, the combination treatment more strongly inhibited phospho-FGFR, phospho-SHP2, and downstream FGFR2 signaling molecules, especially in KATOIII cells.

      For ease of comparison, we circled the 10 μM SHP099, 10 nM AZD4547, and 10 nM AZD4547 + 10 μM SHP099 conditions in red.

      Author response image 1.

      Author response image 2.

      We also circled the 10 μM SHP099, 3 nM AZD4547, and 3 nM AZD4547 + 10 μM SHP099 conditions in blue.

      Author response image 3.

      Author response image 4.

      For ease of comparison, we also performed grayscale (densitometric) analysis and normalization using ImageJ.

      Author response image 5.

      Author response image 6.

      Author response image 7.

      Author response image 8.
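      Band intensities for western blots are typically measured in ImageJ and then normalized. The exact normalization scheme is not given, so we assume a common one here. Each phospho-band is divided by its total-protein band, and the ratio is then expressed relative to the control lane. The sketch below uses hypothetical integrated densities.

```python
from typing import Dict


def normalize_bands(phospho: Dict[str, float], total: Dict[str, float],
                    control: str) -> Dict[str, float]:
    """Phospho/total ratio per lane, expressed relative to the control lane.

    phospho, total: background-subtracted integrated band densities keyed by
    lane name (for example, as measured in ImageJ).
    """
    ratios = {lane: phospho[lane] / total[lane] for lane in phospho}
    ref = ratios[control]
    return {lane: r / ref for lane, r in ratios.items()}


# Hypothetical densities for p-ERK / total ERK (not measured values).
rel = normalize_bands(
    phospho={"vehicle": 8000, "SHP099": 6000, "AZD4547": 4000, "combo": 2000},
    total={"vehicle": 10000, "SHP099": 10000, "AZD4547": 10000, "combo": 10000},
    control="vehicle",
)
print({lane: round(value, 2) for lane, value in rel.items()})
```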

      In Fig. 3g, the combination therapy exhibited relatively stronger inhibitory effects on phospho-ERK, phospho-AKT and phospho-mTOR.

      For ease of comparison, we performed grayscale analysis and normalization using ImageJ.

      The less clear-cut effect of the combination therapy may be due to contaminants other than tumor cells in the patient’s ascites.

      Author response image 9.

      In Fig. 4f, phospho-AKT and phospho-mTOR were clearly further suppressed in the combination group.

      For ease of comparison, we performed grayscale analysis and normalization using ImageJ.

      Author response image 10.

      Therefore, in our opinion, our data reasonably support the synergistic effects of AZD4547 and SHP099.
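      One way to make a synergy claim quantitative is the Bliss independence model. The response does not state that this model was used. If drugs A and B act independently with fractional inhibitions f_A and f_B, the expected combined inhibition is f_A + f_B - f_A·f_B. An observed inhibition above that value (a positive excess) suggests synergy. The sketch below uses hypothetical fractions.

```python
def bliss_excess(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed minus Bliss-expected fractional inhibition (0-1 scale).

    f_a, f_b: inhibition by each drug alone; f_ab: inhibition by the combination.
    Positive values suggest synergy, values near zero suggest additivity,
    and negative values suggest antagonism.
    """
    for f in (f_a, f_b, f_ab):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractional inhibition must lie in [0, 1]")
    expected = f_a + f_b - f_a * f_b
    return f_ab - expected


# Hypothetical: SHP099 alone 30%, AZD4547 alone 40%, combination 75% inhibition.
print(round(bliss_excess(0.30, 0.40, 0.75), 2))  # expected 0.58 -> excess 0.17
```

      In practice, the excess would be judged against replicate variability rather than read from single values.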

      Data in Fig. 5 are weak and could be removed. It is unclear why an FGFR inhibitor would have activity toward T cells, since T cells do not express FGFR.

      The T cell-activating effect of SHP099 has been reported in many studies. A previous study published in Cancer Immunology Research showed that combining the FGFR inhibitor erdafitinib with a PD-1 antibody activated T cells and downregulated exhaustion-related T cell surface markers (including PD-1) in vivo. Therefore, the antitumor immune effects of FGFR2 inhibitors cannot be ignored. Although T cells do not express FGFR, FGFR2 inhibitors may still affect PD-1 expression on T cells through other mechanisms, which requires further research. We have deleted Fig. 5D from the manuscript. We believe that combining an FGFR2 inhibitor with an SHP2 inhibitor not only kills tumor cells directly but also promotes antitumor immunity by activating T cells. Therefore, we believe that the in vitro data in Figure 5 are also meaningful.

      Reviewer #2 (Public review):

      Strengths:

      The data are generally well presented, and the study uses a novel patient dataset that could have wider value. The study provides additional evidence to support the combined therapeutic approach of RTK and phosphatase inhibition.

      We sincerely thank the reviewer for the critical evaluation and appreciation of our findings.

      Weaknesses:

      Combined therapy approaches targeting RTKs and SHP2 have been widely reported. Indeed, SHP099 in combination with FGFR inhibitors has been shown to overcome adaptive resistance in FGFR-driven cancers. Furthermore, the inhibition of SHP2 has been documented to have important implications in both targeting proliferative signalling as well as immune response. Thus, it is difficult to see novelty or a significant scientific advance in this manuscript. Although the data is generally well presented, there is inconsistency in the interpretation of the experimental outcomes from ex vivo, patient and mouse systems investigated. In addition, the study provides only minor or circumstantial understanding of the dual mechanism.

      We acknowledge that our investigation of the mechanism of dual inhibition is not sufficiently in-depth. The mechanisms underlying the combination of SHP2 and RTK inhibitors require further exploration, and this will be a main direction of our future work.

      Using data from a 161-patient cohort, FGFR2 amplification was identified in ~6% of patients, with concomitant elevation of FGFR2 mRNA that correlated with PTPN11 (SHP2) mRNA expression. The broader context of these data is of value and could add a different patient demographic to other data on gastric cancer. However, there is no detail on patient stratification or prior therapeutic intervention.

      Thank you for pointing this out. We have added a table describing patient characteristics such as age and gender. Unfortunately, data on patients’ prior therapeutic interventions were not collected.

      In SNU16 and KATOIII cells the combined therapy is shown to be effective and appears to be correlated with increased apoptotic effects (i.e. not immune response).

      Fig 2E suggests that the combined therapy in SNU16 cells is only slightly better than the FGFR2-directed inhibitor AZD4547 alone, particularly at the higher dose.

      The individual patient case study described in Fig 3 suggests efficacy of the combined therapy (at very high dosage); however, the cell biopsies show reduced phosphorylation of ERK only, not of AKT. This is at odds with the ex vivo cell-based assays. Thus, it is not clear how relevant this study is.

      The mouse xenograft study shows a convincing reduction in tumor mass/volume and clear reduction in pAKT, whilst pERK remains largely unaffected by the combined therapeutic approach. This is in conflict with the previous data which seems to show the opposite effect. In all, the impact of the dual therapy is unclear with respect to the two pathways mediated by ERK and AKT.

      Thank you for the comment. Previous studies have confirmed that the RAS/ERK and PI3K/AKT pathways are two important downstream signaling pathways of FGFR2. In Figs. 2E and 2F, we observed that in FGFR2-amplified cell lines dual blockade significantly inhibited both p-ERK and p-AKT, with a stronger effect on p-ERK than on p-AKT. Similarly, in Fig. 3G, dual blockade mainly suppressed p-ERK and slightly inhibited p-AKT and p-mTOR in cancer cells derived from the individual patient. Thus, in both types of in vitro models, dual inhibition suppressed both the RAS/ERK and PI3K/AKT pathways, acting primarily on the RAS/ERK pathway. These results are therefore not contradictory.

      Author response image 11.

      Author response image 12.

      Author response image 13.

      In the in vivo animal model, although dual inhibition affected both pathways, it mainly suppressed p-AKT.

      In both the in vivo and in vitro models, combination therapy inhibits the RAS/ERK and PI3K/AKT pathways to some extent, but the pathway most affected differs between the two settings. Given the substantial differences between in vivo and in vitro models, we believe this difference in emphasis is understandable.

      Author response image 14.

      Finally, the authors demonstrate the impact of SHP2 on PD-1 expression and propose that the SHP099/AZD4547 combination therapy significantly induces the production of IFN-γ in CD8+ T cells. This part of the study is unconvincing and would benefit from the investigation of the tumor micro-environment to assess T cell infiltration.

      To assess T cell infiltration in the tumor microenvironment, we would need to establish our model in immunocompetent mice. However, the only mouse-derived gastric cancer cell line currently available, MFC, does not carry FGFR2 amplification. We attempted to transfect MFC cells with FGFR2 amplification plasmids, but transfection efficiency was poor, making in vivo animal experiments difficult to conduct.

      Reviewer #3 (Public review):

      Strengths:

      The authors demonstrate that FGFR2 amplification positively correlates with PTPN11 in human gastric cancer samples, providing rationale for combination therapies. Furthermore, convincing data are provided demonstrating that targeting both FGFR and SHP2 is more effective than targeting either pathway alone using in vitro and in vivo models. The use of cells derived from a gastric cancer patient that progressed following treatment with an FGFR inhibitor is also a strength. The findings from this study support the conclusion that SHP2 inhibitors enhance the efficacy of FGFR-targeted therapies in cancer patients. This study also suggests that targeting SHP2 may also be an effective strategy for targeting cancers that are resistant to FGFR-targeted therapies.

      Weaknesses:

      The main caveat with these studies is the lack of an immunocompetent model with which to test the finding that this combination therapy enhances T cell cytotoxicity in vivo. Discussing this limitation in the context of these findings and future directions for this work would be beneficial, particularly since the combination therapy appears to work quite well without T cells in the environment.

      Thank you for the great suggestion. To assess T cell infiltration in the tumor microenvironment, we would need to establish our model in immunocompetent mice. However, the only mouse-derived gastric cancer cell line currently available, MFC, does not carry FGFR2 amplification. We attempted to transfect MFC cells with FGFR2 amplification plasmids, but transfection efficiency was poor, making in vivo animal experiments difficult to conduct.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points. The manuscript is poorly written and loaded with language errors.

      We sincerely thank you for your constructive suggestion and apologize for these errors. We have polished the manuscript and corrected the language errors.

      Reviewer #2 (Recommendations for the authors):

      In addition to the comments made in the Public Review, the manuscript lacks detail on the statistical analysis of experimental results.

      Thank you for your advice. In response, we have added details of the statistical analyses of experimental results to the “Methods” section.

      Reviewer #3 (Recommendations for the authors):

      There are numerous grammatical errors throughout, and incorrect wording is used in some places (such as "syngeneic mouse tumor model" rather than "xenograft tumor model", line 253). Careful proofreading and editing of this manuscript is recommended.

      Thank you for your suggestion. We have made corrections to the relevant content of the article.

      AZD4547 is an FGFR-selective inhibitor and is not specific for FGFR2 as it also targets FGFR1 and FGFR3, this should be clarified in the text.

      Thank you for raising this point. We have clarified in the “Introduction” that AZD4547 is an FGFR-selective inhibitor targeting FGFR1-3.

      The specific FGFR inhibitor(s) used to treat the patient with FGFR2 amplification, are the authors able to provide this information?

      Thank you for raising this important issue. Because small-molecule drug development is difficult, pan-FGFR inhibitors have currently made the fastest clinical progress. Relay Therapeutics has recently developed a highly FGFR2-selective inhibitor, RLY-4008, which is in phase I/II clinical trials but lacks preclinical studies in gastric cancer.

      Figure 2F: the p38 and p-p38 bands are cut off at the bottom

      We sincerely thank you for your thoughtful feedback. We have improved our experimental methods and repeated the western blots for p38 and p-p38 in Figure 2F.

      Author response image 15.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the thermal and mechanical unfolding pathways of the doubly knotted protein TrmD-Tm1570 using molecular simulations, optical tweezers experiments, and other methods. In particular, the detailed analysis of the four major unfolding pathways using a well-established simulation method is an interesting and valuable result.

      Strengths:

      A key finding that lends credibility to the simulation results is that the molecular simulations at least qualitatively reproduce the characteristic force-extension distance profiles obtained from optical tweezers experiments during mechanical unfolding. Furthermore, a major strength is that the authors have consistently studied the folding and unfolding processes of knotted proteins, and this paper represents a careful advancement building upon that foundation.

      We thank the reviewer for reading our manuscript and for these comments.

      Weaknesses:

      While optical tweezers experiments offer valuable insights, the knowledge gained from them is limited, as the experiments are restricted to this single technique.

      The paper mentions that the high aggregation propensity of the TrmD-Tm1570 protein appears to hinder other types of experiments. This is likely the reason why a key aspect, such as whether a ribosome or molecular chaperones are essential for the folding of TrmD-Tm1570, has not been experimentally clarified, even though it should be possible in principle.

      We appreciate the reviewer’s point that clarifying whether molecular chaperones or the ribosome are required for TrmD-Tm1570 folding is crucial. We are pleased to report that the role of molecular chaperones in TrmD-Tm1570 folding is currently under investigation in our laboratory. These results will clarify this aspect and will be reported in a future manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors combined coarse-grained structure-based model simulation, optical tweezer experiments, and AI-based analysis to assess the knotting behavior of the TrmD-Tm1570 protein. Interestingly, they found that while the structure-based model can fold the single knot from TrmD and Tm1570, the double-knot protein TrmD-Tm1570 cannot form a knot itself, suggesting the need for chaperone proteins to facilitate this knotting process. This study has strong potential to understand the molecular mechanism of knotted proteins, supported by much experimental and simulation evidence. However, there are a few places that appear to lack sufficient details, and more clarification in the presentation is needed.

      Strengths:

      A combination of both experimental and computational studies.

      We thank the reviewer for reading our manuscript and for these comments.

      Weaknesses:

      There is a lack of detail to support some statements.

      (1) The use of the AI-based method, self-organizing maps (SOM), could be emphasized further, especially in the analysis of the simulated unfolding trajectories and the discovery of the four unfolding/folding pathways. This would strengthen the statistical robustness of the findings.

      We thank the reviewer for this observation. Note, however, that the AI-based SOM method was applied only to obtain the main representative trajectories for the mechanical unfolding MD simulations. Specifically, for TrmD, Tm1570, and the fusion protein (TrmD-Tm1570), we extracted the representative conformational states by selecting the most highly populated SOM clusters shown in SI Figure 5 - figure supplement 3. We then identified the centroid of each such cluster and selected the simulation closest to it. These correspond to cluster number 1 for Tm1570, number 11 for TrmD, and number 7 for TrmD-Tm1570. We have added a sentence to the main manuscript clarifying how the main representative conformation was obtained.
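      The selection step described above can be sketched as follows: take the most populated SOM cluster, compute its centroid, then pick the simulation nearest to that centroid. This is a sketch under stated assumptions. It assumes each simulation has already been reduced to a feature vector and assigned a SOM cluster label. The feature representation, the Euclidean distance, and all names are hypothetical, not details from the study.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def representative_simulation(
    features: Dict[str, Sequence[float]], labels: Dict[str, int]
) -> Tuple[int, str]:
    """Return (most populated cluster, simulation closest to its centroid).

    features: simulation id -> feature vector (hypothetical descriptors)
    labels:   simulation id -> SOM cluster index
    """
    cluster, _ = Counter(labels.values()).most_common(1)[0]
    members: List[str] = [sim for sim, c in labels.items() if c == cluster]
    dim = len(features[members[0]])
    centroid = [sum(features[s][i] for s in members) / len(members) for i in range(dim)]
    # Euclidean distance to the centroid (an assumed metric).
    nearest = min(members, key=lambda s: math.dist(features[s], centroid))
    return cluster, nearest


# Hypothetical two-dimensional descriptors for four simulations.
features = {"sim1": [0.0, 0.0], "sim2": [2.0, 0.0], "sim3": [1.0, 0.1], "sim4": [9.0, 9.0]}
labels = {"sim1": 7, "sim2": 7, "sim3": 7, "sim4": 3}
print(representative_simulation(features, labels))  # -> (7, 'sim3')
```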

      By contrast, no AI-based methods were applied to the thermal unfolding simulations. The four thermal unfolding trajectories shown in Figure 3 were classified by the order of unfolding and untying events:
      (i) TrmD unfolds first and its knot unties before Tm1570 unfolds: pathway 1 (Figure 3A and E).
      (ii) Tm1570 unfolds and unties first, followed by TrmD: pathway 3 (Figure 3C and G).
      (iii) TrmD unfolds first, then Tm1570; the TrmD knot then unties, and finally the Tm1570 knot unties: pathway 2.
      Pathway 4 follows the same sequence as pathway 2, but in reverse order.
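      The classification rules above can be written as a lookup on the time order of four events per trajectory. This is a sketch: the event names, and how each event is detected in a trajectory, are hypothetical. Pathway 4 is assumed to be pathway 2 with the roles of the two domains swapped, which is one reading of "the same sequence but in the reverse order".

```python
from typing import Dict, Optional

# Event order -> pathway number, following the definitions in the text.
# ASSUMPTION: pathway 4 is pathway 2 with the roles of the two domains swapped.
PATHWAYS = {
    ("TrmD_unfold", "TrmD_untie", "Tm1570_unfold", "Tm1570_untie"): 1,
    ("TrmD_unfold", "Tm1570_unfold", "TrmD_untie", "Tm1570_untie"): 2,
    ("Tm1570_unfold", "Tm1570_untie", "TrmD_unfold", "TrmD_untie"): 3,
    ("Tm1570_unfold", "TrmD_unfold", "Tm1570_untie", "TrmD_untie"): 4,
}


def classify_pathway(event_times: Dict[str, float]) -> Optional[int]:
    """Map the time order of the four events in one trajectory to a pathway.

    event_times: event name -> simulation time at which the event first occurs
    (hypothetical names). Returns None for orders matching none of the pathways.
    """
    order = tuple(sorted(event_times, key=lambda name: event_times[name]))
    return PATHWAYS.get(order)


print(classify_pathway({"TrmD_unfold": 1.0e5, "Tm1570_unfold": 2.0e5,
                        "TrmD_untie": 3.0e5, "Tm1570_untie": 4.0e5}))  # -> 2
```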

      (2) The manuscript would benefit from a clearer description of the correlation between the simulation and experimental results. The current correlation, presented in the paragraph starting from Line 250, focuses on measured distances. The authors could consider providing additional evidence on the order of events observed experimentally and computationally. More statistical analyses on the experimental curves presented in Figure 4 supplement would be helpful.

      We thank the reviewer for this suggestion. In response, we prepared additional statistical analyses in table format, reporting the average length-change increments together with their standard deviations, and we clarified in the revised text that the ± values correspond to standard deviations. In addition, we quantified the percentage of cases in which TrmD, Tm1570, and TrmD-Tm1570 unfold completely, providing a clearer comparison of the order of events observed experimentally and computationally. These analyses have been incorporated into the revised manuscript (Tables 1 and 2).
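      The summary statistics described above reduce to a mean ± standard deviation per length-change increment and a percentage of fully unfolded traces. A minimal sketch follows. It assumes the sample standard deviation (n - 1 denominator); the response does not specify which form was used. The values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarize_increments(increments_nm: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of length-change increments (nm)."""
    return statistics.mean(increments_nm), statistics.stdev(increments_nm)


def percent_fully_unfolded(fully_unfolded: Sequence[bool]) -> float:
    """Percentage of traces that reach the fully unfolded state."""
    return 100.0 * sum(fully_unfolded) / len(fully_unfolded)


# Hypothetical values, not the data in Tables 1 and 2.
mean, sd = summarize_increments([20.0, 22.0, 24.0])
print(f"{mean:.1f} ± {sd:.1f} nm")  # -> 22.0 ± 2.0 nm
print(percent_fully_unfolded([True, True, False, True]))  # -> 75.0
```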

      (3) How did the authors calibrate the timescale between simulation and experiment? Specifically, what is the value \tau used in Line 270, and how was it calculated? Relevant information would strengthen the connection between simulation and experiment.

      In our model, the time unit is defined by the relation $\tau = \sigma\sqrt{m/\varepsilon}$. Here m is the mass unit, the average mass of an amino acid (m = 110 Da ≈ 1.83 x 10<sup>-25</sup> kg, since 1 Da = 1.66 x 10<sup>-27</sup> kg). ε is the energy unit, the average interaction energy between amino acids, which we assume to be around 2-3 kcal/mol = 2-3 x 6.95 x 10<sup>-21</sup> J. σ is the distance unit, equal to 1 nm.

      Substituting these values into the definition of τ gives τ ≈ 3.2 ps.

      This definition follows from dimensional analysis: σ√(m/ε) is the combination of the mass, distance, and energy units that has the dimension of time.

      The pulling speeds used in the simulations (0.05-0.15 Å/τ) correspond to approximately 1.6-4.7 m/s in real units. These speeds are necessarily much higher than the experimental pulling speed (20 nm/s), a well-known limitation of steered molecular dynamics. Moreover, our coarse-grained model is run in an implicit-solvent regime and does not explicitly include hydrodynamic friction. As a consequence, the simulated dynamics do not reproduce absolute real-time kinetics. Instead, simulation and experiment are compared through relative unfolding pathways, force-extension behavior, and contour-length changes, which remain robust across the range of simulated pulling speeds.

      Thus, τ = 3.2 ps is derived directly from the coarse-grained model parameters rather than calibrated to experiment, and the connection between simulation and experiment is established through mechanistic agreement rather than matching absolute timescales.

      We have now added a clarifying sentence to the manuscript (Methods and Materials, Mechanical unfolding simulations) explaining how the timescale was defined and how the value of τ was obtained.
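      The arithmetic behind τ and the pulling-speed conversion can be checked with a short Python sketch. It uses standard constants: 1 Da = 1.66054 x 10<sup>-27</sup> kg, and 1 kcal/mol per molecule = 4184 J / N<sub>A</sub> ≈ 6.95 x 10<sup>-21</sup> J. The mass unit for 110 Da is therefore about 1.83 x 10<sup>-25</sup> kg. With σ = 1 nm, τ ranges from roughly 3.0 to 3.6 ps across ε = 2-3 kcal/mol, and 3.2 ps corresponds to ε ≈ 2.5 kcal/mol. The function names are illustrative.

```python
import math

DALTON_KG = 1.66054e-27                    # 1 Da in kg
KCAL_PER_MOL_J = 4184.0 / 6.02214076e23    # 1 kcal/mol expressed per molecule, in J


def time_unit_s(mass_da: float, eps_kcal_mol: float, sigma_m: float) -> float:
    """Reduced time unit tau = sigma * sqrt(m / eps), in seconds."""
    m = mass_da * DALTON_KG
    eps = eps_kcal_mol * KCAL_PER_MOL_J
    return sigma_m * math.sqrt(m / eps)


def pulling_speed_m_per_s(angstrom_per_tau: float, tau_s: float) -> float:
    """Convert a pulling speed from Å/tau to m/s."""
    return angstrom_per_tau * 1e-10 / tau_s


for eps in (2.0, 2.5, 3.0):
    tau = time_unit_s(110.0, eps, 1e-9)
    print(f"eps = {eps} kcal/mol -> tau = {tau * 1e12:.2f} ps")

tau = 3.2e-12  # value quoted in the response
print([round(pulling_speed_m_per_s(v, tau), 2) for v in (0.05, 0.15)])  # -> [1.56, 4.69]
```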

      Reference: 

      Szymczak, P. and M. Cieplak (2006). "Stretching of proteins in a uniform flow." The Journal of Chemical Physics 125(16).

      (4) In Line 342, the authors comment that whether using native contacts or not, they cannot fold double-knotted TrmD-Tm1570. Could the authors provide more details on how non-native interactions were analyzed?

      To analyze the role of non-native interactions, we calculated two non-native contact maps. The first used a distance-cutoff criterion. The second identified highly frustrated contacts from the frustration index computed with the Frustratometer (http://frustratometer.qb.fcen.uba.ar/; see Author response image 1). These non-native interactions were then incorporated into the structure-based Cα (SBM) model to potentially assist refolding or knot formation. However, in neither case did we observe successful refolding or formation of the double-knotted native topology. These results indicate that adding these non-native contacts is insufficient to drive refolding of the TrmD-Tm1570 protein. This may suggest that the protein requires the support of chaperones or the active role of the ribosome to tie its two knots. We have now clarified this point more explicitly in the revised manuscript.

      Author response image 1.

      Native and non‑native contact maps for TrmD–Tm1570. The upper triangle (blue dots) corresponds to the cutoff‑based contact map and shows only unique contacts not present in the native contact map. The lower triangle (red dots) represents highly frustrated contacts, again showing only unique contacts absent from the native map. Black dots indicate the native contacts derived from the structure, and the contact map was generated using the Shadow Contact Map software. The blue and orange shadows correspond to the knot position for TrmD and Tm1570 proteins, respectively. 
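The cutoff-based step described in the response can be illustrated with a short sketch. It is not the authors' pipeline: the 8 Å Cα-Cα cutoff, the minimum sequence separation of 4, and the choice of structure supplying the coordinates are assumptions made only for illustration. The sketch lists residue pairs within the cutoff and removes those already present in the native contact map, mirroring the "unique contacts not present in the native contact map" shown in the upper triangle of Author response image 1.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]
Contact = tuple[int, int]


def cutoff_contacts(ca_coords: list[Coord], cutoff: float = 8.0,
                    min_seq_sep: int = 4) -> set[Contact]:
    """Residue pairs (i, j), i < j, whose Cα atoms lie within `cutoff` Å.

    Pairs closer than `min_seq_sep` in sequence are skipped. Both default
    values are illustrative assumptions, not the parameters used in the study.
    """
    contacts = set()
    for i, j in combinations(range(len(ca_coords)), 2):
        if j - i >= min_seq_sep and math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


def non_native_contacts(ca_coords: list[Coord], native: set[Contact],
                        cutoff: float = 8.0, min_seq_sep: int = 4) -> set[Contact]:
    """Cutoff-based contacts that are absent from the native contact map."""
    return cutoff_contacts(ca_coords, cutoff, min_seq_sep) - native
```

Representing contacts as sets of index pairs makes "unique to the cutoff map" a plain set difference; a frustration-based list of contacts could be filtered against the native map in the same way.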

      (5) It appears that the manuscript lacks simulation or experimental evidence to support the statement at Line 343: While each domain can self-tie into its native knot, this process inhibits the knotting of the other domain. Specifically, more clarification on this inhibition is needed.

      Explaining this phenomenon remains challenging, and several contributing factors are likely.

      (1) The folding success rates of the individual TrmD and Tm1570 domains are low (<3%); folding of the double-knotted protein is therefore expected to be even less efficient. 

      (2) Although formation of a single knot is observed in simulations of the two-domain protein, the folded domain adopts a native-like but not fully native conformation, regardless of whether it is TrmD or Tm1570. (2A) Fluctuations of the unfolded second domain may impose a destabilizing load, promoting unfolding of the folded domain. (2B) Conversely, folding of one domain restricts the conformational space available to the other. Such restriction may have either stabilizing or destabilizing effects: although reduced conformational space (crowding) is generally thought to increase the probability of knot formation in polymers, in this system the constraint is localized rather than global.

      (3) It is possible that extending the simulations to much longer timescales would allow formation of the second knot; however, within the timescales accessible here, unfolding of the first knot is observed instead.

      (4) The TrmD–Tm1570 protein forms a dimer with a well-defined interface, whereas our simulations were performed on a monomeric unit. Consequently, both domains are solvent-exposed, forming an open two-domain system with tRNA-binding elements that are not stabilized by intermolecular interactions.

      Taken together, these factors preclude a quantitative assessment of the dominant contribution. Our results suggest that efficient folding may require assistance from molecular chaperones or an active role of the ribosome in coordinating formation of the two knots.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The paper notes at the beginning of its results section that simulations aiming to fully fold the TrmD-Tm1570 protein from a denatured state were unsuccessful. While the failure to achieve complete folding is itself an instructive and important result, there is room for improvement in how it's presented. The authors provide no specific details on what actually occurred during these simulations. It is plausible that some intermediate state was reached, and one can imagine that the knotting of the C-terminal part, Tm1570, was partially completed. A more detailed description of these outcomes would have been beneficial.

      In the main manuscript (Figure 3), we reported the folding trajectories and the probability of native contact formation for the TrmD–Tm1570 protein, focusing on the four main unfolding pathways observed in our simulations. In addition to these common pathways, we also examined a small number of trajectories in which one or both domains may refold. These are presented in Figure 3 - figure supplements 1 and 2, where we highlight a set of trajectories that we classify as rare events. In these rare trajectories, partial refolding and the formation of intermediate states can indeed be observed. However, as described in the main text, successful refolding of the fusion protein occurs only when the knot remains close to its native position and does not undergo large fluctuations along the chain. When the knot drifts significantly, refolding is not completed.

      Figure 3 - figure supplement 1 shows six representative examples of intermediate states sampled during these simulations. As the reviewer suggested, some intermediate conformations were reached, including partial reformation of structural elements. However, only the trajectory in which the knot remains sufficiently close to its native location undergoes substantial refolding. We have now clarified this point more explicitly in the revised manuscript to better explain why full folding was not achieved and how knot dynamics constrain the refolding process.

      (2) Is it not possible to plot the degree of knot formation as a function of time or Q in Figure 3A-H? Doing so would make the verbally described results much clearer.

      We thank the reviewer for the suggestion. Following this observation, we have added a new supplementary figure (Figure 3 - figure supplement 3) showing knot translocation as a function of simulation frame, together with representative structures from the transitions from the folded to the unfolded state and from the knot-untying process.

      (3) Placement of a paragraph starting from line 250 looks odd to me. The paragraph describes simulation results of the mechanical unfolding, which is fully described in the following section. Specifically, the simulation result is discussed before describing its method/outline, which is to be avoided as far as possible.

      In accordance with the journal's standard style, the Methods section appears after the Discussion. We therefore included a sentence describing the relevant methods within the simulation results to guide the reader through the text.

      (4) This is only an optional request. It is highly desired to examine the in vitro folding of TrmD-Tm1570 with and without molecular chaperones. At least, authors can envision/discuss this direction.

      We agree that examining the in vitro folding of TrmD–Tm1570 with and without molecular chaperones would provide important mechanistic insight into the folding of knotted proteins. We plan to perform these experiments as part of our ongoing work, and in the revised manuscript we will add a discussion of this direction and its potential impact.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 6C was not referenced or discussed in the manuscript.

      We thank the reviewer for pointing this out. Figure 6C is indeed referenced and discussed in the manuscript.

      (2) Several places refer to figures in the Supporting Information, and should be updated to refer to the supplement figures associated with the main figures. 

      In the revised version, we have ensured that all references are updated and clearly labeled.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Since dimerization is essential for SARS-CoV-2 Mpro enzymatic activity, the authors investigated how different classes of inhibitors, including peptidomimetic inhibitors (PF-07321332, PF-00835231, GC376, boceprevir), non-peptidomimetic inhibitors (carmofur, ebselen, and its analog MR6-31-2), and allosteric inhibitors (AT7519 and pelitinib), influence the Mpro monomer-dimer equilibrium using native mass spectrometry. Further analyses with isotope labeling, HDX-MS, and MD simulations examined subunit exchange and conformational dynamics. Distinct inhibitory mechanisms were identified: peptidomimetic inhibitors stabilized dimerization and suppressed subunit exchange and structural flexibility, whereas ebselen covalently bound to a newly identified site at C300, disrupting dimerization and increasing conformational dynamics. This study provides detailed mechanistic evidence of how Mpro inhibitors modulate dimerization and structural dynamics. The newly identified covalent binding site at C300 is a novel finding and represents a potentially druggable allosteric hotspot.

      Strengths:

      This manuscript investigates how different classes of inhibitors modulate SARS-CoV-2 main protease dimerization and structural dynamics, and identifies a newly observed covalent binding site for ebselen.

      Weaknesses:

      The major concern is the absence of mutagenesis data to support the proposed inhibitory mechanisms, particularly regarding the role of the inhibitor binding site.

      We thank the reviewer for the comments and recognition of our study. We agree that mutagenesis experiments are very helpful to validate the proposed mechanisms. We will perform site-directed mutagenesis of the key residue C300 and assess the effects of those C300 mutants on dimerization and enzymatic activity of Mpro, and integrate the results and discussion into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This is a mechanistic study that provides new insights into the inhibition of SARS-CoV-2 Mpro.

      Strengths:

      The identification of dimer interface stabilization/destabilization as distinct inhibitory mechanisms and the discovery of C300 as a potential allosteric site for ebselen are important contributions to the field. The experimental approach is modern, multi-faceted, and generally well-executed.

      We thank the reviewer for the positive comments and recognition of our study.

      Weaknesses:

      The primary weaknesses relate to linking the biophysical observations more directly to functional enzymatic outcomes and providing more quantitative rigor in some analyses. While the study is overall strong, addressing its weaknesses and limitations would elevate the impact and translational relevance of the current manuscript.

      We thank the reviewer for the comments that are very helpful for improving the quality and impact of our manuscript.

      (1) Correlation with Functional Activity:

      The most significant gap is the lack of direct enzymatic activity assays under the exact conditions used for MS and HDX. While EC50 values are listed from literature, demonstrating how the observed dimer stabilization (by peptidomimetics) or dimer disruption (by ebselen) directly correlates with inhibition of proteolytic activity in the same experimental setup would solidify the functional relevance of the biophysical observations. For instance, does the fraction of monomer measured by native MS quantitatively predict the loss of activity? Also, the single inhibitor concentration used in each MS experiment needs to be specified in the main text and legends. A discussion on whether the inhibitor concentrations required to observe these dimerization effects (in native MS) or structural dynamics (in HDX-MS) align with EC50 values would be helpful for contextualizing the findings.

      We thank the reviewer for these points and agree that directly linking our biophysical observations to functional outcomes under identical conditions would be more meaningful. We will perform enzymatic activity assays to investigate whether the fraction of monomer measured by native MS predicts the loss of activity. The inhibitor concentrations used in each MS experiment will be explicitly stated in the main text and figure legends, and we will also discuss how these concentrations relate to the EC50/IC50 values, providing context for the biophysical observations.

      (2) For the two Cys residues found to be targeted by ebselen, what is their respective modification stoichiometry in relation to the ebselen concentration? Especially for the covalent binding site C300, which is proposed in this study to represent a novel allosteric inhibition mechanism of ebselen, more direct experimental evidence is needed to support this major hypothesis. Does mutation or modification of C300 affect the Mpro dimerization/monomer equilibrium and alter the enzymatic activity? If ebselen acts as a covalent inhibitor linked to multiple Cys, why is its activity only in the µM range?

      We thank the reviewer for the insightful comments. To address the stoichiometry of ebselen modification, we will further analyze the data and discuss it accordingly. To provide more direct evidence for C300 as a novel allosteric inhibition site of ebselen, we will perform site-directed mutagenesis and investigate whether C300 mutants affect Mpro dimerization and enzymatic activity. Regarding the modification of C300, several independent studies cited in this manuscript showed that oxidation (by glutathione; Davis et al., 2021) or chemical modification of C300 (by glutathione bismuth drugs, Tao et al., 2021, and by tixocortol, Davis et al., 2024) leads to Mpro inactivation and promotes monomer formation. We will discuss these studies further in the Discussion. The µM-range activity of ebselen can be explained by its multi-target covalent binding to multiple cysteines. The variable efficacy of cysteine modification may account for ebselen's moderate potency, as not all modifications equally inhibit their targets.

      (3) For the allosteric inhibitor pelitinib with low-uM activity, no significant differences in deuterium uptake of Mpro were observed. In terms of the binding affinity, what is the difference between pelitinib and ebselen? Some explanations could be provided about the different HDX-MS results between the two non-peptidomimetic inhibitors with similar activities.

      Pelitinib binds Mpro non-covalently, whereas ebselen binds Mpro covalently. We will add explanations and a discussion of their different HDX-MS results in the revised version.

      (4) Native MS Quantification:

      The analysis of monomer-dimer ratios from native MS spectra appears qualitative or semi-quantitative. A more rigorous and quantified analysis of the percentage of dimer/monomer species under each condition, with statistical replicates, would strengthen the equilibrium shift claims. For native MS analysis of each inhibitor, the representative spectrum can be shown in the main figure together with quantified dimer/monomer fractions from replicates to show significance by statistical tests.

      We thank the reviewer for the suggestion, and we will perform a more rigorous and quantitative analysis of the monomer-dimer equilibrium. For each condition (unbound Mpro and Mpro bound to each inhibitor), native MS experiments will be performed in triplicate. As suggested, we will include a representative native MS spectrum for each condition, together with the monomer/dimer ratios quantified from the replicates and a statistical analysis to show significance.
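One way to turn replicate native MS spectra into the quantitative comparison described above is sketched below, purely as an illustration. It assumes that summed peak intensities across charge states are proportional to species abundance (a common simplification that ignores differences in ionization and detection efficiency between monomer and dimer), and it reports the dimer fraction on a per-species basis. Welch's t statistic is used here as one possible test; the statistical test the authors will apply is not specified in the text.

```python
import math
from statistics import mean, stdev


def dimer_fraction(monomer_peaks: list[float], dimer_peaks: list[float]) -> float:
    """Dimer fraction for one spectrum from per-charge-state peak intensities.

    Assumes summed intensity is proportional to the abundance of each species.
    """
    monomer = sum(monomer_peaks)
    dimer = sum(dimer_peaks)
    return dimer / (monomer + dimer)


def summarize(fractions: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across replicates (needs n >= 2)."""
    return mean(fractions), stdev(fractions)


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic comparing two sets of replicate fractions."""
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(se2)
```

A p-value can then be obtained from a statistics library, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, which runs the same Welch test directly on the replicate values.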

      (5) Changes of HDX rates in certain regions seem very subtle. For example, as it states 'residues 296-304 in the C-terminal region of Mpro were more flexible upon ebselen binding (Figure 4c)', the difference is barely observable. The percentage of HDX rate changes between two conditions (with p values) can be specified in the text for each fragment discussed, and any change below 5% or 10% is negligible.

      We agree with the reviewer about the need for quantitative rigor in reporting HDX changes. We will calculate the fractional deuterium uptake difference for each peptide fragment discussed in the text between the inhibitor-bound and unbound states. These values, along with their statistical significance (p-values from a two-tailed t-test), will be provided in the revised figures.
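The fractional-uptake comparison described in this response can be sketched as follows. This is a hypothetical illustration: the maximum uptake for each peptide is taken as an input (for example from a fully deuterated control or from the number of exchangeable backbone amides), and the 5-percentage-point threshold simply reflects the reviewer's suggestion that smaller changes be treated as negligible. In practice, such a threshold would be applied alongside the two-tailed t-test mentioned in the response.

```python
def fractional_uptake(deuterons: float, max_uptake: float) -> float:
    """Deuterium uptake of a peptide as a fraction of its maximum uptake."""
    if max_uptake <= 0:
        raise ValueError("max_uptake must be positive")
    return deuterons / max_uptake


def uptake_difference_pct(bound: float, unbound: float, max_uptake: float) -> float:
    """Bound minus unbound fractional uptake, in percentage points.

    Positive values mean more exchange in the inhibitor-bound state, which is
    usually read as increased flexibility or solvent exposure.
    """
    return 100.0 * (fractional_uptake(bound, max_uptake)
                    - fractional_uptake(unbound, max_uptake))


def significant_changes(peptides: dict[str, tuple[float, float, float]],
                        threshold_pct: float = 5.0) -> dict[str, float]:
    """Keep peptides whose absolute uptake difference meets the threshold.

    `peptides` maps a peptide label to (mean bound uptake, mean unbound uptake,
    maximum uptake); the default threshold is an assumption.
    """
    result = {}
    for label, (bound, unbound, max_uptake) in peptides.items():
        diff = uptake_difference_pct(bound, unbound, max_uptake)
        if abs(diff) >= threshold_pct:
            result[label] = diff
    return result
```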

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have adequately addressed all of my concerns. I have no further questions or concerns.

      We thank Reviewer #1.

      Reviewer #2 (Recommendations for the authors):

      We thank Reviewer #2 for the thoughtful recommendations.

      (1) Figure 1A, 1B, 2B, 2C, etc.: The Y-axis label is confusing. I assume the intention was to make big numbers small by dividing by 1000. The comma makes the label confusing. Perhaps make the label more "mathematical", as in "Avp density ((transcript/µm²) × 10⁻³)", or rearrange the math to be clearer, as in "Avp density (transcript/1000 per µm²)".

      Great suggestion and done exactly as suggested in Figures 1, 2 and 4.

      (2) Figure 1B and 1C: The figure and legend do not match up. Either switch the figures or the legends. Currently, legend 1B describes image 1C.

      Agreed and done as suggested.

      (3) Figure 2A is broken up into separate pages/panels. It could be integrated better or separated to make A and B, then shift B and C to C and D.

      Great suggestion and we have done exactly as suggested.

      (4) Figure 2 legend: I recommend putting the scale bar info with (A) rather than at the end. The stars used in the figure are not explained in the legend.

      Good points. We have made all necessary changes as suggested.

      (5) Supplementary Figure 1B: The legend states that the data are the number of transcript-containing cells, but the figure states transcript number.

      We thank the Reviewer for pointing out this typo. We have corrected all graph legends in Supplementary Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors use a confusing timeline for their behavioral experiments, i.e., day 1 is the first day of training in the MWM, and day 6 is the probe trial, but in reality, day 6 is the first day after the last training day. So this is really day 1 post-training, and day 20 is 14 days post-training.

      We have revised the timeline accordingly. Briefly, mice were trained in the Morris water maze (MWM) with a hidden platform for five consecutive days (training days 1–5). Probe tests were then conducted on day 6 and day 20, which correspond to post-training day 1 and post-training day 15, respectively. We have stated this clearly in the revised manuscript (see Results, lines 108–113) and in the legend of Figure S1 (lines 1747–1749).

      (2) The authors inaccurately use memory as a term. During the training period in the MWM, the animals are learning, while memory is only probed on day 6 (after learning). Thus, day 6 reflects memory consolidation processes after learning has taken place.

      We have revised the manuscript to distinguish between "learning" and "memory". We refer to the performance during the 5-day training period as "spatial learning" and restrict the term "memory" to the probe tests on day 6, which reflect memory consolidation after learning has taken place.

      (3) The NAT10 cKO mice are useful... but all the experiments used AAV-CRE injections in the dorsal hippocampus that showed somewhat modest decreases... For these experiments, it would be better to cross the NAT10 floxed animals to CRE lines where a better knockdown of NAT10 can be achieved, with less variability.

      We want to clarify the reason for using AAV-Cre injection rather than Cre lines. We did attempt to generate Nat10 conditional knockouts by crossing Nat10^flox/flox mice with several CNS-specific Cre lines. Crossing with Nestin-Cre and Emx1-Cre resulted in embryonic and premature lethality, respectively, consistent with the essential housekeeping function of NAT10 during neurodevelopment. We will use the Camk2α-Cre line, which begins to express Cre after the third postnatal week specifically in hippocampal pyramidal neurons (Tsien et al., 1996).

      (4) Because knockdown is only modest (~50%), it is not clear if the remaining ac4c on mRNAs is due to remaining NAT10 protein or due to an alternative writer (as the authors pose).

      Our results suggest the existence of alternative writers. As shown in Figure 6D, we identified a population of "NAT10-independent" MISA mRNAs (present in MISA but not downregulated in NASA). Remarkably, these mRNAs possess a consensus motif (RGGGCACTAACY) that is fundamentally different from the canonical NAT10 motif (AGCAGCTG). This distinct motif usage suggests that the residual ac4C signals are not merely due to incomplete knockdown of NAT10, but reflect the activity of other, as-yet-unidentified ac4C writers. We will perform ac4C immunostaining in Nat10-reporter mice, which express red fluorescent protein in Nat10-positive cells. Detection of ac4C in both Nat10-positive and Nat10-negative cells would support the presence of as-yet-unidentified ac4C writers.
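Both motifs are written in IUPAC nucleotide notation, in which R stands for A or G and Y for C or T/U. As an illustration only, the sketch below expands such a motif into a regular expression and reports every position, including overlapping ones, where it occurs in a sequence. It only scans for a known motif; it does not reproduce how the motifs were discovered in the study.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> re.Pattern[str]:
    """Compile an IUPAC motif such as 'RGGGCACTAACY' into a regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_motif(sequence: str, motif: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) motif matches.

    RNA input is accepted: U is converted to T before matching.
    """
    pattern = motif_to_regex(motif)
    seq = sequence.upper().replace("U", "T")
    # Pattern.match(seq, i) anchors the match at position i, so overlaps are kept.
    return [i for i in range(len(seq)) if pattern.match(seq, i)]
```

Testing each start position separately, rather than calling `re.finditer`, ensures that overlapping occurrences are not skipped.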

      Reviewer #2 (Public review):

      (1) It is known that synaptosomes are contaminated with glial tissue... So the candidate mRNAs identified by acRIP-seq might also be mixed with glial mRNAs. Are the GO BP terms shown in Figure 3A specifically chosen, or unbiasedly listed for all top ones?

      This reviewer is correct that some ac4C-mRNAs identified by acRIP-seq from the synaptosomes are highly expressed in astrocytes, such as Aldh1l1, ApoE, Sox9 and Aqp4 (see the list of ac4C-mRNAs in the synaptosomes, Table S3). In agreement, we found that NAT10 was also expressed in astrocytes in addition to neurons. We have provided a representative image showing NAT10-Cre expression in astrocytes in the revised manuscript (Figure 4F and H). In Figure 3A of the original submission, we showed 10 of the top 16 BP terms for MISA mRNAs. In Figure 3A of the revised manuscript, we show all of the top 16 BP terms for MISA mRNAs, selected without bias (see also Table S4).

      (2) Where does NAT10-mediated mRNA acetylation take place within cells generally? Is there evidence that NAT10 can catalyze mRNA acetylation in the cytoplasm?

      The previous studies from non-neuronal cells showed that NAT10 can catalyze mRNA acetylation in the cytoplasm and enhance translational efficiency (Arango et al., 2018; Arango et al., 2022). In this study, we showed that mRNA acetylation occurred both in the homogenates and synapses (see ac4C-mRNA lists in Table S2 and S3). However, spatial memory upregulated mRNA acetylation mainly in the synapses rather than in the homogenates (Fig. 2 and Fig. S2).

      (3) "The NAT10 proteins were significantly reduced in the cytoplasm (S2 fraction) but increased in the PSD fraction..." The small increase in synaptic NAT10 might not be enough to cause a decrease in soma NAT10 protein level.

      We showed that NAT10 protein levels were increased by one-fold in the PSD fraction but reduced by about 50% in the cytoplasm after memory formation (Fig. 5J and K). The protein levels of NAT10 in the homogenates and nucleus were not altered after memory formation (Fig. 5F and I). Based on these observations, we hypothesized that NAT10 protein may relocate from the cytoplasm to synapses after memory formation, a hypothesis also supported by the immunofluorescence results from cultured neurons (Fig. S4). However, we agree with this reviewer that drawing such a conclusion may require time-lapse imaging of NAT10 protein trafficking in living animals, which is technically challenging at this moment.

      (4) It is difficult to separate the effect of mRNA acetylation from that of protein acetylation when doing the loss of function of NAT10.

      This is a good point. We agree with this reviewer that NAT10 may acetylate both mRNA and proteins. We examined the acetylation levels of α-tubulin and histone H3, two substrate proteins of NAT10, in the hippocampus of Nat10 cKO mice. As shown in Fig. S5C, E, and F, the acetylation levels of α-tubulin and histone H3 remained unchanged in the Nat10 cKO mice, likely due to compensation by other protein acetyltransferases. In contrast, mRNA ac4C levels were significantly decreased in the Nat10 cKO mice (Figure S5G–H). These results suggest that the memory deficits seen in Nat10 cKO mice may be largely due to impaired mRNA acetylation. Nonetheless, we believe that developing a new technology that enables selective erasure of mRNA acetylation would help to address the function of mRNA acetylation. We discuss these points in the manuscript (see Discussion, lines 582–589).

      Reference

      Arango, D., Sturgill, D., Alhusaini, N., Dillman, A. A., Sweet, T. J., Hanson, G., Hosogane, M., Sinclair, W. R., Nanan, K. K., & Mandler, M. D. (2018). Acetylation of cytidine in mRNA promotes translation efficiency. Cell, 175(7), 1872–1886.e24.

      Arango, D., Sturgill, D., Yang, R., Kanai, T., Bauer, P., Roy, J., Wang, Z., Hosogane, M., Schiffers, S., & Oberdoerffer, S. (2022). Direct epitranscriptomic regulation of mammalian translation initiation through N4-acetylcytidine. Molecular Cell, 82(15), 2797–2814.e11.

      Tsien, J. Z., Chen, D. F., Gerber, D., Tom, C., Mercer, E. H., Anderson, D. J., Mayford, M., Kandel, E. R., & Tonegawa, S. (1996). Subregion- and cell type-restricted gene knockout in mouse brain. Cell, 87(7), 1317–1326.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the role of E2 ubiquitin enzyme, Uev1a in tissue resistance to oncogenic RasV12 in Drosophila melanogaster polyploid germline cells and human cancer cell lines. The incomplete evidence suggests that Uev1a works with the E3 ligase APC/C to degrade Cyclin A, and the strength of evidence could be increased by addressing the expression of CycA in the ovaries and the uev1a loss of function in human cancer cells. This work would be of interest to researchers in germline biology and cancer.

      Thank you for your valuable assessment. The requested data on CycA expression (Figure 4E-G) and uev1a loss-of-function in human cancer cells (Figure 8 and Figure 8-figure supplement 2) have been added to the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate your comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 5-figure supplement 2), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thank you for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries”. Normal nurse cell nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, they become disorganized and begin to condense and fragment (see the second panel in Figure 1D). In late-stage death, they are completely fragmented into small, spherical structures (see the third panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. We have added the description of this death phenotype and its quantification to the main text (Lines 104-108).
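The egg-chamber-level metric described in this response reduces to a simple ratio per ovary. The sketch below assumes, for illustration, that each ovary is scored as a pair (degrading egg chambers, total egg chambers) and that ovaries are summarized by the mean of their per-ovary fractions. How the authors aggregated ovaries for statistical testing is not stated here.

```python
from statistics import mean


def degrading_fraction(degrading: int, total: int) -> float:
    """Fraction of degrading egg chambers in one ovary."""
    if total <= 0:
        raise ValueError("total must be a positive number of egg chambers")
    if not 0 <= degrading <= total:
        raise ValueError("degrading must be between 0 and total")
    return degrading / total


def per_ovary_summary(ovaries: list[tuple[int, int]]) -> tuple[list[float], float]:
    """Per-ovary fractions and their mean for one genotype."""
    fractions = [degrading_fraction(d, t) for d, t in ovaries]
    return fractions, mean(fractions)
```

Keeping one fraction per ovary, rather than pooling all egg chambers, treats the ovary as the unit of replication, which is consistent with the "per ovary" wording of the metric.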

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor-suppressor, and so they over-express ubev1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human ubev1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograft growth.

      We apologize for any misleading description. We aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>RasG12V+bam-RNAi” germline tumors, suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 depends on the context of Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 135-137), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We have revised the manuscript to prevent misinterpretation. Furthermore, as suggested, we have added data demonstrating that the combined knockdown of UBE2V1 and UBE2V2 significantly promotes the growth of KRAS-mutant human cancer cells (Figure 8 and Figure 8-figure supplement 2).

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos.RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate your critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3 in this 1996 paper). However, their death induced by oncogenic RasG12V is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 4C, D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. In the revised manuscript, we have provided the CycA-staining data, comparing its expression in normal nurse cells versus cells undergoing oncogenic RasG12V-induced death (Figure 4E-G).

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thank you for your suggestion. In the revised introduction, we have provided a more detailed description of Uev1A (Lines 72-79). Additionally, we have introduced its human homologs, UBE2V1 and UBE2V2, in the main text (Lines 143-145).

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from RasG12V-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of RasG12V-overexpression in diploid germline cells, where RasG12V-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate your comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thank you for highlighting this issue. We have provided more details regarding the quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legends insufficiently describe the figures. One example is Figure 3, where there are no details in the figure legend about what conditions apply to each panel and each lane of the gels.

      For clarity and brevity, detailed experimental conditions are described in the Materials and Methods section. Figure legends therefore focus on summarizing the key findings. Thank you for your understanding!

      (2) The font size on the figure is too small.

      Thank you for your constructive suggestion. In response, we have enlarged all font sizes to improve readability.

      (3) There are places where the authors overstate their results, and there are issues with the clarity of the text:

      (3a) Line 170: "excessive" is not appropriate. Their prior study showed a mild increase in proliferation.

      “Excessive” has been removed in the revised manuscript (Lines 215-216).

      (3b) Lines 187-188: The authors should rephrase this sentence. One possibility: "Over-expression of Uev1a suppressed the phenotypes caused by CycA over-expression."

      This sentence has been rephrased as “Notably, this cell death was suppressed by co-overexpression of CycA and Uev1A, indicating a genetic interaction between them” (Lines 229-231).

      (3c) Lines 266-7: The properties of Uev1a (ie, lacking a conserved Cys) should be in the introduction.

      This information has been added to the revised introduction (Lines 74-76).

      (3d) Line 318: "markedly" is an overstatement of the prior results.

      Our quantification data revealed that “nos>Ras<sup>G12V</sup>; bam<sup>-/-</sup>” ovaries are three times larger than “nos>GFP; bam<sup>-/-</sup>” control ovaries (see Figure 4A-C in Zhang et al., Stem Cell Reports 19, 1205-1216). Given this substantial difference, we think that using "markedly" is not an overstatement.

      (4) Data not shown occurs in a few places in the text. Given the ability to supply supplemental information in eLife preprints, these data should be shown.

      Thanks for your suggestion. All “not shown” data have been added to the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Cyclin A (CycA) is a key player in this study, but the authors do not provide evidence showing the upregulation of CycA following Ras overexpression in either polyploid or diploid cells. Data on CycA expression should be included.

      Thank you for your constructive suggestion. These data have been added to the revised manuscript (Figure 4E-G).

      (2) DNA replication stress, cellular senescence, and cell death should be assessed under Ras overexpression (RasOE) and RasOE + Uev1A RNAi conditions to support the model proposed in Figure 4F.

      We apologize for any confusion caused by our initial model. We do not have evidence that DNA replication stress and cellular senescence occur under these conditions. Cell death can be readily detected through the presence of fragmented nuclei and condensed DNA (see Figure 1D). The model has been updated accordingly (Figure 9E).

      (3) Appropriate controls should be run alongside each experimental set. The same nos>Ras+GFPi data set was repeatedly used in Figures 1I, 2B, 2H, and S2B, which is not ideal.

      All these experiments were performed under identical conditions. Therefore, we deem it appropriate to use the same control data across these analyses.

      (4) Overall, the microscopic images are too small and hard to see.

      Thank you for raising this important point. In the revised manuscript, all images and the font size on figures have been enlarged for improved clarity.

      (5) Figure 1H

      Why is the frequency of egg chamber degradation so much lower in nos>RasG12V+GFP-RNAi (about 40%) than in nos>RasG12V (about 80%)? The authors also do not test whether the difference between these two conditions is significant, although it appears to be. An explanation from the authors for this difference is needed.

      These overexpression experiments were conducted using the GAL4/UAS system. While both “nos>Ras<sup>G12V</sup>+GFP-RNAi” and “nos>Ras<sup>G12V</sup>” contain a single nos-GAL4 driver, they differ in UAS copy number: the former incorporates two UAS elements compared to only one in the latter (see the detailed genotypes in Source data 2). These results demonstrate that UAS copy number impacts experimental outcomes in our system.

      In the previous paper (Zhang et al., 2024), Figure 7H shows that the frequency of degrading egg chambers in nos>RasG12V is 33%, whereas this paper shows about 80%. There seems to be a difference in fly age (previous paper: 7 d; this paper: 3 d), but these data raise the question of why nos>RasG12V shows more egg chamber degradation this time.

      We greatly appreciate your careful observation. The nurse-cell-death phenotype spans a spectrum from mild to severe [see Figure 1D and our response to weakness (2) in Reviewer #1’s public reviews]. Whereas our 2024 paper classified only egg chambers with severe phenotypes as degrading, the current study included both mild and severe cases in this classification. We do not think fly age can account for this substantial phenotypic difference. A detailed description of the nurse-cell-death phenotype and its quantification has been added to the revised manuscript (Lines 104-108).

      In the following experiments, only nos>RasG12V+GFP-RNAi is used as a control (Figures 2B, H, S2B). I wonder if these results would give us a different conclusion if nos>RasG12V were used as a control.

      As explained above, the UAS copy number does matter in our analyses, so it is important to keep them identical for comparison.

      (6) In the abstract, the authors mention that uev1a is an intrinsic factor to protect cells from RasG12V-induced cell death. RasG12V does not induce much cell death of cystocytes with bam-gal4, whereas it induces a lot of nurse cells' death. Does it mean the intrinsic expression level of uev1a is low in nurse cells (or polyploid cells) compared to cystocytes (or diploid cells)?

      Overexpression of Ras<sup>G12V</sup> driven by bam-GAL4 caused only minimal nurse cell death (Figure 1D, E). Additionally, Uev1A showed low intrinsic expression levels in both cystocytes and nurse cells (Figure 3E and Figure 5-figure supplement 1).

      (7) Is uev1a-RNAi alone sufficient to induce egg chamber degradation? Or does it have any effect on ovarian development? (Related to question #1 in minor comments)

      While nos>uev1a-RNAi resulted in female sterility, it alone was insufficient to induce egg chamber degradation. However, simultaneous downregulation of Uev1A, Ben, and Cdc27 triggered significant egg chamber degradation (Figure 5-figure supplement 2).

      (8) Which stages of egg chambers get degraded with RasG12V induction?

      This is a good question. In our analyses, we noted that degrading egg chambers exhibited considerable size variability (Figure 1D). Because degradation disrupts normal morphological cues, precise staging of these egg chambers is nearly impossible.

      (9) Because the schematic in Figure 4 proposes that CycA degradation by the Uev1a-APC/C complex prevents RasG12V-induced cellular senescence, I suggest also testing cellular senescence markers (e.g., Dap/p21, SA-β-gal).

      As addressed in our response to your Major Comment (2), we lacked experimental evidence to support cellular senescence in this context. We have therefore revised the model accordingly (Figure 9E). While this study focuses specifically on cell death, investigating potential roles of cellular senescence remains an important direction for future research. Thank you for your suggestion!

      Minor Comments

      (1) Figure 1D: Df#7584

      It seems that late-stage egg chambers are missing in this condition. Why does this occur without egg chamber degradation? Could it be that no degradation is observed because this deficiency line fails to develop egg chambers that could degrade?

      While this image represents only a single sample, we have confirmed the presence of late-stage egg chambers in other samples. If “Df#7584/+” females were unable to support late-stage egg chamber development, complete sterility would be expected due to the lack of mature eggs. However, as shown in this image (Figure 1D), the ovary contains mature eggs, and the “Df#7584/+” fly strain remains fertile.

      (2) Given the results showing that DDR signaling protects egg chambers from degradation, the authors may wish to check DNA damage markers (e.g., γ-H2AX) in nos>RasG12V and nos>RasG12V+uev1a.

      Thank you for your constructive recommendation. These data have been added to the revised manuscript (Figure 3C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Points to be addressed:

      (1) As a statistical test, the authors report having used unpaired t-tests; however, often three groups are compared for which t-tests are inadequate. This is faulty as, amongst other things, it does not take multiple comparison testing into account.

      We have adopted the reviewer's suggestion and reanalyzed all experiments with three or more condition groups using analysis of variance (ANOVA). Unpaired t-tests have been retained for experiments with only two condition groups.

      (2) Both β-actin and GAPDH seem to have been used for protein-level normalization. Why? The first panel of Figure 2HL reports β-actin, whereas the other three report GAPDH. The same applies to Figures 3E-F, where both are shown, and it is not mentioned which of the two has been used. Moreover, uncropped blots seem to be unavailable as supplementary data for proper review. These should be provided as supplementary data.

      In Figures 2G and 3E-F, both β-actin and GAPDH were used for protein-level normalization. This mixed use of two housekeeping proteins arose because we did not plan for consistency in advance. Importantly, the expression levels of these two proteins do not differ significantly across the different fluid shear stress conditions, so we consider both suitable for normalization. The uncropped blot images have been organized and provided in the supplementary data.

      (3) LSS and MSS were compared based on transcriptomic analysis. Conversely, RNA sequencing was not reported for the HSS. Why is this data missing? It would be valuable to assess transcriptomics following HSS, and also to allow transcriptomic comparison of LSS and HSS.

      In the current study, we performed transcriptomic comparison only between the LSS and MSS conditions, mainly because most current research on endothelial dysfunction and atherosclerosis focuses on LSS. In addition, our HSS condition (about 24 dyn/cm<sup>2</sup> overall) is considered within the normal physiological range in some reports. Moreover, the transcriptomic data were used primarily to identify targets in our study. Interestingly, the selected genes show the same trend in endothelial cell ferroptosis induced by LSS and by HSS. At the same time, we strongly agree with the reviewer that RNA sequencing results under HSS would be valuable. We therefore plan to perform transcriptomic analysis under HSS or higher levels of shear stress in future work, aiming to gain new insights.

      (4) Actual sample sizes should be reported rather than "three or more". Moreover, it would be beneficial to show individual data points in bar graphs rather than only the mean with SD when sample sizes are below 10 (e.g., Figures 1B-H, Figure 2G, etc.).

      After rechecking our original data, we confirmed that all analyzed results were obtained from three biological replicates; they are therefore uniformly marked as 'n=3' in the article. As the reviewer suggested, individual data points have been added to the bar graphs alongside the standard deviation bars.

      (5) The authors claim that shear stress could be modified by changing the thickness of the middle layer, while on-site pressure is kept within physiological ranges (approx. 70 mmHg) as a hallmark of their microfluidic devices. Has it been experimentally verified that pressures indeed remain around 70 mmHg?

      This is a very interesting question. In this article, the cross-sectional area of each tunnel-like channel depends on the thickness of the middle layer, which produces different levels of shear stress. Because the flow rate was kept at 1.6 ml/min under all three conditions, the average pressure was calculated to be around 70 mmHg using our previously reported formula (PMID: 37662690). To address the reviewer's question about the actual pressure, we connected a water-filled tube to the chip and measured the height of the water surface at the elevated end relative to the chip, as shown in Author response image 1. As expected, when the middle layer bulged to the same height (0.7 mm) as under the LSS condition, the water level reached 900 mm, corresponding to about 70 mmHg.

      Author response image 1.

      Schematic diagram of on-chip pressure detection
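      As a rough check of the water-column measurement described above, the hydrostatic pressure of a 900 mm column can be converted to mmHg directly. This sketch assumes standard values not given in the original (water density 1000 kg/m³, g = 9.81 m/s², 1 mmHg ≈ 133.3 Pa) and ignores the small contribution of tube and chip geometry:

      $$P = \rho g h = (1000\ \mathrm{kg\,m^{-3}})(9.81\ \mathrm{m\,s^{-2}})(0.900\ \mathrm{m}) \approx 8.83\times10^{3}\ \mathrm{Pa}$$

      $$P \approx \frac{8.83\times10^{3}\ \mathrm{Pa}}{133.3\ \mathrm{Pa\,mmHg^{-1}}} \approx 66\ \mathrm{mmHg}$$

      Under these assumptions, the measured column corresponds to roughly 66 mmHg, in line with the approximate value of 70 mmHg stated above.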

      (6) A coculture model (VSMC, EC, monocytes) is mentioned in the last part of the results section without any further information. Information on this model should be provided in the methods section (seeding, cell numbers, etc.). Moreover, comparison of LSS vs LSS+KLF6 OE and HSS vs HSS+KLF6 OE is shown. It would benefit the interpretation of the outcomes if MSS were also shown. It would also be beneficial to demonstrate differences between LSS, MSS, and HSS in this coculture model (without KLF6 OE).

      The methods for constructing the co-culture model (vascular smooth muscle cells, endothelial cells, and monocytes) mentioned in the Results section were described in our previous paper. For the reader's convenience, we have added a brief description, including cell seeding and cell numbers, to the “Methods and materials” section of this paper. In this study, the LSS vs LSS+KLF6 OE and HSS vs HSS+KLF6 OE comparisons are presented to verify the role of KLF6 in the LSS- and HSS-induced promotion of early atherosclerotic events. In our previously published paper (PMID: 37662690), we showed the effects of the three shear stress levels on atherosclerotic events (Fig. 4 in that paper). Those results demonstrated that both LSS and HSS significantly promote early atherosclerotic events compared with MSS.

      (7) The experiments were solely performed with a venous endothelial cell line (HUVECs). Was the use of an arterial endothelial cell line considered? It may translate better towards atherosclerosis, which occurs within arteries. HUVECs are not accustomed to the claimed near-physiological pressures.

      Human umbilical vein endothelial cells (HUVECs) are commonly used in in vitro studies of vascular endothelium under fluid shear stress. Although human arterial endothelial cells (HAECs) may be more suitable than HUVECs for responding to physiologically relevant pressure, HUVECs are easier to obtain and maintain. Nevertheless, we plan to obtain HAECs and use them to validate our conclusions and assess their potential translatability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Information on seeding of the microfluidic device (i.e., seeding, cell density, passage number, confluence, etc.) is absent from the methods section. Moreover, treatment with Fer-1 is not reported in the methods section.

      We have described the cell seeding information in ‘Preparation of cell culture in the microfluidic chip’ and the Fer-1 treatment in ‘Cell death assay’ in the Methods section.

      (2) Figure 3F has "MSS", "HSS", and "LSS+KLF6" as groups on the x-axis; the latter should probably be "HSS+KLF6".

      Thank you for pointing out this error in Figure 3F. We have made the correction.

      (3) Data should be made available in online repositories rather than "making it available upon reasonable request". As it was not provided, the sequencing data could not be reviewed. In addition, it was stated that a preprint was available on BioRxiv, but I could not find it.

      Thank you for the suggestion. We have uploaded the RNA-seq data to the NCBI GEO database, where they became publicly available on December 9, 2025.