    1. eLife Assessment

      This Review Article explores the intricate relationship between humans and Mycobacterium tuberculosis (Mtb), providing an additional perspective on TB disease. Specifically, this review focuses on the utilization of systems-level approaches to study TB, while highlighting challenges in the frameworks used to identify the relevant immunologic signals that may explain the clinical spectrum of disease. The work could be further enhanced by better defining key terms that anchor the review, such as "unified mechanism" and "immunological route." This review will be of interest to immunologists as well as those interested in evolution and host-pathogen interactions.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting and useful review highlighting the complex pathways through which pulmonary colonisation or infection with Mycobacterium tuberculosis (Mtb) may progress to develop symptomatic disease and transmit the pathogen. I found the section on immune correlates associated with individuals who have clearly been exposed to and reacted to Mtb but did not develop latent infections particularly valuable. However, several aspects would benefit from clarification.

      Strengths:

      The main strengths lie in the arguments presented for a multiplicity of immune pathways to TB disease.

      Weaknesses:

      The main weaknesses lie in clarity, particularly in the precise meanings of the three figures.

      I accept that there is a 'goldilocks zone' that underpins the majority of TB cases we see and predominantly reflects different patterns of immune response, but the analogies used need to be more clearly thought through.

    3. Reviewer #2 (Public review):

      Summary:

      This is a thought-provoking perspective by Reichmann et al, outlining supportive evidence that Mycobacterium tuberculosis co-evolved with its host Homo sapiens to both increase susceptibility to infection and reduce rates of fatal disease through decreased virulence. TB is an ancient disease where two modes of virulence are likely to have evolved through different stages of human evolution: one before the Neolithic Demographic Transition, where humans lived in sparse hunter-gatherer communities, which likely selected for prolonged Mtb infection with reduced virulence to allow for transmission across sparse populations. Conversely, following the agricultural and industrial revolutions, Mtb virulence is likely to have evolved to attack a higher number of susceptible individuals. These different disease modalities highlight the central idea that there are different immunological routes to TB disease, which converge on a disease phenotype characterized by high bacterial load and destruction of the extracellular matrix. The writing is very clear and provides a lot of supportive evidence from population studies and the recent clinical trials of novel TB vaccines, like M72 and H56. However, there are areas to support the thesis that have been described only in broad strokes, including the impact of host and Mtb genetic heterogeneity on this selection, and the alternative model that there are likely different TB diseases (as opposed to different routes to the same disease), as described by several groups advancing the concept of heterogeneous TB endotypes. I expand on specific points below.

      Strengths:

      (1) The idea that Mtb evolved to both increase transmission (and possible commensalism with humans) with low rates of reactivation is intriguing. The heterogeneous TB phenotypes in the collaborative cross model (PMID: 35112666) support this idea, where some genetic backgrounds can tolerate a high bacterial load with minimal pathology, while others show signs of pathogenesis with low bacterial loads. This supports the idea that the underlying host state, driven by a number of factors like genetics and nutrition, is likely to explain whether someone will co-exist with Mtb without pathology, or progress to disease. I particularly enjoyed the discussion of the protective advantages provided by Mtb infection, which may have rewired the human immune system to provide protection against heterologous pathogens; this is supported by recent studies showing that Mtb infection provides moderate protection against SARS-CoV-2 (PMIDs: 35325013 and 37720210), and may have applied to other viruses that are likely to have played a more significant role in the past in the natural selection of Homo sapiens.

      (2) Modeling from Marcel Behr and colleagues (PMID: 31649096) indeed suggests that there are at least two TB clinical phenotypes that likely mirror the two distinct phases of Mtb co-evolution with humans. Most of the TB disease progression occurs rapidly (within 1-2 years of exposure), and the rest are slow cases of reactivation over time. I enjoyed the discussion of the difference between the types of immune hits needed to progress to disease in the two scenarios, where you may need severe immune hits for rapid progression, a phenotype that likely evolved after the Neolithic transition to larger human populations. On the other hand, a series of milder immune events leading to reactivation after a long period of asymptomatic infection likely mirrors slow progression in the hunter-gatherer communities, to allow for prolonged transmission in scarce populations. Perhaps a clearer analysis of these models would be helpful for the reader.

      Weaknesses:

      (1) The discussion of genetic heterogeneity is limited and only discusses evidence from MSMD studies. Genetics is an important angle to consider in the co-evolution of Mtb and humans. There is a large body of literature on both host and Mtb genetic associations with TB disease. The very fact that host variants in one population do not necessarily cross-validate across populations is evidence in support of population-specific adaptations. Specific Mtb lineages are likely to have co-evolved with distinct human populations. A key reference is missing (PMID: 23995134), which shows that different lineages co-evolved with human migrations. Also, meta-analyses of human GWAS studies to define variants associated with TB are very relevant to the topic of co-evolution (e.g., PMID: 38224499). eQTL studies can also highlight genetic variants associated with regulating key immune genes involved in the response to TB. The authors do mention that Mtb itself is relatively clonal with ~2K SNPs marking Mtb variation, much of which has likely evolved under the selection pressure of modern antibiotics. However, some of this limited universe of variants can still explain co-adaptations between distinct Mtb lineages and different human populations, as shown recently in the co-evolution of lineage 2 with a variant common in Peruvians (PMID: 39613754).

      (2) Although the examples of anti-TNF and anti-PD1 treatments are relevant as drivers of TB in limited clinical contexts, the bigger picture is that they highlight major distinct disease endotypes. These restricted examples show that TB can be driven by immune deficiency (as in the case of anti-TNF, HIV, and malnutrition) or hyperactivation (as in the case of anti-PD1 treatment), but there are still certainly many other routes leading to immune suppression or hyperactivation. Considering the idea of hyperactivation as a TB driver, the apparent higher rate of recurrence in the H56 trial referenced in the review is likely due to immune hyperactivation, especially in the context of residual bacteria in the lung. These different TB manifestations (immune suppression vs immune hyperactivation) mirror TB endotypes described by DiNardo et al (PMID: 35169026) from analysis of extensive transcriptomic data, which indicate that it's not merely different routes leading to the same final endpoint of clinical disease, but rather multiple different disease endpoints. A similar scenario is shown in the transcriptomic signatures underlying disease progression in BCG-vaccinated infants, where two distinct clusters mirrored the hyperactivation and immune suppression phenotypes (PMID: 27183822). A discussion of how to think about translating the extensive information from systems biology into treatment stratification approaches, or adjunct host-directed therapies, would be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      This perspective article by Reichmann et al. highlights the importance of moving beyond the search for a single, unified immune mechanism to explain host-Mtb interactions. Drawing from studies in immune profiling, host and bacterial genetics, the authors emphasize inconsistencies in the literature and argue for broader, more integrative models. Overall, the article is thought-provoking and well-articulated, raising a concept that is worth further exploration in the TB field.

      Strengths:

      Timely and relevant in the context of the rapidly expanding multi-omics datasets that provide unprecedented insights into host-Mtb interactions.

      Weaknesses (Minor):

      (1) Clarity on the notion of a "unified mechanism". It remains unclear whether prior studies explicitly proposed a single unifying immunological model. While inconsistencies in findings exist, they do not necessarily demonstrate that earlier work was uniformly "single-minded". Moreover, heterogeneity in TB has been recognized previously (PMIDs: 19855401, 28736436), which the authors could acknowledge.

      (2) Evolutionary timeline and industrial-era framing. The evolutionary model is outdated. Ancient DNA studies place the Mtb most recent common ancestor (MRCA) at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is cited as a driver of TB expansion, but this remains speculative without bacterial-genomics evidence and should be framed as a hypothesis. Additionally, the claim that Mtb genomes have been conserved only since the Industrial Revolution (lines 165-167) is inaccurate; conservation extends back to the MRCA (PMID: 31448322).

      (3) Trained immunity and TB infection. The treatment of trained immunity is incomplete. While BCG vaccination is known to induce trained immunity (ref 59), revaccination does not provide sustained protection (ref 8), and importantly, Mtb infection itself can also impart trained immunity (PMID: 33125891). Including these nuances would strengthen the discussion.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This Review Article explores the intricate relationship between humans and Mycobacterium tuberculosis (Mtb), providing an additional perspective on TB disease. Specifically, this review focuses on the utilization of systems-level approaches to study TB, while highlighting challenges in the frameworks used to identify the relevant immunologic signals that may explain the clinical spectrum of disease. The work could be further enhanced by better defining key terms that anchor the review, such as "unified mechanism" and "immunological route." This review will be of interest to immunologists as well as those interested in evolution and host-pathogen interactions.

      We thank the editors for reviewing our article and for the primarily positive comments. We accept that better definition and terminology will improve the clarity of the message, and so have changed the wording as suggested above in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting and useful review highlighting the complex pathways through which pulmonary colonisation or infection with Mycobacterium tuberculosis (Mtb) may progress to develop symptomatic disease and transmit the pathogen. I found the section on immune correlates associated with individuals who have clearly been exposed to and reacted to Mtb but did not develop latent infections particularly valuable. However, several aspects would benefit from clarification.

      Strengths:

      The main strengths lie in the arguments presented for a multiplicity of immune pathways to TB disease.

      Weaknesses:

      The main weaknesses lie in clarity, particularly in the precise meanings of the three figures.

      We accept this point, and have completely changed figure 2, and have expanded the legends for figures 1 and 3 to maximise clarity.

      I accept that there is a 'goldilocks zone' that underpins the majority of TB cases we see and predominantly reflects different patterns of immune response, but the analogies used need to be more clearly thought through.

      We are glad the reviewer agrees with the fundamental argument of different patterns of immunity, and have revised the manuscript throughout where we feel the analogies could be clarified.

      Reviewer #2 (Public review):

      Summary:

      This is a thought-provoking perspective by Reichmann et al, outlining supportive evidence that Mycobacterium tuberculosis co-evolved with its host Homo sapiens to both increase susceptibility to infection and reduce rates of fatal disease through decreased virulence. TB is an ancient disease where two modes of virulence are likely to have evolved through different stages of human evolution: one before the Neolithic Demographic Transition, where humans lived in sparse hunter-gatherer communities, which likely selected for prolonged Mtb infection with reduced virulence to allow for transmission across sparse populations. Conversely, following the agricultural and industrial revolutions, Mtb virulence is likely to have evolved to attack a higher number of susceptible individuals. These different disease modalities highlight the central idea that there are different immunological routes to TB disease, which converge on a disease phenotype characterized by high bacterial load and destruction of the extracellular matrix. The writing is very clear and provides a lot of supportive evidence from population studies and the recent clinical trials of novel TB vaccines, like M72 and H56. However, there are areas to support the thesis that have been described only in broad strokes, including the impact of host and Mtb genetic heterogeneity on this selection, and the alternative model that there are likely different TB diseases (as opposed to different routes to the same disease), as described by several groups advancing the concept of heterogeneous TB endotypes. I expand on specific points below.

      Strengths:

      The idea that Mtb evolved to both increase transmission (and possible commensalism with humans) with low rates of reactivation is intriguing. The heterogeneous TB phenotypes in the collaborative cross model (PMID: 35112666) support this idea, where some genetic backgrounds can tolerate a high bacterial load with minimal pathology, while others show signs of pathogenesis with low bacterial loads. This supports the idea that the underlying host state, driven by a number of factors like genetics and nutrition, is likely to explain whether someone will co-exist with Mtb without pathology, or progress to disease. I particularly enjoyed the discussion of the protective advantages provided by Mtb infection, which may have rewired the human immune system to provide protection against heterologous pathogens; this is supported by recent studies showing that Mtb infection provides moderate protection against SARS-CoV-2 (PMIDs: 35325013 and 37720210), and may have applied to other viruses that are likely to have played a more significant role in the past in the natural selection of Homo sapiens.

      We thank the reviewer for their positive comments, and also for pointing out work that we had previously overlooked. We now discuss and cite the work above as suggested.

      Modeling from Marcel Behr and colleagues (PMID: 31649096) indeed suggests that there are at least two TB clinical phenotypes that likely mirror the two distinct phases of Mtb co-evolution with humans. Most of the TB disease progression occurs rapidly (within 1-2 years of exposure), and the rest are slow cases of reactivation over time. I enjoyed the discussion of the difference between the types of immune hits needed to progress to disease in the two scenarios, where you may need severe immune hits for rapid progression, a phenotype that likely evolved after the Neolithic transition to larger human populations. On the other hand, a series of milder immune events leading to reactivation after a long period of asymptomatic infection likely mirrors slow progression in the hunter-gatherer communities, to allow for prolonged transmission in scarce populations. Perhaps a clearer analysis of these models would be helpful for the reader.

      We agree that we did not present these concepts in as much detail as we should have, and so we now discuss this more on lines 81 – 83 and 184 – 187.

      Weaknesses:

      The discussion of genetic heterogeneity is limited and only discusses evidence from MSMD studies. Genetics is an important angle to consider in the co-evolution of Mtb and humans. There is a large body of literature on both host and Mtb genetic associations with TB disease. The very fact that host variants in one population do not necessarily cross-validate across populations is evidence in support of population-specific adaptations. Specific Mtb lineages are likely to have co-evolved with distinct human populations. A key reference is missing (PMID: 23995134), which shows that different lineages co-evolved with human migrations. Also, meta-analyses of human GWAS studies to define variants associated with TB are very relevant to the topic of co-evolution (e.g., PMID: 38224499). eQTL studies can also highlight genetic variants associated with regulating key immune genes involved in the response to TB. The authors do mention that Mtb itself is relatively clonal with ~2K SNPs marking Mtb variation, much of which has likely evolved under the selection pressure of modern antibiotics. However, some of this limited universe of variants can still explain co-adaptations between distinct Mtb lineages and different human populations, as shown recently in the co-evolution of lineage 2 with a variant common in Peruvians (PMID: 39613754).

      We thank the reviewer for these comments and agree we failed to cite and discuss the work from Sebastian Gagneux’s group on co-migration, which we now discuss. We include a new paragraph discussing co-evolution as suggested on lines 145 – 155 and 218 – 220, citing the work proposed, which we agree enhances the arguments about co-evolution.

      Although the examples of anti-TNF and anti-PD1 treatments are relevant as drivers of TB in limited clinical contexts, the bigger picture is that they highlight major distinct disease endotypes. These restricted examples show that TB can be driven by immune deficiency (as in the case of anti-TNF, HIV, and malnutrition) or hyperactivation (as in the case of anti-PD1 treatment), but there are still certainly many other routes leading to immune suppression or hyperactivation. Considering the idea of hyperactivation as a TB driver, the apparent higher rate of recurrence in the H56 trial referenced in the review is likely due to immune hyperactivation, especially in the context of residual bacteria in the lung. These different TB manifestations (immune suppression vs immune hyperactivation) mirror TB endotypes described by DiNardo et al (PMID: 35169026) from analysis of extensive transcriptomic data, which indicate that it's not merely different routes leading to the same final endpoint of clinical disease, but rather multiple different disease endpoints. A similar scenario is shown in the transcriptomic signatures underlying disease progression in BCG-vaccinated infants, where two distinct clusters mirrored the hyperactivation and immune suppression phenotypes (PMID: 27183822). A discussion of how to think about translating the extensive information from systems biology into treatment stratification approaches, or adjunct host-directed therapies, would be helpful.

      We agree with the points made and that the two publications above further enhance the paper. We have added discussion of the different disease endpoints on lines 65 – 67 and of the evidence regarding immune hyperactivation versus suppression in the vaccination study on lines 162 – 164, and have expanded on the translational implications on lines 349 – 352.

      Reviewer #3 (Public review):

      Summary:

      This perspective article by Reichmann et al. highlights the importance of moving beyond the search for a single, unified immune mechanism to explain host-Mtb interactions. Drawing from studies in immune profiling, host and bacterial genetics, the authors emphasize inconsistencies in the literature and argue for broader, more integrative models. Overall, the article is thought-provoking and well-articulated, raising a concept that is worth further exploration in the TB field.

      Strengths:

      Timely and relevant in the context of the rapidly expanding multi-omics datasets that provide unprecedented insights into host-Mtb interactions.

      Weaknesses (Minor):

      Clarity on the notion of a "unified mechanism". It remains unclear whether prior studies explicitly proposed a single unifying immunological model. While inconsistencies in findings exist, they do not necessarily demonstrate that earlier work was uniformly "single-minded". Moreover, heterogeneity in TB has been recognized previously (PMIDs: 19855401, 28736436), which the authors could acknowledge.

      We accept this point and have toned down the language, acknowledging that we are expanding on an argument that others have made, whilst focusing on the implications for the systems immunology era, and cite the previous work as suggested.

      Evolutionary timeline and industrial-era framing. The evolutionary model is outdated. Ancient DNA studies place the Mtb most recent common ancestor (MRCA) at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is cited as a driver of TB expansion, but this remains speculative without bacterial-genomics evidence and should be framed as a hypothesis. Additionally, the claim that Mtb genomes have been conserved only since the Industrial Revolution (lines 165-167) is inaccurate; conservation extends back to the MRCA (PMID: 31448322).

      Our understanding is that the evolutionary timeline is not fully resolved, with conflicting evidence proposing different dates. The ancient DNA studies giving a timeline of 6,000 years seem to oppose the evidence of Mtb infection of humans in the Middle East 10,000 years ago, and other estimates suggesting 70,000 years. Therefore, we have cited the work above and added a sentence highlighting that different studies propose different timelines. We would propose that the industrial revolution created the ideal societal conditions for the expansion of TB, and this would seem widely accepted in the field, but have added a proviso as suggested. We did not intend to claim that Mtb genomes have been conserved since the industrial revolution; the point we were making is that despite rapid expansion within human populations, it has still remained conserved. We therefore have revised our discussion of the conservation of the Mtb genomes on lines 72 – 74, 81 – 83 and 185 – 190.

      Trained immunity and TB infection. The treatment of trained immunity is incomplete. While BCG vaccination is known to induce trained immunity (ref 59), revaccination does not provide sustained protection (ref 8), and importantly, Mtb infection itself can also impart trained immunity (PMID: 33125891). Including these nuances would strengthen the discussion.

      We have refined this section. We did cite PMID: 33125891 in the original submission but have changed the wording to emphasise the point on line …

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Abstract

      Line 30: What is an immunological route? Suggest

      “...host-pathogen interaction, with diverse immunological processes leading to TB disease (10%) or stable lifelong association or elimination. We suggest these alternate relationships result from the prolonged co-evolution of the pathogen with humans and may even confer a survival advantage in the 90% of exposures that do not progress to disease.”

      Thank you, we have reworded the abstract along the lines suggested above, but not identically to allow for other reviewer comments.

      Introduction

      Ln 43: It is misleading to suggest that the study of TB was the leading influence in establishing the Koch's postulates framework. Many other infections were involved, and Jacob Henle, one of Koch's teachers, is credited with the first clear formulation (see Evans AS. 1976 The Yale Journal of Biology and Medicine PMID: 782050).

      We have toned down the language, stating that TB “contributed” to the formulation of Koch’s postulates.

      Ln 46: While the review rightly emphasises intracellular infection in macrophages, the importance and abundance of extracellular bacilli should not be ignored, particularly in transmission and in cavities.

      We agree, and have added text on the importance of extracellular bacteria and transmission.

      Ln 56: This is misleading as primary disease prevention is implied, whereas the vaccine was given to individuals presumed to be already infected (TST or IGRA positive). Suggest ..."reduces by 50% progression to overt TB disease when given to those with immunological evidence of latent infection."

      Thank you, edit made as suggested.

      Ln 62: Not sure why it is urgent. Suggest "high priority".

      Wording changed as suggested.

      Figure 1 needs clarification. The colour scale appears to signify the strength or vigour of the immune response so that disease is associated with high (orange/red) or low (green/blue) activity. The arrows seem to imply either a sequence or a route map when all we really have is an association with a plausible mechanistic link. They might also be taken to imply a hierarchy that is not appropriate. I'm not sure that the X-rays and arrows add anything, and the rectangle provides the key information on its own. Clarify please.

      We have clarified the figure legend. We feel the X-rays give the clinical context, and so have kept them, and now state in the legend that this is highlighting that there are diverse pathways leading to active disease to try to emphasise the point the figure is illustrating.

      Ln 149-157: I agree that the current dogma is that overt pulmonary disease is required to spread Mtb and fuel disease prevalence. It is vitally important to distinguish the spread of the organism from the occurrence of disease (which does not, of itself, spread). However, both epidemiological (e.g. Ryckman TS, et al. 2022 Proc Natl Acad Sci U S A: 10.1073/pnas.2211045119) and recent mechanistic (Dinkele R, et al. 2024 iScience: 10.1016/j.isci.2024.110731; Patterson B, et al. 2024 Proc Natl Acad Sci U S A: 10.1073/pnas.2314813121; Warner DF, et al. 2025 Nat Rev Microbiol: 10.1038/s41579-025-01201-x) studies indicate the importance of asymptomatic infections, and those associated with sputum positivity have recently been recognised by WHO. I think it will be important to acknowledge the importance of this aspect and consider how immune responses may or may not contribute. I regard the view that Mtb is an obligate pathogen, dependent on overt pTB for transmission, as needing to be reviewed.

      We agree that we did not give sufficient emphasis to the emerging evidence on asymptomatic infections, and that this may play an important part in transmission in high incidence settings. We now include a discussion on this, and citation of the papers above, on lines 168 – 170.

      Ln 159: The terms colonise and colonisation are used, without a clear definition, several times. My view is that both refer to the establishment and replication of an organism on or within a host without associated damage. Where there is associated damage, this is often mediated by immune responses. In this header, I think "establishment in humanity" would be appropriate.

      We agree with this point and have changed the header as suggested, and clarified our meaning when we use the term colonisation, which the reviewer correctly interprets.

      Ln 181-: I strongly support the view that Mtb has contributed to human selection, even to the suggestion that humanity is adapted to maintain a long-term relationship with Mtb.

      Thank you, and we have expanded on this evidence as suggested by other reviewers.

      Ln 189: improved.

      Apologies, typo corrected.

      Figure 2: I was also confused by this. The x-axis does not make sense, as a single property should increase. Moreover, does incidence refer to incidence in individuals with that specific balance of resistance and susceptibility, or contribution to overall global incidence - I suspect the latter (also, prevalence would make more sense). At the same time, the legend implies that those with high resistance to colonisation will be infrequent in the population, suggesting that the Y axis should be labelled "frequency in human population". Finally, I can't see what single label could apply to the X axis. While the implication that the majority of global infections reflect a balance between the resistance and susceptibilities is indicated, a frequency distribution does not seem an appropriate representation.

      The reviewer is correct that the X axis is aiming to represent two variables, which is not logical, and so we have completely changed this figure to a simple one that we hope makes the point clearly and have amended the legend appropriately. We are aiming to highlight the selective pressures of Mtb on the human population over millennia.

      Ln 244: Immunological failure - I agree with the statement but again find the figure (3) unhelpful. Do we start or end in the middle? Is the disease the outside - if so, why are different locations implied? The notion of a maze has some value, but the bacteria should start and finish in the same place by different routes.

      We are attempting to illustrate the concept that escape from host immunological control can occur through different mechanisms. As this comment was just from one reviewer, we have left the figure unchanged but have expanded the legend to try to make the point that this is just a conceptual illustration of multiple routes to disease.

      Ln 262 onward: I broadly agree with the points made about omic technologies, but would wish to see major emphasis on clear phenotyping of cases. There is something of a contradiction in the review between the emphasis on the multiplicity of immunological processes leading ultimately to disease and the recommendation to analyse via omics, which, in their most widely applied format, bundle these complexities into analyses of the humoral and cellular samples available in blood. Admittedly, the authors point out opportunities for 3-dimensional and single-cell analyses, but it is difficult to see where these end without extrapolation ad infinitum.

      We totally agree that clear phenotyping of infection is critical, and expand on this further on lines 307 - 309.

      Reviewer #2 (Recommendations for the authors):

      I suggest expanding on the genetic determinants of Mtb/host co-evolution.

      Thank you, we have now expanded on these sections as suggested.

      Reviewer #3 (Recommendations for the authors):

      We are in an era of exploding large-scale datasets from multi-omics profiling of Mtb and host interactions, offering an unprecedented lens to understand the complexity of the host immune response to Mtb, a pathogen that has infected human populations for thousands of years. The guiding philosophy for how to interpret this tremendous volume of data and what models can be built from it will be critical. In this context, the perspective article by Reichmann et al. raises an interesting concept: to "avoid unified immune mechanisms" when attempting to understand the immunology underpinning host-Mtb interactions. To support their arguments, the authors review studies and provide evidence from immune profiling, host and bacterial genetics, and showcase several inconsistencies. Overall, this perspective article is well articulated, and the concept is worthwhile for further exploration. A few comments for consideration:

      Clarity on the notion of a "unified mechanism". Was there ever a single, clearly proposed unified immunological mechanism? For example, in lines 64-65, the authors criticize that almost all investigations into immune responses to Mtb are based on the premise that a unifying disease mechanism exists. However, after reading the article, it was not clear to me how previous studies attempted to unify the model or what that unifying mechanism was. While inconsistencies in findings certainly exist, they do not necessarily indicate that prior work was guided by a unified framework. I agree that interpreting and exploring data from a broader perspective is valuable, but I am not fully convinced that previous studies were uniformly "single-minded". In fact, the concept of heterogeneity in TB has been previously discussed (e.g., PMIDs: 19855401, 28736436).

      We accept this point, and that we have overstated the argument and not acknowledged previous work sufficiently. We now downplay the language and cite the work as proposed.

      However, we would propose that essentially all published studies imply that single mechanisms underlie development of disease. The authors are not aware of any manuscript that concludes “Therefore, xxxx pathway is one of several that can lead to TB disease”; instead they state “Therefore, xxxx pathway leads to TB disease”. The implication of this language is that the mechanism described occurs in all patients, whilst in fact it is likely involved in only a subset. We have toned down the language and expand on this concept on lines 268-270.

      Evolutionary timeline and industrial-era framing. The evolutionary model needs updating. The manuscript cites a "70,000-year" origin for Mtb, but ancient-DNA studies place the most recent common ancestor at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is invoked multiple times as a driver of TB expansion, yet the magnitude of its contribution remains debated and, to my knowledge, lacks direct bacterial-genomics evidence for causal attribution; this should be framed as a hypothesis rather than a conclusion. In addition, the statement in lines 165-167 is inaccurate: at the genome level, Mtb has remained highly conserved since its most recent common ancestor, not specifically since the Industrial Revolution (PMID: 31448322).

      We accept these points and have made the suggested amendments, as outlined in the public responses. Our understanding is that the evidence about the most recent common ancestor is controversial; if the divergence of human populations occurred concurrently with that of Mtb, then this must have been significantly earlier than 6,000 years ago, and so there are conflicting arguments in this domain.

      Trained immunity and TB infection. The discussion of trained immunity could be expanded. Reference 59 suggests the induction of innate immune training, but reference 8 reports that revaccination does not confer protection against sustained TB infection, indicating that at least "re"-vaccination may not enhance protection. Furthermore, while BCG is often highlighted as a prototypical inducer of trained immunity, real-world infection occurs through Mtb itself. Importantly, a later study demonstrated that Mtb infection can also impart trained immunity (PMID: 33125891). Integrating these findings would provide a more nuanced view of how both vaccination and infection shape innate immune training in the TB context.

      We thank the reviewer for these suggestions and have edited the relevant section to include these studies.

    1. eLife Assessment

      This important study describes the progressive transformation of olfactory information across five different brain regions in the olfactory pathway, including a comparison of responses to familiar and unfamiliar odors. This dataset is of broad interest for olfactory researchers and provides a solid analysis of a graded change in representations of odor identity and experience in different locations in the pathway.

    2. Reviewer #1 (Public review):

      In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless, some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.

      The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single neuronal level as well as population level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb.

      Future experiments are needed to probe the circuit mechanisms underlying the differential importance of the two primary olfactory cortices, as well as their potential causal roles in odor identification. Moreover, future work should test whether the decoding accuracy of odor identity and experience from neural data (as reported here) can predict the causal contributions of these regions, as revealed through perturbations during behavioral tasks that explicitly probe odor identification and/or experience.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.

      The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1 & Sub) are more sparse and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.

      Strengths:

      The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.

      Weaknesses:

      The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 neuron with a much higher firing rate, but the opposite is true in Fig. 2A. So, what are we to make of Fig. 2C, which shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec? The meaning of this is unclear. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)

      There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Fig. 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Fig. 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.

      The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.

      Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Fig. 2C is not sufficient. For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean? Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?

      Ls. 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.

      Ls. 132-140 - As presented in the text and the figure, this section is unclear and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at chance level. More importantly, it seems as though they did the wrong analysis here. A better way to do this analysis is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).

      Comments on revisions:

      I think the authors have done an adequate job addressing the reviewers' concerns. Most importantly, I found the first version of the manuscript quite confusing, and the consequent clarifications have addressed this issue.

      In several cases, I see their point, while I still disagree with whether they made the best decisions. However, the issues here do not fundamentally change the big-picture outcome, and if they want to dig in with their approaches (e.g., only using auROC or just reporting delta firing rates without any normalization), it's their choice.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1) and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.

      The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.

      Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, a few limitations in experimental design and analysis restrict the conclusions that can be drawn from this study.

      Main limitations:

      The authors use a non-associative learning paradigm - repeated odor exposure - to test how experience modulates odor responses along the olfactory-hippocampal pathway. While repeated odor exposure clearly modulates sampling behavior and odor-evoked neural activity, the relevance of this modulation across different brain areas remains difficult to assess.

      The authors discuss the olfactory-hippocampal pathway as a transition from primary sensory (AON, aPCx) to associative areas (LEC, CA1, SUB). While this is reasonable, given the known circuit connectivity, other interpretations are possible. For example, AON, aPCx, and LEC receive direct inputs from the olfactory bulb ('primary cortex'), while CA1 and SUB do not; AON receives direct top-down inputs from CA1 ('associative cortex'), while aPCx does not. In fact, the data presented in this manuscript do not appear to support a consistent transformation from sensory to associative, as implied by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from the primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.

      The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single neuronal level as well as population level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb (Chae, Banerjee et al., 2022). Future experiments are needed to probe the circuit mechanisms that generate this important difference between the two primary olfactory cortices as well as their potential causal roles in odor identification.

      The authors were also interested in how familiarity vs. novelty affects neuronal representation across all these brain regions. One weakness of this study is that neuronal responses were not measured during the process of habituation. Neuronal responses were measured after four days of daily exposure to a few odors (familiar) and then some other novel odors were introduced. This creates a confound because the novel vs. familiar stimuli are different odorants and that itself can lead to drastic differences in evoked neural responses. Although the authors try to rule out this confound by doing a clever decoding and Euclidean distance analysis, an alternate, more straightforward strategy would have been to measure neuronal activity for each odorant during the process of habituation.

      Reviewer #2 (Public review):

      This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.

      The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1 & Sub) are more sparse and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.

      Strengths:

      The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.

      Weaknesses:

      (1) The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 neuron with a much higher firing rate, but the opposite is true in Figure 2A. So, what are we to make of Figure 2C, which shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec? This seems nearly meaningless. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)

      We appreciate the reviewer’s concerns regarding clarity and methodology. It is less clear why all neurons in a given brain area should have similar firing rates. Anatomically defined brain areas typically comprise multiple cell types, which can have diverse baseline firing rates. Since we computed absolute firing rate differences per neuron (i.e., novel vs. familiar odor responses within the same neuron), baseline differences across neurons do not have a major impact.

      The suggestion to use Z-scores instead of absolute firing rate differences is well taken. However, Z-scoring assumes that the underlying data are normally distributed, which is not the case in our dataset. Specifically, when analyzing odor-evoked firing rates on a per-neuron basis, only 4% of neurons exhibit a normal distribution. In cases of skewed distributions, Z-scoring can distort the data by exaggerating small variations, leading to misleading conclusions. While we acknowledge that different analysis methods exist, we believe that our chosen approach best reflects the properties of the dataset and avoids potential misinterpretations introduced by inappropriate normalization techniques.

      (2) There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Figure 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Figure 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.

      We performed additional analyses to address the reviewer’s feedback (Figures 2C-E and lines 118-132) and added more single-neuron data (Figures 1, S3 and S4).

      (3) The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.

      We appreciate the reviewer’s concern regarding the classification of brain regions as “primary sensory” versus “multisensory.” Of note, the cited studies (Poo et al., 2021; Federman et al., 2024; Kehl et al., 2024) focus on posterior PCx (pPCx), while our recordings were conducted in a very anterior section of anterior PCx. The aPCx and pPCx have distinct patterns of connectivity, both anatomically and functionally. To the best of our knowledge, there is no evidence for multimodal responses in aPCx, whereas there is for LEC, CA1 and SUB. Furthermore, our distinction is not based on a connectivity argument, as the reviewer suggests, but on differences in the α-Poisson ratio (Figure 1E and F).

      To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript.

      (4) Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Figure 2C is not sufficient.

      Regarding z-scores, please see our response to point (1). We further added a figure showing the responses of all neurons to novel stimuli, using ROC instead of z-scoring as described previously (e.g., Cohen et al., Nature 2012). We added this figure to the supplementary material for completeness of the analysis (S2E).

      For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean?

      This means that, on average, the population of neurons exhibits higher firing rates in response to novel odors than to familiar ones.

      Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?

      We thank the reviewer for this valuable feedback and have extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).

      (5) Lines 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.

      We appreciate the reviewer’s request for clarification. Throughout the brain areas we studied, odorant identity and experience can be decoded. However, the way information is represented differs between regions. We acknowledge that “mixed” representation is a misleading term and have removed it from the manuscript.

      In AON and aPCx, neurons significantly respond to both novel and familiar odors. However, the magnitude of their responses to novel and familiar odors is sufficiently distinct to allow for decoding of odor experience (i.e., whether an odor is novel or familiar). Moreover, novelty engages more neurons in encoding the stimulus (Figure 2D). In neural space, the position of an odor’s representation in AON and aPCx shifts depending on whether it is novel or familiar, meaning that experience modifies the neural representation of odor identity. This suggests that in these regions the two representations are intertwined.

      In contrast, some neurons in LEC, CA1, and SUB exhibit responses to novel odors, but few neurons respond to familiar odors at all. This suggests a more selective encoding of novelty.

      (6) Lines 132-140 - As presented in the text and the figure, this section is poorly written and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at the chance level. More importantly, they did the wrong analysis here. The better and, I think, the only way to do this analysis correctly is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).

      We appreciate the feedback and thank the reviewer for the recommendation to implement cross-condition generalization performance (CCGP) as used in Bernardi et al., 2020. We acknowledge that the term "shuffled" may have caused confusion, as it typically refers to control analyses producing chance-level outcomes. In our case, we shuffled the identity of novel and familiar odors to assess how much the decoder relies on odor identity when distinguishing novelty. This test provided insight into whether novelty-based structure exists within neural activity beyond random grouping, but it does not directly assess generalization.

      As suggested, we used CCGP to measure how well novelty-related representations generalize across different odors. Our findings show that in AON and aPCx, novelty-related information is indeed highly generalizable, supporting the idea that these regions encode novelty in a less odor-selective manner (Figure 2K).
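      The logic of CCGP can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid decoder, the function names (`centroid`, `sq_dist`, `ccgp`) and the toy population vectors are all assumptions chosen for illustration. A novelty decoder is fitted on some odors and scored only on odors it never saw during training, so high accuracy indicates that the novelty signal generalizes across odor identity.

```python
from itertools import product


def centroid(vectors):
    """Mean population vector across trials."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two population vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ccgp(novel, familiar):
    """Cross-condition generalization of a novel-vs-familiar decoder.

    novel, familiar: dict mapping odor name -> list of trial vectors.
    Each condition needs at least two odors. For every pair of held-out
    odors (one novel, one familiar), a nearest-centroid decoder
    (an illustrative stand-in for any linear decoder) is fitted on the
    remaining odors and scored on the held-out trials only.
    Returns the mean held-out accuracy.
    """
    scores = []
    for held_n, held_f in product(novel, familiar):
        train_n = [v for o, trials in novel.items() if o != held_n for v in trials]
        train_f = [v for o, trials in familiar.items() if o != held_f for v in trials]
        c_n, c_f = centroid(train_n), centroid(train_f)
        correct = total = 0
        for v in novel[held_n]:
            correct += sq_dist(v, c_n) < sq_dist(v, c_f)
            total += 1
        for v in familiar[held_f]:
            correct += sq_dist(v, c_f) < sq_dist(v, c_n)
            total += 1
        scores.append(correct / total)
    return sum(scores) / len(scores)


# Hypothetical two-neuron population: neuron 0 carries novelty,
# neuron 1 carries odor identity.
novel = {"A": [[5, 0], [6, 0]], "B": [[5, 3], [6, 3]]}
familiar = {"C": [[1, 1], [2, 1]], "D": [[1, 4], [2, 4]]}
print(ccgp(novel, familiar))
```

      In this toy case the novelty axis is shared across odors, so the decoder transfers to held-out odors. If only odor identity were encoded, held-out accuracy would fall to chance or below.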

      Reviewer #3 (Public review):

      In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1), and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.

      The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.

      Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, limitations in the interpretability of odor experience of the behavioral paradigm, and limitations in experimental design and analysis, restrict the conclusions that can be drawn from this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some suggestions, in no particular order, to further improve the manuscript:

      (1) The example neuronal responses for CA1 and SUB in Figure 1 are not very inspiring. To my eyes, the odor period response is not that different from the baseline period. In general, a thorough characterization of firing rate properties during the odor period between the different brain regions would be informative.

      We thank the reviewer for this valuable feedback. We have replaced the example neurons from CA1 and SUB in Figure 1C. We further extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).

      (2) For the summary in Figure 1, why not show neuronal responses as z-scored firing rates as opposed to auROC?

      We chose to use auROC instead of z-scored firing rates due to the non-normality of the dataset, which can distort results when using z-scores. Specifically, z-scoring can exaggerate small deviations in neurons with low responsiveness, potentially leading to misleading conclusions. auROC provides a more robust measure of response change that is less sensitive to these distortions because it does not assume any specific distribution. This approach has been used previously (e.g. Cohen et al. 2012, Nature).
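      As a rough illustration of why auROC makes no distributional assumption, the sketch below computes it directly from ranks. The function name `auroc` and the spike counts are hypothetical, and the windowing and binning used in the manuscript are not described here, so this should be read as a sketch of the general measure rather than the authors' pipeline. A value of 0.5 means no change from baseline, values near 1 indicate excitation, and values near 0 indicate suppression.

```python
def auroc(baseline, evoked):
    """Area under the ROC curve separating evoked from baseline counts.

    Equals the probability that a randomly drawn evoked trial has a
    higher spike count than a randomly drawn baseline trial, with ties
    counted as one half. Only the ordering of values matters.
    """
    if not baseline or not evoked:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for e in evoked:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(baseline) * len(evoked))


# Hypothetical per-trial spike counts for one sparsely firing neuron.
baseline = [0, 1, 0, 2, 1, 0]
evoked = [3, 5, 2, 40, 4, 3]  # one outlier trial (40 spikes)

print(auroc(baseline, evoked))

# Clipping the outlier leaves the ordering, and hence auROC, unchanged.
clipped = [min(e, 10) for e in evoked]
print(auroc(baseline, clipped))
```

      Because the measure depends only on rank order, a single outlier trial does not change it, whereas the same trial would shift a mean or standard deviation used for z-scoring.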

      (3) To study novelty, the authors presented odorants that were not used during four days of habituation. But this design makes it hard to dissociate odor identity from novelty. Why not track the response of the same odorants during the habituation process itself?

      We respectfully disagree with the argument that using different stimuli as novel and familiar constitutes a confound in our analysis. In our study, we used multiple different, structurally dissimilar single-molecule chemicals which were randomly assigned to novel and familiar categories in each animal. If individual stimuli did cause “drastic differences in evoked neural responses”, these would be evenly distributed between novel and familiar stimuli. It is therefore extremely unlikely that the clear differences we observed between novel and familiar conditions and between brain areas can be attributed to the contribution of individual stimuli, in particular given that our analysis was performed at the population level. In fact, we observed that responses between novel and familiar conditions were qualitatively very similar in the short time window after odor onset (Figure 1G and H).

      Importantly, the goal of this study was to investigate the impact of long-term habituation over more than 4 days, rather than short term habituation during one behavioral session. However, tracking the activity of large numbers of neurons across multiple days presents a significant technical challenge, due to the difficulty of identifying stable single-unit recordings over extended periods of time with sufficient certainty. Tools that facilitate tracking have recently been developed (e.g. Yuan AX et al., Elife. 2024) and it will be interesting to apply them to our dataset in the future.

      (4) Since novel odors lead to greater sniffing and sniffing strongly influences firing rates in the olfactory system, the authors decided to focus on a 400 ms window with similar sniffing rates for both novel vs. familiar odors. Although I understand the rationale for this choice, I worry that this is too restrictive, and it may not capture the full extent of the phenomenology.

      Could the authors model the effect of sniffing on firing rates of individual neurons from the data, and then check whether the odor response for novel context can be fully explained just by increased sniffing or not?

      It is an interesting suggestion to extend the window of analysis and observe how responses evolve with sniffing (and other behavioral reactions). To address this, we added an additional figure to the supplementary material, showing the mean responses of all neurons to novel stimuli during the entire odor presentation window (Fig. S1B).

      As suggested, we further created a Generalized Linear Model (GLM) for the entire 2 s odor stimulation period, incorporating sniffing and novelty as independent variables. As expected, sniffing had a dominant impact on firing rate in all brain areas. A smaller proportion of neurons was modulated by novelty or by the interaction between novelty x breathing, suggesting the entrainment of neural activity by sniffing during the response to novel odors. These results support our decision to focus the analysis on the early 400 ms window in order to dissociate the effects of novelty and behavioral responses. Taken together, our results suggest that odorant responses are modulated by novelty early during odorant processing, whereas at later stages sniffing becomes the predominant factor driving firing (Figure S2C-D).
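
      The structure of such a model can be sketched as follows. This is an illustration only, under several assumptions that are not stated in the response: a Poisson GLM with a log link, a sniff rate centered at 4 Hz, and noise-free synthetic spike counts so that the fit can be checked against known coefficients. All function names, coefficients, and data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = X beta by iteratively reweighted least squares."""
    p = len(X[0])
    eta = [math.log(yi + 0.5) for yi in y]  # starting linear predictor
    beta = [0.0] * p
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]  # fitted means, also the IRLS weights
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(w * r[j] * r[k] for w, r in zip(mu, X)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w * r[j] * zi for w, r, zi in zip(mu, X, z)) for j in range(p)]
        new_beta = solve_linear(xtwx, xtwz)
        eta = [sum(bj * xj for bj, xj in zip(new_beta, r)) for r in X]
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def design_row(sniff_hz, novel):
    """Columns: intercept, centered sniff rate, novelty, sniff x novelty."""
    s = sniff_hz - 4.0
    return [1.0, s, float(novel), s * novel]


# Hypothetical ground-truth coefficients and noise-free expected counts.
true_beta = [1.5, 0.25, 0.2, 0.1]
trials = [(sniff, novel) for sniff in (2.0, 3.0, 4.0, 6.0, 8.0) for novel in (0, 1)]
X = [design_row(s, n) for s, n in trials]
y = [math.exp(sum(b * x for b, x in zip(true_beta, row))) for row in X]

beta_hat = fit_poisson_glm(X, y)
for name, value in zip(["intercept", "sniff", "novelty", "sniff x novelty"], beta_hat):
    print(f"{name}: {value:.3f}")
```

      In a model of this form, a neuron would count as modulated by novelty if the novelty or interaction coefficient differs reliably from zero once the sniffing term is included.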

      (5) The authors conclude that aPCx has a subset of neurons dedicated to familiar odors based on the distribution of SVM weights in Figure 3D. To me, this is the weakest conclusion of the paper because although significant, the effect size is paltry; the central tendencies are hardly different for the two conditions in aPCx. Could the authors show the PSTHs of some of these neurons to make this point more convincing?

      We appreciate the reviewer’s concern regarding the effect size. To strengthen our conclusion, we now include PSTHs of representative neurons in the bottom 10% and top 10% of the neuronal population based on the SVM analysis (Figures S3 and S4). We hope this provides more clarity and support for the interpretation that there is a subset of neurons in aPCx that show greater sensitivity to familiar odors, despite the relatively modest central tendency differences.

      In the revised manuscript, we discuss the effect size more explicitly in the text to provide context for its significance (lines 193 - 195).

      Reviewer #2 (Recommendations for the authors):

      (1) The authors only talk about "responsive" neurons. Does this include neurons whose activity increases significantly (activated) and neurons whose activity decreases (suppressed)?

      Yes, the term "responsive" refers to neurons whose activity either increases significantly (excited) or decreases (inhibited) in response to the odor stimuli. We performed additional analyses to characterize responses separately for the different groups (Figure 2C-E and lines 118-132).

      (2) Line 54 - The Schoonover paper doesn't show that cells lose their responses to odors, but rather that the population of cells that respond to odors changes with time. That is, population responses don't become more sparse

      The fact that “the population of cells that respond to odors changes with time”, implies that some neurons lose their responsiveness (e.g. unit 2 in Figure 1 of Schoonover et al., 2021), while others become responsive (e.g. unit 1 in Figure 1 of Schoonover et al., 2021). Frequent responses reduce drift rate (Figure 4 of Schoonover et al., 2021), thus fewer neurons lose or gain responsiveness. We have revised the manuscript to clarify this.

      (3) Line 104 - "Recurrent" is incorrectly used here. I think the authors mean "repeated" or something more like that.

      Thank you for pointing this out. We replaced "recurrent" with "repeated".

      (4) Figure 3D - What is the scale bar here?

      We apologize for the accidental omission. The scale bar has been added to Figure 3D in the revised version of the manuscript.

      (5) Line 377 - They say they lowered their electrodes to "200 um/s per second." This must be incorrect. Is this just a typo, or is it really 200 um/s, because that's really fast?

      Thank you for pointing this out. It was 20 to 60 um/s, the change has been made in the manuscript.

      (6) Line 431: The authors say they used auROC to calculate changes in firing rates (which I think is only shown in Figure 1D). Note that auROC measures the discriminability of two distributions, not the strength or change in the strength of response.

      Indeed, we used auROC to measure the discriminability of firing between baseline and stimulus response. We have corrected the wording in the methods.

      (7) Figure 1B: The anatomical locations of the five areas they recorded from are straightforward, and this figure is not hugely helpful. However, the reader would benefit tremendously by including an experimental schematic. As is, we needed to scour the text and methods sections to understand exactly what they did when.

      We thank the reviewer for this suggestion. We included an experimental schematic in the supplementary material.

      (8) Figure 1F(left): This plot is much less useful without showing a pre-odor window, even if only times after the odor onset were used for calculating alpha.

      We appreciate this concern; however, the goal of Figure 1F is to illustrate the meaning of the alpha value itself. We chose not to include a pre-odor window comparison to avoid confusing the reader.

      (9) Figure 2A: What are the bar plots above the raster plots? Are these firing rates? Are the bars overlaid or stacked? Where is the y-axis scale bar?

      The bar plots above the raster plots represent a histogram of the spike count/trials over time, with a bin width of 50 ms. These bars are overlaid on the raster plot. We will include a y-axis scale bar in the revised figure to clarify the presentation.

      (10) Figure 4G: This makes no sense. First, the Y axis is supposed to measure standard deviation, but the axis label is spikes/s. Second, if responses in the AON are much less reliable than responses in "deeper" areas, why is odor decoding in AON so much better than in the other areas?

      We acknowledge the error in the axis label, and we will correct it to indicate the correct units. AON has a larger response variability but also larger response magnitudes, which can explain the higher decoding accuracy.

      (11) From the model and text, one predicts that the lifetime sparseness increases along the pathway. The authors should use this metric as well/instead of "odor selectivity" because of problems with arbitrary thresholding.

      We acknowledge that lifetime sparseness, often computed using lifetime kurtosis, can be an informative measure of selectivity. However, we believe it has limitations that make it less suitable for our analysis. One key issue is that lifetime sparseness does not account for the stability of responses across multiple presentations of the same stimulus. In contrast, our odor selectivity measure incorporates trial-to-trial variability by considering responses over 10 trials and assessing significance using a Wilcoxon test compared to baseline. While the choice of a p-value threshold (e.g., 0.05) is somewhat arbitrary, it is a widely accepted statistical convention. Additionally, lifetime sparseness does not account for excitatory and inhibitory responses. For example, if a neuron X is strongly inhibited by odor A, strongly excited by odor B, and unresponsive to odors C and D, lifetime sparseness would classify it as highly selective for odor B, without capturing its inhibitory selectivity for odor A. The lifetime sparseness will be higher than if X was simply unresponsive for A.

      Our odor selectivity measure addresses this by considering both excitation and inhibition as potential responses. Thus, while lifetime sparseness could provide a useful complementary perspective in another type of dataset, it does not fully capture the dynamics of odor selectivity here.
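
      The example of neuron X can be made concrete with a short sketch. Note the assumptions: the response above mentions lifetime kurtosis, whereas this illustration uses the Vinje-Gallant form of lifetime sparseness, a different but commonly used definition, and the firing rates (a 5 Hz rate for odors that evoke no response) are hypothetical.

```python
def lifetime_sparseness(rates):
    """Vinje-Gallant lifetime sparseness of one neuron across stimuli.

    Returns 0 when all stimuli evoke the same rate and 1 when only a
    single stimulus evokes any firing. Rates must be non-negative.
    """
    n = len(rates)
    mean = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0.0:
        return 0.0  # silent neuron: treat as non-selective
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)


# Hypothetical evoked rates (Hz) for odors A, B, C, D.
inhibited_by_a = [0.0, 10.0, 5.0, 5.0]  # A suppresses, B excites
unresponsive_to_a = [5.0, 10.0, 5.0, 5.0]  # only B excites

s_inhibited = lifetime_sparseness(inhibited_by_a)
s_unresponsive = lifetime_sparseness(unresponsive_to_a)
print(f"with inhibition by A:  {s_inhibited:.3f}")
print(f"unresponsive to A:     {s_unresponsive:.3f}")
```

      Under this definition the inhibitory response to odor A raises the sparseness value, but the single number does not indicate whether the increase reflects excitation or inhibition.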

      Author response 1.

      Lifetime Kurtosis distribution per region.

      Reviewer #3 (Recommendations for the authors):

      Main points:

      (1) The authors use a non-associative learning paradigm - repeated odor exposure - to test how experience modulates odor responses along the olfactory-hippocampal pathway. While repeated odor exposure clearly modulates odor-evoked neural activity, the relevance of this modulation and its differential effect across different brain areas are difficult to assess in the absence of any behavioral read-outs.

      Our experimental paradigm involves a robust, reliable behavioral readout of non-associative learning. Novel olfactory stimuli evoke a well-characterized orienting reaction, which includes a multitude of physiological reactions, including exploratory sniffing, facial movements and pupil dilation (Modirshanechi et al., Trends Neuroscience 2023). In our study, we focused on exploratory sniffing.

      Compared to associative learning, non-associative learning might have received less attention. However, it is critically important because it forms the foundation for how organisms adapt to their environment through experience without forming associations. This is highlighted by the fact that non-instrumental stimuli can be remembered in large numbers (Standing, 1973) and with remarkable detail (Brady et al., 2008). While non-associative learning can thus create vast, implicit memory of stimuli in the environment, it is unclear how stimulus representations reflect this memory. Our study contributes to answering this question. We describe the impact of experience on olfactory sensory representations and reveal a transformation of representations from olfactory cortical to hippocampal structures. Our findings also indicate that sensory responses to familiar stimuli persist within sensory cortical and hippocampal regions, even after spontaneous orienting behaviors have habituated. Further studies involving experimental manipulation techniques are needed to elucidate the causal mechanisms underlying the formation of stimulus memory during non-associative learning.

      (2) The authors discuss the olfactory-hippocampal pathway as a transition from primary sensory (AON, aPCx) to associative areas (LEC, CA1, SUB). While this is reasonable, given the known circuit connectivity, other interpretations are possible. For example, AON, aPCx, and LEC receive direct inputs from the olfactory bulb ('primary cortex'), while CA1 and SUB do not; AON receives direct top-down inputs from CA1 ('associative cortex'), while aPCx does not. In fact, the data presented in this manuscript does not appear to support a consistent, smooth transformation from sensory to associative, as implied by the authors (e.g. Figure 4A, F, and G).

      Thank you for this insightful comment. Indeed, there are complexities in the circuitry, and the relationships between different areas are not linear. We believe that AON and aPCx are distinctly different from LEC, CA1 and SUB, as the latter areas have been shown to integrate multimodal sensory information. To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript. We also removed the term “gradual” to describe the transition of neural representations from olfactory cortical to hippocampal areas.

      (3) The analysis of odor-evoked responses is focused on a 400 ms window to exclude differences in sniffing behavior. This window spans 200 ms before and after the first inhalation after odor onset. Inhalation onset initiates neural odor responses - why do the authors include neural data before inhalation onset?

      The reason to include a brief time window prior to odor onset is to account for what is often called “partial” sniffs. In our experimental setup, odor delivery is not triggered by the animal’s inhalation. Therefore, it can happen that an animal has just begun to inhale when the stimulus is delivered. In this case, the animal is exposed to odorant molecules prior to the first complete inhalation after odor onset. We acknowledge that this limits the temporal resolution of our measurements, but it does not affect the comparison of sensory representations between different brain areas.

      It would also be interesting to explore the effect of sniffing behavior (see point 2) on odor-evoked neural activity.

      Thank you for your comment. We performed additional analysis including a GLM to address this question (Figure S2C-D).

      Minor points:

      (4) Figure 2A represents raster plots for 2 neurons per area - it is unclear how to distinguish between the 2 neurons in the plots.

      Figure 2A shows one example neuron per brain area. Each neuron has two raster plots, which indicate responses to either a novel (orange) or a familiar stimulus (blue). We have revised the figure caption for clarity.

      (5) Overall, axes should be kept consistent and labeled in more detail. For example, Figure 2H and I are difficult to compare, given that the y-axis changes and that decoding accuracies are difficult to estimate without additional marks on the y-axis.

      Axes are indeed different, because the chance level of decoding accuracy differs between those two figures. Decoding between novel and familiar odors has a chance level of 0.5, while the chance level for decoding odor identity is 0.1 (there are 10 odors to decode the identity from).
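
      The two chance levels follow from the number of classes in each decoding problem. The short sketch below illustrates this, assuming balanced classes (equal numbers of trials per class); the function names are hypothetical, and the simulation simply has a decoder guess labels at random.

```python
import random


def chance_level(n_classes):
    """Expected accuracy of random guessing with balanced classes."""
    return 1.0 / n_classes


def simulated_chance(n_classes, n_trials=20000, seed=0):
    """Empirical accuracy of a decoder that guesses labels uniformly at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        true_label = rng.randrange(n_classes)
        guess = rng.randrange(n_classes)
        hits += guess == true_label
    return hits / n_trials


for label, n in [("novel vs. familiar", 2), ("odor identity", 10)]:
    print(f"{label}: analytic {chance_level(n):.2f}, "
          f"simulated {simulated_chance(n):.3f}")
```

      Because the baselines differ by a factor of five, accuracies from the two decoding problems are more naturally compared relative to their respective chance levels than on a shared axis.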

      (6) Some parts of the discussion seem only loosely related to the data presented in this manuscript. For example, the statement that 'AON rather than aPCx should be considered as the primary sensory cortex in olfaction' seems out of context. Similarly, it would be helpful to provide data on the stability of subpopulations of neurons tuned to familiar odors, rather than simply speculate that they could be stable. The authors could summarize more speculative statements in an 'Ideas and Speculation' subsection.

      Thank you for your comment. We appreciate your perspective on our hypotheses. We have revised the discussion accordingly. Specifically, we removed the discussion of stable subpopulations, since we have not performed longitudinal tracking in this study.

      (7) The authors should try to reference relevant published work more comprehensively.

      Thank you for your comment. We attempted to include relevant published work without exceeding the limit for references but might have overlooked important contributions. We apologize to our colleagues, whose relevant work might not have been cited.

    1. eLife Assessment

      This work provides a fundamental molecular mechanism of how a single enzyme can coordinate the ordered assembly of hyaluronan, a complex polysaccharide, from two different building blocks in an alternating pattern. The authors present compelling evidence by combining high-resolution structural data with rigorous biochemical validation to define the underlying process. Major strengths of the study include the clarity and coherence of the mechanistic insights and the complementary use of structural and functional approaches to address the research question.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes critical intermediate reaction steps of a HA synthase at the molecular level; specifically, it examines the 2nd step, polymerization, adding GlcA to GlcNAc to form the initial disaccharide of the repeating HA structure. Unlike the vast majority of known glycosyltransferases, the viral HAS (a convenient proxy extrapolated to resemble the vertebrate forms) uses a single pocket to catalyze both monosaccharide transfer steps. The authors' work illustrates the interactions needed to bind & proof-read the UDP-GlcA using direct and '2nd layer' amino acid residues. This step also allows the HAS to distinguish the two UDP-sugars; this is very important as the enzymes are not known or observed to make homopolymers of only GlcA or GlcNAc, but only make the HA disaccharide repeats GlcNAc-GlcA.

      Strengths:

      Overall, the strengths of this paper lie in its techniques & analysis.

      The authors make significant leaps forward towards understanding this process using a variety of tools and comparisons of wild-type & mutant enzymes. The work is well presented overall with respect to the text and illustrations (especially the 3D representations), and the robustness of the analyses & statistics is also noteworthy.

      Furthermore, the authors make some strides towards creating novel sugar polymers using alternative primers & work with detergent binding to the HAS. The authors tested a wide variety of monosaccharides and several disaccharides for primer activity and observed that GlcA could be added to cellobiose and chitobiose, which are moderately close structural analogs to HA disaccharides. Did the authors also test the readily available HA tetramer (HA4, [GlcA-GlcNAc]2) as a primer in their system? This is a highly recommended experiment; if it works, then this molecule may also be useful for cryo-EM studies of CvHAS as well.

      Weaknesses:

      In the past, another report described a failed attempt to elongate short primers (HA4 & chitin oligosaccharides larger than the cello- or chitobiose that have activity in this report) with a vertebrate HAS, XlHAS1, an enzyme that seems to behave like the CvHAS (https://pubmed.ncbi.nlm.nih.gov/10473619/); this work should probably be cited and briefly discussed. It may be that the longer primers in the 1999 paper and/or the different construct or isolation specifics (detergent extract vs crude) were not conducive to the extension reaction, as the authors extracted recombinant enzyme.

      There are a few areas that should be addressed for clarity and correctness, especially defining the class of HAS studied here (Class I-NR) as the results may (Class I-R) or may not (Class II) align (see comment (a) below), but overall, a very nicely done body of work that will significantly enhance understanding in the field.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Stephens and co-workers provides important mechanistic insight into how hyaluronan synthase (HAS) coordinates alternating GlcNAc and GlcA incorporation using a single Type-I catalytic centre. Through cryo-EM structures capturing both "proofreading" and fully "inserted" binding poses of UDP-GlcA, combined with detailed biochemical analysis, the authors show how the enzyme selectively recognizes the GlcA carboxylate, stabilizes substrates through conformational gating, and requires a priming GlcNAc for productive turnover.

      These findings clarify how one active site can manage two chemically distinct donor sugars while simultaneously coupling catalysis to polymer translocation.

      The work also reports a DDM-bound, detergent-inhibited conformation that possibly illuminates features of the acceptor pocket, although this appears to be a purification artefact (it is indeed inhibitory) rather than a relevant biological state.

      Overall, the study convincingly establishes a unified catalytic mechanism for Type-I HAS enzymes and represents a significant advance in understanding HA biosynthesis at the molecular level.

      Strengths:

      There are many strengths.

      This is a multi-disciplinary study with very high-quality cryo-EM and enzyme kinetics (backed up with orthogonal methods of product analysis) to justify the conclusions discussed above.

      Weaknesses:

      There are few weaknesses.

      The abstract and introduction assume a lot of detailed prior knowledge about hyaluronan synthases, and in doing so, risk lessening the readership pool.

      A lot of discussion focuses on detergents (whose presence is totally inhibitory) and transfer to non-biological acceptors (at high concentrations). This risks weakening the manuscript.

    1. eLife Assessment

      This valuable study addresses a question related to how we achieve visual stability across saccadic eye movements. The authors' gaze-contingent fMRI design provides convincing evidence that peripherally presented visual stimuli are represented in foveal visual cortex prior to a saccade. The results will be of interest to vision scientists and behavioural neuroscientists.

    2. Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      Weaknesses:

      The authors have done a nice job addressing the previous weaknesses. The remaining weaknesses / limitations are appropriately discussed in the manuscript. E.g., the use of only 4 unique stimuli in the experiment. The findings are intriguing and relevant to saccadic remapping and foveal feedback, but somewhat limited in terms of the ability to draw theoretical distinctions between these related phenomena.

      Specifics:

      The revised manuscript is much improved in terms of framing and discussion of the prior literature, and the theoretical claims are now stated with appropriate nuance.

      I have two remaining minor suggestions/comments, which the authors may optionally respond to:

      (1) In the parametric modulation analysis, the authors' additional analyses nicely address my concern and strengthen the claim. However, the description in the revised manuscript (Pg 7 Ln 190-191) is minimal, and it may be difficult for readers to grasp what the control analysis is and how it rules out alternative explanations of the IPS findings. The authors may wish to elaborate on the description in the text.

      (2) Out of curiosity (not badgering): The authors argued that the findings of Harrison et al. (2013) and Szinte et al. (2015) can be explained by feature integration between the currently attended location and its future, post-saccadic location. Couldn't the same argument apply in the current paradigm, where attention at the saccade target gets remapped to the pre-saccadic fovea (see also Rolfs et al., 2011 Fig 5), thus leading to the observed feature remapping?

    3. Reviewer #3 (Public review):

      Summary:

      In this paper the authors used fMRI to determine whether peripherally-viewed objects could be decoded from foveal cortex, even when the objects themselves were never viewed foveally. Specifically they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from foveal cortex. They found that object shape, but not semantic category could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.

      Strengths:

      I think this is another nice demonstration that peripheral information can be decoded from / is processed in foveal cortex - the methods seem appropriate, and the experiments and analyses carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.

      Weaknesses:

      Given that foveal feedback has been found in previous studies that don't incorporate saccades, it is still unclear how this mechanism might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli. The authors address this point, but I guess whether foveal feedback during fixation and saccade prep are really the same, is ultimately a question that needs more experimental work to disentangle.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The main contributions of this paper are: (1) a replication of the surprising prior finding that information about peripherally-presented stimuli can be decoded from foveal V1 (Williams et al 2008), (2) a new demonstration of cross-decoding between stimuli presented in the periphery and stimuli presented at the fovea, (3) a demonstration that the information present in the fovea is based on shape not semantic category, and (4) a demonstration that the strength of foveal information about peripheral targets is correlated with the univariate response in the same block in IPS.

      Strengths:

      The design and methods appear sound, and finding (2) above is new, and importantly constrains our understanding of this surprising phenomenon. The basic effect investigated here is so surprising that even though it has been replicated several times since it was first reported in 2008, it is useful to replicate it again.

      We thank the reviewer for their summary. While we agree with many points, we would like to respectfully push back on the notion that this work is a replication of Williams et al. (2008). What our findings share with those of Williams is a report of surprising decoding at the fovea without foveal stimulation. Beyond this similarity, we treat these as related but clearly separate findings, for the following reasons:

      (1) Foveal feedback, as shown by Williams et al. (2008) and others during fixation, was only observed during a shape discrimination task, specific to the presented stimulus. Control experiments without such a task (or a color-related task) did not show effects of foveal feedback. In contrast, in the present study, the participants’ task was merely to perform saccades towards stimuli, independently of target features. We thus show that foveal feedback can occur independently of a task related to stimulus features. This dissociation demonstrates that our study must be tapping into something different than reported by Williams.

      (2) In a related study, Kroell and Rolfs (2022, 2025) demonstrated a connection between foveal feedback and saccade preparation, including the temporal details of the onset of this effect before saccade execution, highlighting the close link of this effect to saccade preparation. Here we used a very similar behavioral task to capture this saccade-related effect in neural recordings and investigate how early it occurs and what its nature is. Thus, there is a clear motivation for this study in the context of eye movement preparation that is separate from the previous work by Williams.

      (3) Lastly, decoding in the experimental task was positively associated with activity in FEF and IPS, areas that have been reliably linked to saccade preparation. We have now also performed an additional analysis (see our response to Specific point 2 of Reviewer 2) showing that decoding in the control condition did not show the same association, further supporting the link of foveal feedback to saccade preparation. 

      Despite our emphasis on these critical differences between studies, covert peripheral attention, as required by the task in Williams et al., and saccade preparation in natural vision, as in our study, are tightly coupled processes. Indeed, the task in Williams et al. would, during natural vision, likely involve an eye movement to the peripheral target. While speculative, a parsimonious and ecologically valid explanation is that both our study and earlier studies involve eye movement preparation, with execution being suppressed in studies enforcing fixation (e.g., Williams et al., 2008). We now discuss this idea of a shared underlying mechanism more extensively in the revised manuscript (pg 8 ln 228-240).

      Weaknesses:

      (1) The paper, including in the title ("Feedback of peripheral saccade targets to early foveal cortex") seems to assume that the feedback to foveal cortex occurs in conjunction with saccade preparation. However, participants in the original Williams et al (2008) paper never made saccades to the peripheral stimuli. So, saccade preparation is not necessary for this effect to occur. Some acknowledgement and discussion of this prior evidence against the interpretation of the effect as due to saccade preparation would be useful. (e.g., one might argue that saccade preparation is automatic when attending to peripheral stimuli.)

      We agree that the effects Williams et al. showed were not sufficiently discussed in the first version of this manuscript. To engage more clearly with these findings, we now introduce saccade-related foveal feedback (foveal prediction) and foveal feedback during fixation separately in the introduction (pg 2 ln 46-59).

      We further added another section in the discussion called “Foveal feedback during saccade preparation” in which we discuss how our findings are related to Williams et al. and how they differ (pg 8 ln 211-240). 

      As described in our previous response, we believe that our findings go beyond those described by Williams et al. (2008) and others in significant ways. However, during natural vision, the paradigm used by Williams et al. (2008) would likely be solved using an eye movement. Thus, while participants in Williams et al. (2008) did not execute saccades, it appears plausible that they prepared saccades. Given that covert peripheral attention and saccade preparation are tightly coupled processes (Kowler et al., 1995, Vis Res; Deubel & Schneider, 1996, Vis Res; Montagnini & Castet, 2007, J Vis; Rolfs & Carrasco, 2012, J Neurosci; Rolfs et al., 2011, Nat Neurosci), their results are parsimoniously explained by saccade preparation (but not execution) to a behaviorally relevant target.

      (2) The most important new finding from this paper is the cross-decodability between stimuli presented in the fovea and stimuli presented in the periphery. This finding should be related to the prior behavioral finding (Yu & Shim, 2016) that when a foveal foil stimulus identical to a peripheral target is presented 150 ms after the onset of the peripheral target, visual discrimination of the peripheral target is improved, and this congruency effect occurred even though participants did not consciously perceive the foveal stimulus (Yu, Q., & Shim, W. M. (2016). Modulating foveal representation can influence visual discrimination in the periphery. Journal of Vision, 16(3), 15).

      We thank the reviewer for highlighting this highly relevant reference. In the revised version of the manuscript, we now put more emphasis on the finding of cross-decodability (pg 2 ln 60-61). We now also discuss Yu and Shim’s (2016) finding, which supports our conclusion that foveal feedback and direct stimulus presentation share representational formats in early visual areas (pg 9 ln 277-279).

      (3) The prior literature should be laid out more clearly. For example, most readers will not realize that the basic effect of decodability of peripherally-presented stimuli in the fovea was first reported in 2008, and that that original paper already showed that the effect cannot arise from spillover effects from peripheral retinotopic cortex because it was not present in a retinotopic location between the cortical locus corresponding to the peripheral target and the fovea. (For example, this claim on lines 56-57 is not correct: "it remains unknown 1) whether information is fed back all the way to early visual areas".) What is needed is a clear presentation of the prior findings in one place in the introduction to the paper, followed by an articulation and motivation of the new questions addressed in this paper. If I were writing the paper, I would focus on the cross-decodability between foveal and peripheral stimuli, as I think that is the most revealing finding.

      We agree that the structure of the introduction did not sufficiently place our work in the context of prior literature. We have now expanded upon our Introduction section to discuss past studies of saccade- and fixation-related foveal feedback (pg 2 ln 49-59), laying out how this effect has been studied previously. We also removed the claim that "it remains unknown 1) whether information is fed back all the way to early visual areas", where our intention was to specifically focus on foveal prediction. We realize that this was not clear and hence removed this section. Instead, we now place a stronger focus on the cross-decodability finding (pg 2 ln 60-61).

      Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is predictively fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      We thank the reviewer for the positive assessment of our study.

      Weaknesses:

      The conclusions feel a bit over-reaching; some strong theoretical claims are not fully supported, and the framing of prior literature is currently too narrow. A critical weakness lies in the inability to test a distinction between these findings (claiming to demonstrate that "feedback during saccade preparation must underlie this effect") and foveal feedback previously found during passive fixation (Williams et al., 2008). Discussions (and perhaps control analysis/experiments) about how these findings are specific to the saccade target and the temporal constraints on these effects are lacking. The relationship between the concepts of foveal prediction, foveal feedback, and predictive remapping needs more thorough treatment. The choice to use only 4 stimuli is justified in the manuscript, but remains an important limitation. The IPS results are intriguing but could be strengthened by additional control analysis. Finally, the manuscript claims the study was pre-registered ("detailing the hypotheses, methodology, and planned analyses prior to data collection"), but on the OSF link provided, there is just a brief summary paragraph, and the website says "there have been no completed registrations of this project".

      We thank the reviewer for these helpful considerations. We agree that some of the claims were not sufficiently supported by the evidence, and in the revised manuscript, we added nuance to those claims (pg 8 ln 211-240). Furthermore, we now address more directly the distinction between foveal feedback during fixation and foveal feedback (foveal prediction) during saccade preparation. In particular, we now describe the literature about these two effects separately in the introduction (pg 2 ln 46-59), and we have added a new section in the discussion (“Foveal feedback during saccade preparation”) that more thoroughly explains why a passive fixation condition would have been unlikely to produce the same results we find (pg 8 ln 211-227). We also adapted the section about “Saccadic remapping or foveal prediction”, clearly delineating foveal prediction from feature remapping and predictive updating of attention pointers. As recommended by the reviewer, we conducted the parametric modulation analyses on the control condition, strengthening the claim that our findings are saccade-related. These results were added as Supplementary Figure 2 and are discussed in the revised manuscript (pg 7 ln 190-191 and pg 8 ln 224-227).

      Lastly, we would like to apologize for a mistake we made with the pre-registration. We realized that the pre-registration had indeed not been submitted. We have now done so without changing the pre-registration itself, which can be seen from the recent activity of the pre-registration (screenshot attached at the end). After consulting an open science expert at the University of Leipzig, we added a note about this mistake to the methods section of the revised manuscript (pg 10 ln 326-332). We could remove the reference to this pre-registration altogether, but leave this decision to the discretion of the editor.

      Specifics:

      (1) In the eccentricity-dependent decoding results (Figure 2B), are there any statistical tests to support the results being a U-shaped curve? The dip isn't especially pronounced. Is 4 degrees lower than the further ones? Are there alternative methods of quantifying this (e.g., fitting it to a linear and quadratic function)?

      We statistically tested the U-shaped relationship using a weighted quadratic regression, which showed significant positive curvature for decoding between fovea and periphery in all early visual areas (V1: t(27) = 3.98, p = 0.008, V2: t(27) = 3.03, p = 0.02, V3: t(27) = 2.776, p = 0.025, one-sided). We now report these results in the revised manuscript (pg 5 ln 137-138).
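
      As an illustration only, one way such a curvature test could be set up is sketched below. The response does not specify the weights, the eccentricity bins, or whether the fit was done per participant, so everything here is an assumption: the function names, the example values, and the choice to fit one weighted quadratic per participant and then compute a one-sample t statistic on the quadratic coefficients. A positive coefficient corresponds to a U-shape.

```python
import math
from statistics import mean, stdev


def weighted_quadratic_fit(x, y, w):
    """Weighted least-squares fit of y ~ a + b*x + c*x**2; returns (a, b, c)."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Augmented normal equations [X^T W X | X^T W y], one row per coefficient.
    m = [
        [sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(3)]
        + [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, y))]
        for j in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for j in (2, 1, 0):
        tail = sum(m[j][k] * beta[k] for k in range(j + 1, 3))
        beta[j] = (m[j][3] - tail) / m[j][j]
    return tuple(beta)


def one_sample_t(values):
    """t statistic for H0: mean == 0, with len(values) - 1 degrees of freedom."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical data: decoding accuracy in five eccentricity bins for three
# participants. Bins, weights, and accuracies are made up for illustration.
eccentricity = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 2.0, 2.0, 1.0]
accuracy = [
    [0.62, 0.55, 0.52, 0.56, 0.63],
    [0.60, 0.54, 0.51, 0.53, 0.58],
    [0.65, 0.57, 0.53, 0.55, 0.61],
]

# The quadratic coefficient (index 2) measures curvature for each participant.
curvatures = [weighted_quadratic_fit(eccentricity, acc, weights)[2] for acc in accuracy]
t_stat = one_sample_t(curvatures)
```

      A one-sided p-value would then follow from the t distribution with n - 1 degrees of freedom, which matches the t(27) reported above if 28 participants entered the test.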

      (2) In the parametric modulation analysis, the evidence for IPS being the only region showing stronger fovea vs peripheral beta values was weak, especially given the exploratory nature of this analysis. The raw beta value can reflect other things, such as global brain fluctuations or signal-to-noise ratio. I would also want to see the results of the same analysis performed on the control condition decoding results.

      We appreciate the reviewer’s suggestion and repeated the same parametric modulation analysis on the control condition to assess the influence of potential confounds on the overall beta values (Supplementary Figure 2). The results show a negative association between foveal decoding and FEF and IPS (likely because eye movements in the control condition lead to less foveal presentation of the stimulus) and a positive association with LO. Peripheral decoding was not associated with significant changes in any of the ROIs, indicating that global brain fluctuations alone are not responsible for the effects reported in the experimental condition. The results of this analysis thus show a specific positive association of IPS activity with the experimental condition, not the control condition, which is in line with the idea that the foveal feedback effect reported in this study may be related to saccade preparation.

      (3) Many of the claims feel overstated. There is an emphasis throughout the manuscript (including claims in the abstract) that these findings demonstrate foveal prediction, specifically that "image-specific feedback during saccade preparation must underlie this effect." To my understanding, one of the key aspects of the foveal prediction phenomenon that ties it closely to trans-saccadic stability is its specificity to the saccade target but not to other objects in the environment. However, it is not clear to what degree the observed findings are specific to saccade preparation and the peripheral saccade target. Should the observers be asked to make a saccade to another fixation location, or simply maintain passive fixation, will foveal retinotopic cortex similarly contain the object's identity information? Without these control conditions, the results are consistent with foveal prediction, but do not definitively demonstrate that as the cause, so claims need to be toned down.

      We fully agree with the reviewer and toned down claims about foveal prediction. We engage with the questions raised by the reviewer more thoroughly in the new discussion section “Foveal feedback during saccade preparation”.

      In addition, we agree that another condition in which subjects make a saccade towards a different location would have been a great addition that we also considered, but due to concerns with statistical power did not add. While including such a condition exceeds the scope of the current study, we included this limitation in the Discussion section (pg 10 ln 316) and hope that future studies will address this question.

      (4) Another critical aspect is the temporal locus of the feedback signal. In the paradigm, the authors ensured that the saccade target object was never foveated via the gaze-contingent procedure and a conservative data exclusion criterion, thus enabling the test of feedback signals to foveal retinotopic cortex. However, due to the temporal sluggishness of fMRI BOLD signals, it is unclear when the feedback signal arrives at the foveal retinotopic cortex. In other words, it is possible that the feedback signal arrives after the eyes land at the saccade target location. This possibility is also bolstered by Chambers et al. (2013)'s TMS study, where they found that TMS to the foveal cortex at 350-400 ms SOA interrupts the peripheral discrimination task. The authors should qualify their claims of the results occurring "during saccade preparation" (e.g., pg 1 ln 22) throughout the manuscript, and discuss the importance of temporal dynamics of the effect in supporting stability across saccades.

      We fully agree that the sluggishness of the fMRI signal presents an important challenge in investigating foveal feedback. We have now included this limitation in the discussion (pg 10 ln 306-318). We also clarify that our argument connects to previous studies investigating the temporal dynamics of foveal feedback using similar tasks (pg 10 ln 313-316). Specifically, in their psychophysical work, Kroell and Rolfs (2022, 2025) showed that foveal feedback occurs before saccade execution with a peak around 80 ms before the eye movement.

      (5) Relatedly, the claims that results in this paradigm reflect "activity exclusively related to predictive feedback" and "must originate from predictive rather than direct visual processes" (e.g., lines 60-65 and throughout) need to be toned down. The experimental design nicely rules out direct visual foveal stimulation, but predictive feedback is not the only alternative to that. The activation could also reflect mental imagery, visual working memory, attention, etc. Importantly, the experiment uses a block design, where the same exact image is presented multiple times over the block, and the activation is taken for the block as a whole. Thus, while at no point was the image presented at the fovea, there could still be more going on than temporally-specific and saccade-specific predictive feedback.

      We agree that those claims could have misled the reader. Our intention was to state that the activation originates from feedback rather than direct foveal stimulation because of the nature of the design. We have now clarified these statements (pg 2 ln 65) and also included a discussion of other effects including imagery and working memory in the limitations section (pg 10 ln 306-313).

      (6) The authors should avoid using the terms foveal feedback and foveal prediction interchangeably. To me, foveal feedback refers to the findings of Williams et al. (2008), where participants maintained passive fixation and discriminated objects in the periphery (see also Fan et al., 2016), whereas foveal prediction refers to the neural mechanism hypothesized by Kroell & Rolfs (2022), occurring before a saccade to the target object and containing task-irrelevant feature information.

      We agree, and we have now adopted a clearer distinction between these terms, referring to foveal prediction only when discussing the distinct predictive nature of the effect discovered by Kroell and Rolfs (2022). Otherwise, we refer to this effect as foveal feedback.

      (7) More broadly, the treatment of how foveal prediction relates to saccadic remapping is overly simplistic. The authors seem to be taking the perspective that remapping is an attentional phenomenon marked by remapping of only attentional/spatial pointers, but this is not the classic or widely accepted definition of remapping. Within the field of saccadic remapping, it is an ongoing debate whether (/how/where/when) information about stimulus content is remapped alongside spatial location (and also whether the attentional pointer concept is even neurophysiologically viable). This relationship between saccadic remapping and foveal prediction needs clarification and deeper treatment, in both the introduction and discussion.

      We thank the reviewer for their remarks. We reformulated the discussion section on “Saccadic remapping or foveal prediction” to include the nuances about spatial and feature remapping laid out in the reviewer’s comment (pg 8-9 ln 241-269). We also put a stronger focus on the special role the fovea seems to be playing regarding the feedback of visual features (pg 8-9 ln 265-269).

      (8) As part of this enhanced discussion, the findings should be better integrated with prior studies. E.g., there is some evidence for predictive remapping inducing integration of non-spatial features (some by the authors themselves; Harrison et al., 2013; Szinte et al., 2015). How do these findings relate to the observed results? Can the results simply be a special case of non-spatial feature integration between the currently attended and remapped location (fovea)? How are the results different from neurophysiological evidence for facilitation of the saccade target object's feature across the visual field (Burrows et al., 2014)? How might the results be reconciled with a prior fMRI study that failed to find decoding of stimulus content in remapped responses (Lescroart et al., 2016)? Might this reflect a difference between peripheral-to-peripheral vs peripheral-to-foveal remapping? A recent study by Chiu & Golomb (2025) provided supporting evidence for peripheral-to-fovea remapping (but not peripheral-to-peripheral remapping) of object-location binding (though in the post-saccadic time window), and suggested foveal prediction as the underlying mechanism.

      We thank the reviewer for raising these intriguing questions. We now address them in the revised discussion. We argue that the findings by Harrison et al. (2013) and Szinte et al. (2015) of presaccadic integration of features across two peripheral locations can be explained by presaccadic updating of spatial attention pointers rather than remapping of feature information (pg 8 ln 248-253). The lack of evidence for periphery-to-periphery remapping (Lescroart et al., 2016) and the recent study by Chiu & Golomb (2025) showing object-location binding from periphery to fovea nicely align with our characterization of foveal processing as unique in predicting feature information of upcoming stimuli (pg 8-9 ln 265-269). Finally, we argue that the global (i.e., space-invariant) selection of task-irrelevant saccadic target features (Burrows et al., 2014) is well-established at the neural level, but does not suffice to explain the spatially specific nature of foveal prediction (pg 8 ln 220-224). We now include these studies in the revised discussion section.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors used fMRI to determine whether peripherally viewed objects could be decoded from the foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from the foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.

      Strengths:

      I think this is another nice demonstration that peripheral information can be decoded from / is processed in the foveal cortex - the methods seem appropriate, and the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.

      We thank the reviewer for this positive evaluation of our work. As discussed in our response to Reviewer 1, we now elaborate on the differences between previous work showing decoding of peripheral information from foveal cortex and the effect shown here. While there are important similarities between these findings, foveal prediction in our study occurs in a saccade condition and in the absence of a task that is specific to stimulus features.

      Weaknesses:

      There are a couple of reasons why I think the main theoretical conclusions drawn from the study might not be supported, and why a more thorough investigation might be needed to draw these conclusions.

      (1) The authors used a blocked design, with each object being shown repeatedly in the same block. This meant that the stimulus was entirely predictable on each block, which weakens the authors' claims about this being a predictive mechanism that facilitates object recognition - if the stimulus is 100% predictable, there is no aspect of recognition or discrimination actually being tested. I think to strengthen these claims, an experiment would need to have unpredictable stimuli, and potentially combine behavioural reports with decoding to see whether this mechanism can be linked to facilitating object recognition across saccades.

      We appreciate the reviewer’s point and would like to highlight that it was not our intention to claim a behavioral effect on object recognition. We believe that an ambiguous formulation in the original abstract may have been interpreted this way, and we thus removed this reference. We also speculated in our Discussion that a potential reason for foveal prediction could be a head start in peripheral object recognition; in the revised manuscript, we more clearly highlight that this is a potential future direction only.

      (2)  Given that foveal feedback has been found in previous studies that don't incorporate saccades, how is this a mechanism that might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli? I don't think this paper addresses this point, which would seem to be crucial to differentiate the results from those of previous studies.

      We fully agree that this point had not been sufficiently addressed in the previous version of the manuscript. As described in our responses to similar comments from reviewers 1 and 2, we included an additional section in the Discussion (“Foveal feedback during saccade preparation”) to more clearly delineate the present study from previous findings of foveal feedback. Previous studies (Williams et al., 2008) only found foveal feedback during narrow discrimination tasks related to spatial features of the target stimulus, not during color-discrimination or fixation-only tasks, concluding that the observed effect must be related to the discrimination behavior. In contrast, we found foveal feedback (as evidenced by decoding of target features) during a saccade condition that was independent of the target features, suggesting a different role of foveal feedback than hypothesized by Williams et al. (2008).

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      (A) Minor comments:

      (1)  The task should be clarified earlier in the manuscript.

      We now characterise the task in the abstract and have clarified its description in the third paragraph, right after introducing the main literature.

      (2) Is there actually only 0.5 seconds between saccades? This feels very short/rushed.

      The inter-trial-interval was 0.5 seconds, though effectively it varied because the target only appeared once participants fixated on the fixation dot. Note that this pacing is slower than the rate of saccades in natural vision (about 3 to 4 saccades per second). Participants did not report this paradigm as rushed.

      (3) Typo on pg2 ln64 (whooe).

      Fixed.

      (4)  Can the authors also show individual data points for Figures 3 and 4?

      We added individual data points for Figures 4 and S2.

      (5) The MNI coordinates on Figure 4A seem to be incorrect.

      We took out those coordinates.

      (6) Pg4 ln126 and pg6 ln194, why cite Williams et al. (2008)?

      We included this reference here to acknowledge that Williams et al. raised the same issues. We added a “cf.” before this reference to clarify this.

      (7) Pg7 ln207 Fabius et al. (2020) showed slow post-saccadic feature remapping, rather than predictive remapping of spatial attention.

      We have corrected this mistake.

      (8) The OSF link is valid, but I couldn't find a pre-registration.

      The issue with the OSF link has been resolved. The pre-registration had been set up but not published. We have now published it without changing the original pre-registration (see the screenshot attached).

      (9) I couldn't access the OpenNeuro repository.

      The issue with the OpenNeuro link has been resolved.

      (B) Additional references you may wish to include:

      (1) Burrows, B. E., Zirnsak, M., Akhlaghpour, H., Wang, M., & Moore, T.  (2014). Global selection of saccadic target features by neurons in area v4. Journal of Neuroscience.

      (2) Chambers, C. D., Allen, C. P., Maizey, L., & Williams, M. A. (2013). Is delayed foveal feedback critical for extra-foveal perception?. Cortex.

      (3) Chiu, T. Y., & Golomb, J. D. (2025). The influence of saccade target status on the reference frame of object-location binding. Journal of Experimental Psychology. General.

      (4) Harrison, W. J., Retell, J. D., Remington, R. W., & Mattingley, J. B. (2013). Visual crowding at a distance during predictive remapping. Current Biology.

      (5) Lescroart, M. D., Kanwisher, N., & Golomb, J. D. (2016). No evidence for automatic remapping of stimulus features or location found with fMRI. Frontiers in Systems Neuroscience.

      (6) Moran, C., Johnson, P. A., Hogendoorn, H., & Landau, A. N. (2025). The representation of stimulus features during stable fixation and active vision. Journal of Neuroscience.

      (7) Szinte, M., Jonikaitis, D., Rolfs, M., Cavanagh, P., & Deubel, H. (2016). Presaccadic motion integration between current and future retinotopic locations of attended objects. Journal of Neurophysiology.

      We thank the reviewer for pointing out these references. We have included them in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I just have a few minor points where I think some clarifications could be made.

      (1) Line 64 - "whooe" should be "whose" I think.

      Fixed.

      (2) Around line 53 - you might consider citing this review on foveal feedback - https://doi.org/10.1167/jov.20.12.2

      We included the reference (pg 2 ln 55).

      (3) Line 129 - you mention a u-shaped relationship for decoding - I wasn't quite sure of the significance/relevance of this relationship - it would be helpful to expand on this / clarify what this means.

      We have expanded this section and added statistical tests of the u-shaped relationship in decoding using a weighted quadratic regression. We found significant positive curvature in all early visual areas between fovea and periphery (V1: t(27) = 3.98, p = 0.008, V2: t(27) = 3.03, p = 0.02, V3: t(27) = 2.776, p = 0.025). These findings support a u-shaped relationship. We now report these results in the revised manuscript (pg 5 ln 137-138).

      (4) Figure 1 - it would be helpful to indicate how long the target was viewed in the "stim on" panels - I assume it was for the saccade latency, but it would be good to include those values in the main text.

      We included that detail in the text (pg 3 ln 96-97).

    1. eLife Assessment

      The development of glmSMA represents a valuable advancement in spatial transcriptomics analysis, offering a mathematically robust regression-based approach that achieves higher-resolution mapping of single-cell RNA sequencing data to spatial locations than existing methods. The evidence is convincing, as the authors demonstrate the method's superiority by formulating it as a convex optimization problem that ensures stable solutions, coupled with successful validation across multiple biological systems. The rigorous mathematical framework and validation across diverse tissues enable precise spatial mapping of cellular heterogeneity at enhanced resolution.

    2. Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Comments on revised version:

      I have no additional comments regarding the current version of the manuscript.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have provided a thorough and constructive response to the comments. They effectively addressed concerns regarding the dependence on marker gene selection by detailing the incorporation of multiple feature selection strategies, such as highly variable genes and spatially informative markers (e.g., via Moran's I), which enhance glmSMA's robustness even when using gene-limited reference atlases.

      Furthermore, the authors thoughtfully acknowledged the assumption underlying glmSMA (that transcriptionally similar cells are spatially proximal) and discussed both its limitations and empirical robustness in heterogeneous tissues such as human PDAC. Their use of real-world, heterogeneous datasets to validate this assumption demonstrates the method's practical utility and adaptability.

      Overall, the response appropriately contextualizes the limitations while reinforcing the generalizability and performance of glmSMA. The authors' clarifications and experimental justifications strengthen the manuscript and address the reviewer's concerns in a scientifically sound and transparent manner.
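
      For readers unfamiliar with the spatial statistic mentioned above, a minimal sketch of Moran's I for a single gene is given below. This is the textbook definition, not the glmSMA implementation; the function name, the four-spot chain, and the binary neighbour weights are assumptions chosen for illustration. Values near +1 indicate that similar expression levels cluster in space, and values near -1 indicate that neighbours tend to differ.

```python
def morans_i(values, weights):
    """Moran's I for one variable given a square spatial weight matrix.

    values:  expression of one gene at each of n spots
    weights: n x n nested list, weights[i][j] > 0 if spots i and j are neighbours
    Assumes the values are not all identical (otherwise the denominator is zero).
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, summed over all neighbouring pairs.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / weight_sum) * numerator / denominator


# Hypothetical example: four spots in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 0, 0], chain)    # expression grouped at one end
alternating = morans_i([1, 0, 1, 0], chain)  # expression alternates between spots
```

      Under this sketch, genes could be ranked by their Moran's I and the top-ranked ones kept as spatially informative markers; whether glmSMA selects markers this way is not stated here.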

      Comments on revised version:

      Figure 1 does not yet clearly convey what the glmSMA algorithm actually does. I recommend revising or redesigning the figure so that the workflow, main inputs, and outputs of the algorithm are more intuitively presented. A clearer visual explanation would help readers quickly grasp the core concept and contribution of glmSMA.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      (1) Related to comment 3, related to the spatial communication section, either provide a clearer worked example or adjust the framing to avoid implying a more developed capability than is shown.

      We appreciate the reviewer’s feedback regarding the framing of the spatial communication section. We have removed this section from the revised version.

      (2) Related to comment 4 about resolution, consider including explicit numerical estimates of spatial resolution (e.g., median patch diameter in micrometers) for at least one dataset to help users understand practical mapping granularity.

      We appreciate the suggestion. We have added explicit numerical estimates of spatial resolution to clarify our mappings. Specifically, we now (i) define “patch” precisely and (ii) report the median patch diameter (in µm) for representative datasets:

      10x Visium (mouse cortex): spot diameter = 55 µm; center-to-center spacing = 100 µm.

      Slide-seqV2 (mouse brain): bead diameter ≈ 10 µm. When we optionally coarse-grain to 5×5 bead tiles for robustness, the effective patch diameter is ~50 µm.

    1. eLife Assessment

      This valuable study investigates the relationship between pupil dilation and information gain during associative learning, using two different tasks. A key strength of this study is its exploration of pupil dilation beyond the immediate response period, extending analysis to later time windows after feedback, and it provides convincing evidence that pupillary response to information gain may be context-dependent during associative learning. The interpretation remains limited by task heterogeneity and unresolved contextual factors influencing pupil dynamics, but a range of interesting ideas are discussed.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines whether changes in pupil size index prediction-error-related updating during associative learning, formalised as information gain via Kullback-Leibler (KL) divergence. Across two independent tasks, pupil responses scaled with KL divergence shortly after feedback, with the timing and direction of the response varying by task. Overall, the work supports the view that pupil size reflects information-theoretic processes in a context-dependent manner.

      Strengths:

      This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gain during learning. The robust methodology, including two independent datasets with distinct task structures, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the timing and direction of prediction-error-related responses, offering new insights into the temporal dynamics of model updating. The use of an ideal-learner framework to quantify prediction errors, surprise, and uncertainty provides a principled account of the computational processes underlying pupil responses. The work also highlights the critical role of task context in shaping the direction and magnitude of these effects, revealing the adaptability of predictive processing mechanisms. Importantly, the conclusions are supported by rigorous control analyses and preprocessing sanity checks, as well as convergent results from frequentist and Bayesian linear mixed-effects modelling approaches.

      Weaknesses:

      Some aspects of directionality remain context-dependent, and on current evidence cannot be attributed specifically to whether average uncertainty increases or decreases across trials. Differences between the two tasks (e.g., sensory modality and learning regime) limit direct comparisons of effect direction and make mechanistic attribution cautious. In addition, subjective factors such as confidence were not measured and could influence both prediction-error signals and pupil responses. Importantly, the authors explicitly acknowledge these limitations, and the manuscript clearly frames them as areas for future work rather than settled conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate whether pupil dilation reflects information gain during associative learning, formalised as Kullback-Leibler divergence within an ideal observer framework. They examine pupil responses in a late time window after feedback and compare these to information-theoretic estimates (information gain, surprise, and entropy) derived from two different tasks with contrasting uncertainty dynamics.

      Strength:

      The exploration of task evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.

      Weakness:

      However, the interpretability of the findings remains constrained by the fundamental differences between the two tasks (stimulus modality, feedback type, and learning structure), which confound the claimed context-dependent effects. The later time-window pupil effects, although intriguing, are small in magnitude and may reflect residual noise or task-specific arousal fluctuations rather than distinct information-processing signals. Thus, while the study offers valuable methodological insight and contributes to ongoing debates about the role of the pupil in cognitive inference, its conclusions about the functional significance of late pupil responses should be treated with caution.

    4. Reviewer #3 (Public review):

      Summary:

      Thank you for inviting me to review this manuscript entitled "Pupil dilation offers a time-window on prediction error" by Colizoli and colleagues. The study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). The conclusion of this work is that (post-feedback) pupil dilation in response to information gain is context dependent.

      Strengths:

      Use of an established Bayesian model to compute KL divergence and entropy.

      Pupillometry data preprocessing and multiple robustness checks.

      Weaknesses:

      Operationalization of prediction errors based on frequency, accuracy, and their interaction:

      The authors rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point, I would argue that this approach provides a simple approximation of the prediction error, but that a model-based approach would be more appropriate.

      Model validation:

      My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.

      Lack of a clear conclusion:

      The authors conclude that this study shows for the first time that (post-feedback) pupil dilation in response to information gain is context dependent. However, the study does not offer a unifying explanation for such context dependence. The discussion is quite detailed with respect to task-specific effects, but fails to provide an overarching perspective on the context-dependent nature of pupil signatures of information gain. This seems to be partly due to the strong differences between the experimental tasks.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines whether changes in pupil size index prediction-error-related updating during associative learning, formalised as information gain via Kullback-Leibler (KL) divergence. Across two independent tasks, pupil responses scaled with KL divergence shortly after feedback, with the timing and direction of the response varying by task. Overall, the work supports the view that pupil size reflects information-theoretic processes in a context-dependent manner.

      Strengths:

      This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gain during learning. The robust methodology, including two independent datasets with distinct task structures, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the timing and direction of prediction-error-related responses, offering new insights into the temporal dynamics of model updating. The use of an ideal-learner framework to quantify prediction errors, surprise, and uncertainty provides a principled account of the computational processes underlying pupil responses. The work also highlights the critical role of task context in shaping the direction and magnitude of these effects, revealing the adaptability of predictive processing mechanisms. Importantly, the conclusions are supported by rigorous control analyses and preprocessing sanity checks, as well as convergent results from frequentist and Bayesian linear mixed-effects modelling approaches.

      Weaknesses:

      Some aspects of directionality remain context-dependent, and on current evidence cannot be attributed specifically to whether average uncertainty increases or decreases across trials. Differences between the two tasks (e.g., sensory modality and learning regime) limit direct comparisons of effect direction and make mechanistic attribution cautious. In addition, subjective factors such as confidence were not measured and could influence both prediction-error signals and pupil responses. Importantly, the authors explicitly acknowledge these limitations, and the manuscript clearly frames them as areas for future work rather than settled conclusions.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate whether pupil dilation reflects information gain during associative learning, formalised as Kullback-Leibler divergence within an ideal observer framework. They examine pupil responses in a late time window after feedback and compare these to information-theoretic estimates (information gain, surprise, and entropy) derived from two different tasks with contrasting uncertainty dynamics.

      Strength:

      The exploration of task evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.

      Weakness:

      However, the interpretability of the findings remains constrained by the fundamental differences between the two tasks (stimulus modality, feedback type, and learning structure), which confound the claimed context-dependent effects. The later time-window pupil effects, although intriguing, are small in magnitude and may reflect residual noise or task-specific arousal fluctuations rather than distinct information-processing signals. Thus, while the study offers valuable methodological insight and contributes to ongoing debates about the role of the pupil in cognitive inference, its conclusions about the functional significance of late pupil responses should be treated with caution.

      Reviewer #3 (Public review):

      Summary:

      Thank you for inviting me to review this manuscript entitled "Pupil dilation offers a time-window on prediction error" by Colizoli and colleagues. The study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). The conclusion of this work is that (post-feedback) pupil dilation in response to information gain is context dependent.

      Strengths:

      Use of an established Bayesian model to compute KL divergence and entropy.

      Pupillometry data preprocessing and multiple robustness checks.

      Weaknesses:

      Operationalization of prediction errors based on frequency, accuracy, and their interaction:

      The authors rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point, I would argue that this approach provides a simple approximation of the prediction error, but that a model-based approach would be more appropriate.

      Model validation:

      My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.

      Lack of a clear conclusion:

      The authors conclude that this study shows for the first time that (post-feedback) pupil dilation in response to information gain is context dependent. However, the study does not offer a unifying explanation for such context dependence. The discussion is quite detailed with respect to task-specific effects, but fails to provide an overarching perspective on the context-dependent nature of pupil signatures of information gain. This seems to be partly due to the strong differences between the experimental tasks.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I highly appreciate the care and detail in the authors' response and thank them for the effort invested in revising the manuscript. They addressed the core concerns to a high standard, and the manuscript has substantially improved in methodological rigour (through additional controls/sanity checks and complementary mixed-effects analyses) and in clarity of interpretation (by explicitly acknowledging context-dependence and tempering stronger claims). The present version reads clearly and is much strengthened overall. I only have a few minor points below:

      Minor suggestions:

      Abstract:

      In the abstract, KL is introduced as an abbreviation, but at first occurrence it should be written out as "Kullback-Leibler (KL)" for readers not familiar with it.

      We thank the reviewer for catching this error. It has been corrected in the version of record.

      Methods:

      I appreciate the additional Bayesian LME analysis. A few parameters were missing from the description: 1) what was the target acceptance rate (default of .95?), 2) which family was used to model the response distribution: (default) "gaussian" or robust "student-t"? Depending on the data a student-t would be preferred, but since the authors checked the fit and the results corroborate the correlation analysis, using the default would also be fine! Just add the information for completeness.

      Thank you for bringing this to our attention. We have now noted that default parameters were used in all cases unless otherwise mentioned. 

      Thank you once again for your time and consideration.

      Reviewer #2 (Recommendations for the authors):

      Thanks to the authors for their effort on the revision. I am happy with this new version of the manuscript.

      Thank you once again for your time and consideration.

      Reviewer #3 (Recommendations for the authors):

      (1) Regarding comments #3 and #6 (first round) on model validation and posterior predictive checks, the authors replied that since their model is not a "generative" one, they can't perform posterior predictive checks. Crucially, in eq. 2, the authors present the p{tilde}^j_k variable denoting the learned probability of event k on trial j. I don't see why this can't be exploited for simulations. In my opinion, one could (and should) generate predictions based on this variable. The simplest implementation would translate the probability into a categorical choice (w/o fitting any free parameter). Based on this, they could assess whether the model and data are comparable.

      We thank the reviewer for this clarification. The reviewer suggests using the probability distributions at each trial to predict which event should be chosen on each trial. More specifically, the event(s) with the highest probability on trial j could be used to generate a prediction for the choice of the participant on trial j. We agree that this would indeed be an interesting analysis. However, the response options of each task are limited to two-alternatives. In the cue-target task, four events are modeled (representing all possible cue-target conditions) while the participants’ response options are only “left” and “right”. Similarly, in the letter-color task, 36 events are modeled while the participants’ response options are “match” and “no-match”. In other words, we do not know which event (either four or 36, for the two tasks) the participant would have indicated on each trial. As an approximation to this fine-grained analysis, we investigated the relationship between the information-theoretic variables separately for error and correct trials. Our rationale was that we would have more insight into how the model fits depended on the participants’ actual behavior as compared with the ideal learner model.
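
      As a hedged illustration of the simplest check Reviewer #3 describes, the sketch below turns per-trial learned probabilities into a categorical prediction by taking the most probable event, then scores agreement with a sequence of indicated events. The function names, the probability values, and the `observed` sequence are all hypothetical. As the response above notes, the tasks did not record which modeled event a participant would have indicated, so this shows only the mechanics and is not an analysis of the study's data.

```python
import numpy as np

def predicted_events(probabilities):
    """Index of the most probable event on each trial (ties go to the lowest index)."""
    return np.argmax(probabilities, axis=1)

def agreement(predicted, observed):
    """Fraction of trials on which the predicted and observed events coincide."""
    return float(np.mean(np.asarray(predicted) == np.asarray(observed)))

# Hypothetical learned probabilities: 3 trials x 4 modeled events (rows sum to 1).
p = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.20, 0.20, 0.20],
    [0.50, 0.17, 0.17, 0.16],
])

# Hypothetical event labels; the real tasks only recorded two-alternative responses.
observed = np.array([0, 0, 2])

pred = predicted_events(p)
score = agreement(pred, observed)
print(pred, score)
```

      No free parameter is fitted here, which matches the reviewer's "simplest implementation"; the open question raised by the authors is how to obtain the `observed` labels from two-alternative responses.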

      (2) I recommend providing a plot of the linear mixed model analysis of the pupil data. Currently, results are only presented in the text and tables, but a figure would be much more useful.

      We thank the reviewer for the suggestion to add a plot of the linear mixed model results. We appreciate the value of visualizing model estimates; however, we feel that the current presentation in the text and tables clearly conveys the relevant findings. For this reason, and to avoid further lengthening the manuscript, we prefer to retain the current format.

      (3) I would consider only presenting the linear mixed effects for the pupil data in the main results, and the correlation results in the supplement. It is currently quite long.

      We thank the reviewer for this recommendation. We agree that the results section is detailed; however, we consider the correlation analyses to be integral to the interpretation of the pupil data and therefore prefer to keep them in the main text rather than move them to the supplement.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study seeks to examine the relationship between pupil size and information gain, showing opposite effects dependent upon whether the average uncertainty increases or decreases across trials. Given the broad implications for learning and perception, the findings will be of broad interest to researchers in cognitive neuroscience, decision-making, and computational modelling. Nevertheless, the evidence in support of the particular conclusion is at present incomplete - the conclusions would be strengthened if the authors could both clarify the differences between model-updating and prediction error in their account and clarify the patterns in the data.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates whether pupil dilation reflects prediction error signals during associative learning, defined formally by Kullback-Leibler (KL) divergence, an information-theoretic measure of information gain. Two independent tasks with different entropy dynamics (decreasing and increasing uncertainty) were analyzed: the cue-target 2AFC task and the letter-color 2AFC task. Results revealed that pupil responses scaled with KL divergence shortly after feedback onset, but the direction of this relationship depended on whether uncertainty (entropy) increased or decreased across trials. Furthermore, signed prediction errors (interaction between frequency and accuracy) emerged at different time windows across tasks, suggesting task-specific temporal components of model updating. Overall, the findings highlight that pupil dilation reflects information-theoretic processes in a complex, context-dependent manner.

      Strengths:

      This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gained during learning. The robust methodology, including two independent datasets with distinct entropy dynamics, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the temporal dynamics of prediction error signals, offering new insights into the timing of model updates. The use of an ideal learner model to quantify prediction errors, surprise, and entropy provides a principled framework for understanding the computational processes underlying pupil responses. Furthermore, the study highlights the critical role of task context - specifically increasing versus decreasing entropy - in shaping the directionality and magnitude of these effects, revealing the adaptability of predictive processing mechanisms.

      Weaknesses:

      While this study offers important insights, several limitations remain. The two tasks differ significantly in design (e.g., sensory modality and learning type), complicating direct comparisons and limiting the interpretation of differences in pupil dynamics. Importantly, the apparent context-dependent reversal between pupil constriction and dilation in response to feedback raises concerns about how these opposing effects might confound the observed correlations with KL divergence. 

      We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. As the reviewer points out, the directional relationship between pupil dilation and information gain may instead be due to other factors, for instance, the sensory modality, learning type, or the reversal between pupil constriction and dilation across the two tasks. Also, we would like to note that ongoing experiments in our lab already contradict our original speculation. In line with the reviewer’s point, we noted these differences in the section on “Limitations and future research” in the Discussion. To better align the manuscript with the above-mentioned points, we have made several changes in the Abstract, Introduction and Discussion, summarized below:

      We have removed the following text from the Abstract and Introduction: “…, specifically related to increasing or decreasing average uncertainty (entropy) across trials.”

      We have edited the following text in the Introduction (changes in italics) (p. 5):

      “We analyzed two independent datasets featuring distinct associative learning paradigms, one characterized by increasing entropy and the other by decreasing entropy as the tasks progressed. By examining these different tasks, we aimed to identify commonalities (if any) in the results across varying contexts. Additionally, the contrasting directions of entropy in the two tasks enabled us to disentangle the correlation between stimulus-pair frequency and information gain in the post-feedback pupil response.”

      We have removed the following text from the Discussion:

      “…and information gain in fact seems to be driven by increased uncertainty.”

      “We speculate that this difference in the direction of scaling between information gain and the pupil response may depend on whether entropy was increasing or decreasing across trials.” 

      “…which could explain the opposite direction of the relationship between pupil dilation and information gain”

      “… and seems to relate to the direction of the entropy as learning progresses (i.e., either increasing or decreasing average uncertainty).” 

      We have edited the following texts in the Discussion (changes in italics):

      “For the first time, we show that the direction of the relationship between post-feedback pupil dilation and information gain (defined as KL divergence) was context dependent.” (p. 29)

      Finally, we have added the following correction to the Discussion (p. 30):

      “Although it is tempting to speculate that the direction of the relationship between pupil dilation and information gain may be due to either increasing or decreasing entropy as the task progressed, we must refrain from this conclusion. We note that the two tasks differ substantially in terms of design with other confounding variables and therefore cannot be directly compared to one another. We expand on these limitations in the section below (see Limitations and future research).”

      Finally, subjective factors such as participants' confidence and internal belief states were not measured, despite their potential influence on prediction errors and pupil responses.

      Thank you for the thoughtful comment. We agree with the reviewer that subjective factors, such as participants' confidence, can be important in understanding prediction errors and pupil responses. As per the reviewer’s point, we have included the following limitation in the Discussion (p. 33): 

      “Finally, while we acknowledge the potential relevance of subjective factors, such as the participants’ overt confidence reports, in understanding prediction errors and pupil responses, the current study focused on the more objective, model-driven measure of information-theoretic variables. This approach aligns with our use of the ideal learner model, which estimates information-theoretic variables while being agnostic about the observer's subjective experience itself. Future research is needed to explore the relationship between information-gain signals in pupil dilation and the observer’s reported experience of or awareness about confidence in their decisions.” 

      Reviewer #2 (Public review):

      Summary:

      The authors proposed that variability in post-feedback pupillary responses during the associative learning tasks can be explained by information gain, which is measured as KL divergence. They analysed pupil responses in a later time window (2.5s-3s after feedback onset) and correlated them with information-theory-based estimates from an ideal learner model (i.e., information gain-KL divergence, surprise-subjective probability, and entropy-average uncertainty) in two different associative decision-making tasks.

      Strength:

      The exploration of task-evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.

      Weakness:

      However, disentangling these later effects from noise needs caution. Noise in pupillometry can arise from variations in stimuli and task engagement, as well as artefacts from earlier pupil dynamics. The increasing variance in the time series of pupillary responses (e.g., as shown in Figure 2D) highlights this concern.

      It's also unclear what this complicated association between information gain and pupil dynamics actually means. The complexity of the two different tasks reported made the interpretation more difficult in the present manuscript.

      We share the reviewer’s concerns. To make this point come across more clearly, we have added the following text to the Introduction (p. 5):

      “The current study was motivated by Zénon’s hypothesis concerning the relationship between pupil dilation and information gain, particularly in light of the varying sources of signal and noise introduced by task context and pupil dynamics. By demonstrating how task context can influence which signals are reflected in pupil dilation, and highlighting the importance of considering their temporal dynamics, we aim to promote a more nuanced and model-driven approach to cognitive research using pupillometry.”

      Reviewer #3 (Public review):

      Summary:

      This study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). In particular, the study defines the prediction error in terms of KL divergence and speculates that changes in pupil size associated with KL divergence depend on entropy. Moreover, the authors examine the temporal characteristics of pupil correlates of prediction errors, which differed considerably across previous studies that employed different experimental paradigms. In my opinion, the study does not achieve these aims due to several methodological and theoretical issues.

      Strengths:

      (1)  Use of an established Bayesian model to compute KL divergence and entropy.

      (2)  Pupillometry data preprocessing, including deconvolution.

      Weaknesses:

      (1) Definition of the prediction error in terms of KL divergence:

      I'm concerned about the authors' theoretical assumption that the prediction error is defined in terms of KL divergence. The authors primarily refer to a review article by Zénon (2019): "Eye pupil signals information gain". It is my understanding that Zénon argues that KL divergence quantifies the update of a belief, not the prediction error: "In short, updates of the brain's internal model, quantified formally as the Kullback-Leibler (KL) divergence between prior and posterior beliefs, would be the common denominator to all these instances of pupillary dilation to cognition." (Zénon, 2019).

      From my perspective, the update differs from the prediction error. Prediction error refers to the difference between outcome and expectation, while update refers to the difference between the prior and the posterior. The prediction error can drive the update, but the update is typically smaller, for example, because the prediction error is weighted by the learning rate to compute the update. My interpretation of Zénon (2019) is that they explicitly argue that KL divergence defines the update in terms of the described difference between prior and posterior, not the prediction error.

      The authors also cite a few other papers, including Friston (2010), where I also could not find a definition of the prediction error in terms of KL divergence. For example [KL divergence:] "A non-commutative measure of the non-negative difference between two probability distributions." Similarly, Friston (2010) states: Bayesian Surprise - "A measure of salience based on the Kullback-Leibler divergence between the recognition density (which encodes posterior beliefs) and the prior density. It measures the information that can be recognized in the data." Finally, also in O'Reilly (2013), KL divergence is used to define the update of the internal model, not the prediction error.
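
      The distinction the reviewer draws can be made concrete with a small sketch. It assumes a count-based categorical learner with flat pseudo-counts (an illustrative stand-in, not necessarily the manuscript's ideal learner), and it computes two separate quantities for each observation: the surprise of the outcome under the prior belief, which depends only on the mismatch between expectation and outcome, and the KL divergence of the posterior from the prior, which measures how far the belief actually moved. The function name `observe`, the event sequence, and the choice of direction D_KL(posterior || prior) are assumptions made for illustration.

```python
import numpy as np

def observe(counts, event):
    """Update a count-based categorical belief with one observed event.

    Returns the updated counts, the surprise of the event under the prior
    belief (bits), and the KL divergence of posterior from prior (bits).
    """
    prior = counts / counts.sum()
    new_counts = counts.copy()
    new_counts[event] += 1.0
    posterior = new_counts / new_counts.sum()
    surprise = -np.log2(prior[event])                    # outcome vs. expectation
    kl = np.sum(posterior * np.log2(posterior / prior))  # size of the belief update
    return new_counts, surprise, kl

# Assumption: four hypothetical events with flat pseudo-counts of 1 (no zero probabilities).
counts = np.ones(4)
history = []
for event in [0, 0, 0, 2]:  # hypothetical event sequence
    counts, surprise, kl = observe(counts, event)
    history.append((surprise, kl))
    print(f"event={event} surprise={surprise:.3f} bits  KL={kl:.3f} bits")
```

      In this toy setting the two quantities are related but not interchangeable: both shrink as an event becomes expected, yet the KL term is much smaller than the surprise and shrinks further as accumulated counts make each new observation less able to shift the belief.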

      The authors seem to mix up this common definition of the model update in terms of KL divergence and their definition of prediction error along the same lines. For example, on page 4: "KL divergence is a measure of the difference between two probability distributions. In the context of predictive processing, KL divergence can be used to quantify the mismatch between the probability distributions corresponding to the brain's expectations about incoming sensory input and the actual sensory input received, in other words, the prediction error (Friston, 2010; Spratling, 2017)."

      Similarly (page 23): "In the current study, we investigated whether the pupil's response to decision outcome (i.e., feedback) in the context of associative learning reflects a prediction error as defined by KL divergence."

      This is problematic because the results might actually have limited implications for the authors' main perspective (i.e., that the pupil encodes prediction errors) and could be better interpreted in terms of model updating. In my opinion, there are two potential ways to deal with this issue:

      (a) Cite work that unambiguously supports the perspective that it is reasonable to define the prediction error in terms of KL divergence and that this has a link to pupillometry. In this case, it would be necessary to clearly explain the definition of the prediction error in terms of KL divergence and dissociate it from the definition in terms of model updating.

      (b) If there is no prior work supporting the authors' current perspective on the prediction error, it might be necessary to revise the entire paper substantially and focus on the definition in terms of model updating.

      We thank the reviewer for pointing out these inconsistencies in the manuscript and appreciate their suggestions for improvement. We take approach (a) recommended by the reviewer, and provide our reasoning as to why prediction error signals in pupil dilation are expected to correlate with information gain (defined as the KL divergence between posterior and prior belief distributions). This can be found in a new section in the introduction, copied here for convenience (p. 3-4):

      “We reasoned that the link between prediction error signals and information gain in pupil dilation is through precision-weighting. Precision refers to the amount of uncertainty (inverse variance) of both the prior belief and sensory input in the prediction error signals [6,64–67]. More precise prediction errors receive more weighting, and therefore, have greater influence on model updating processes. The precision-weighting of prediction error signals may provide a mechanism for distinguishing between known and unknown sources of uncertainty, related to the inherent stochastic nature of a signal versus insufficient information on the part of the observer, respectively [65,67,68]. In Bayesian frameworks, information gain is fundamentally linked to prediction error, modulated by precision [65,66,69–75]. In non-hierarchical Bayesian models, information gain can be derived as a function of prediction errors and the precision of the prior and likelihood distributions, a relationship that can be approximately linear [70]. In hierarchical Bayesian inference, the update in beliefs (posterior mean changes) at each level is proportional to the precision-weighted prediction error; this update encodes the information gained from new observations [65,66,69,71,72]. Neuromodulatory arousal systems are well-situated to act as precision-weighting mechanisms in line with predictive processing frameworks [76,77]. Empirical evidence suggests that neuromodulatory systems broadcast precision-weighted prediction errors to cortical regions [11,59,66,78]. Therefore, the hypothesis that feedback-locked pupil dilation reflects a prediction error signal is similarly in line with Zenon’s main claim that pupil dilation generally reflects information gain, through precision-weighting of the prediction error. We expected a prediction error signal in pupil dilation to be proportional to the information gain.”
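
      As an illustration of how information gain can depend on both the prediction error and precision, the sketch below uses the textbook conjugate Gaussian case. It is an assumption-laden example, not the derivation in the cited work and not the model used in the manuscript; all function names are hypothetical.

```python
import math


def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update; returns (prediction_error, post_mean, post_var)."""
    # The gain is the relative precision of the observation versus the prior.
    gain = prior_var / (prior_var + obs_var)
    prediction_error = obs - prior_mean
    post_mean = prior_mean + gain * prediction_error  # precision-weighted update
    post_var = (1.0 - gain) * prior_var
    return prediction_error, post_mean, post_var


def kl_gaussian(mean_q, var_q, mean_p, var_p):
    """KL(q || p) between two univariate Gaussians, in nats."""
    return 0.5 * (
        math.log(var_p / var_q) + (var_q + (mean_q - mean_p) ** 2) / var_p - 1.0
    )


def information_gain(prior_mean, prior_var, obs, obs_var):
    """KL divergence from prior to posterior after one observation."""
    _, post_mean, post_var = gaussian_update(prior_mean, prior_var, obs, obs_var)
    return kl_gaussian(post_mean, post_var, prior_mean, prior_var)


# Larger prediction errors yield more information gain (prior N(0, 1), obs_var = 1).
for obs in (0.5, 1.0, 2.0):
    print(obs, round(information_gain(0.0, 1.0, obs, 1.0), 4))
```

      In this toy case, the information gain grows with the size of the prediction error and, for a fixed prediction error, with the precision of the observation.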

      We have referenced previous work that has linked prediction error and information gain directly (p. 4): “The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72].”

      We have taken the following steps to remedy this error of equating “prediction error” directly with the information gain.

      First, we have replaced “KL divergence” with “information gain” whenever possible throughout the manuscript for greater clarity. 

      Second, we have edited the section in the introduction defining information gain substantially (p. 4): 

      “Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80]. Itti and Baldi (2005) [81] termed the KL divergence between posterior and prior belief distributions as “Bayesian surprise” and showed a link to the allocation of attention. The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72]. According to Zénon’s hypothesis, if pupil dilation reflects information gain during the observation of an outcome event, such as feedback on decision accuracy, then pupil size will be expected to increase in proportion to how much novel sensory evidence is used to update current beliefs [29,63].”

      Finally, we have made several minor textual edits to the Abstract and main text wherever possible to further clarify the proposed relationship between prediction errors and information gain.

      (2) Operationalization of prediction errors based on frequency, accuracy, and their interaction:

      The authors also rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point here, I would argue that this approach offers a simple approximation to the prediction error, but it is possible that factors like difficulty and effort can influence the pupil signal at the same time, which the current approach does not take into account. I recommend computing prediction errors (defined in terms of the difference between outcome and expectation) based on a simple reinforcement-learning model and analyzing the data using a pupillometry regression model in which nuisance regressors are controlled, and results are corrected for multiple comparisons.

      We agree with the reviewer’s suggestion that alternatively modeling the data in a reinforcement learning paradigm would be fruitful. We adopted the ideal learner model as we were primarily focused on Information Theory, stemming from our aim to test Zenon’s hypothesis that information gain drives pupil dilation. However, we agree with the reviewer that it is worthwhile to pursue different modeling approaches in future work. We have now included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times (explained in more detail below in our response to your point #4). Results including correction for multiple comparisons were reported for all pupil time course data, as detailed in Methods section 2.5.

      (3) The link between model-based (KL divergence) and model-agnostic (frequency- and accuracy-based) prediction errors:

      I was expecting a validation analysis showing that KL divergence and model-agnostic prediction errors are correlated (in the behavioral data). This would be useful to validate the theoretical assumptions empirically.

      The model limitations and the operationalization of prediction error in terms of post-feedback processing do not seem to allow for a comparison of information gain and model-agnostic prediction errors in the behavioral data for the following reasons. First, the simple ideal learner model used here is not a generative model, and therefore, cannot replicate or simulate the participants’ responses (see also our response to your point #6 “model validation” below). Second, the behavioral dependent variables obtained are accuracy and reaction times, which both occur before feedback presentation. While accuracy and reaction times can serve as a marker of the participant’s (statistical) confidence/uncertainty following the decision interval, these behavioral measures cannot provide access to post-feedback information processing. The pupil dilation is of interest to us because the peripheral arousal system is able to provide a marker of post-feedback processing. Through the analysis presented in Figure 3, we indeed aimed to make the comparison of the model-based information gain to the model-agnostic prediction errors via the proxy variable of post-feedback pupil dilation instead of behavioral variables. To bridge the gap between the “behaviorally agnostic” model parameters and the actual performance of the participants, we examined the relationship between the model-based information gain and the post-feedback pupil dilation separately for error and correct trials as shown in Figure 3D-F & Figure 3J-L. We hope this addresses the reviewer’s concern and apologize in case we did not understand the reviewer’s suggestion here.

      (4) Model-based analyses of pupil data:

      I'm concerned about the authors' model-based analyses of the pupil data. The current approach is to simply compute a correlation for each model term separately (i.e., KL divergence, surprise, entropy). While the authors do show low correlations between these terms, single correlational analyses do not allow them to control for additional variables like outcome valence, prediction error (defined in terms of the difference between outcome and expectation), and additional nuisance variables like reaction time, as well as x and y coordinates of gaze.

      Moreover, including entropy and KL divergence in the same regression model could, at least within each task, provide some insights into whether the pupil response to KL divergence depends on entropy. This could be achieved by including an interaction term between KL divergence and entropy in the model.

      In line with the reviewer’s suggestions, we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times. We compared the performance of two models on the post-feedback pupil dilation in each time window of interest: Model 1 had no interaction between information gain and entropy and Model 2 included an interaction term as suggested. We did not include the x- and y-coordinates of gaze in the mixed linear model analysis, as there are multiple values of these coordinates per trial. Furthermore, regressing out the x- and y-coordinates of gaze can potentially remove signal of interest in the pupil dilation data in addition to the gaze-related confounds and we did not measure absolute pupil size (Mathôt, Melmi & Castet, 2015; Hayes & Petrov, 2015). We present more sanity checks on the pre-processing pipeline as recommended by Reviewer 1.

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results. In sum, we found that including an interaction term for information gain and entropy did not lead to better model fits, but sometimes led to significantly worse fits. Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.

      (5) Major differences between experimental tasks:

      More generally, I'm not convinced that the authors' conclusion that the pupil response to KL divergence depends on entropy is sufficiently supported by the current design. The two tasks differ on different levels (stimuli, contingencies, when learning takes place), not just in terms of entropy. In my opinion, it would be necessary to rely on a common task with two conditions that differ primarily in terms of entropy while controlling for other potentially confounding factors. I'm afraid that seemingly minor task details can dramatically change pupil responses. The positive/negative difference in the correlation with KL divergence that the authors interpret to be driven by entropy may depend on another potentially confounding factor currently not controlled.

      We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. We note that Reviewer #1 had a similar concern. Our response to Reviewer #1 addresses this concern of Reviewer #3 as well. To better align the manuscript with the above-mentioned points, we have made several changes that are detailed in our response to Reviewer #1’s public review (above).

      (6) Model validation:

      My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.

      Based on our understanding, posterior predictive checks are used to assess the goodness of fit between generated (or simulated) data and observed data. Given that the “simple” ideal learner model employed in the current study is not a generative model, a posterior predictive check would not apply here (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2013). The ideal learner model is unable to simulate or replicate the participants’ responses and behaviors such as accuracy and reaction times; it simply computes the probability of seeing each stimulus type at each trial based on the prior distribution and the exact trial order of the stimuli presented to each participant. The model’s probabilities are computed directly from a Dirichlet distribution of values that represent the number of occurrences of each stimulus-pair type for each task. The information-theoretic variables are then directly computed from these probabilities using standard formulas. The exact formulas used in the ideal learner model can be found in section 2.4.
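
      To make the description above concrete, the following is a minimal sketch of a count-based (Dirichlet) ideal learner. It rests on assumptions: the function name is hypothetical, the flat prior of one pseudo-count per stimulus type is an illustrative default, and information gain is computed here as the KL divergence between the predictive distributions after and before each trial. The formulas actually used are those in section 2.4 of the manuscript and may differ.

```python
import math


def ideal_learner(stimuli, n_types, prior_counts=None):
    """Trial-by-trial surprise, entropy and information gain (all in bits).

    stimuli      : sequence of stimulus-type indices in presentation order
    prior_counts : positive pseudo-counts per type (flat prior if omitted)
    """
    counts = list(prior_counts) if prior_counts is not None else [1.0] * n_types
    rows = []
    for s in stimuli:
        total = sum(counts)
        prior = [c / total for c in counts]       # predictive probabilities before the trial
        surprise = -math.log2(prior[s])           # Shannon surprise of the observed type
        entropy = -sum(p * math.log2(p) for p in prior)

        counts[s] += 1.0                          # observe the stimulus
        total = sum(counts)
        post = [c / total for c in counts]        # predictive probabilities after the trial
        info_gain = sum(q * math.log2(q / p) for q, p in zip(post, prior))

        rows.append({"surprise": surprise, "entropy": entropy, "info_gain": info_gain})
    return rows


rows = ideal_learner([0, 0, 1], n_types=2)
for row in rows:
    print({name: round(value, 3) for name, value in row.items()})
```

      Because the model only counts stimuli in their presented order, it yields these quantities for each trial but produces no simulated choices or reaction times.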

      We have now included a complementary linear mixed model analysis which also provides insight into the amount of explained variance of these information-theoretic predictors on the post-feedback pupil response, while also including the pre-feedback baseline pupil and reaction time differences (see section 3.3, Tables 3 & 4). The R² values ranged from 0.16 to 0.50 across all conditions tested.

      (7) Discussion:

      The authors interpret the directional effect of the pupil response w.r.t. KL divergence in terms of differences in entropy. However, I did not find a normative/computational explanation supporting this interpretation. Why should the pupil (or the central arousal system) respond differently to KL divergence depending on differences in entropy?

      The current suggestion (page 24) that might go in this direction is that pupil responses are driven by uncertainty (entropy) rather than learning (quoting O'Reilly et al. (2013)). However, this might be inconsistent with the authors' overarching perspective based on Zénon (2019) stating that pupil responses reflect updating, which seems to imply learning, in my opinion. To go beyond the suggestion that the relationship between KL divergence and pupil size "needs more context" than previously assumed, I would recommend a deeper discussion of the computational underpinnings of the result.

      Since we have removed the original speculative conclusion from the manuscript, we will refrain from discussing the computational underpinnings of a potential mechanism. As mentioned above, we have preliminary data from our own lab that contradict our original hypothesis about the relationship between entropy and information gain on the post-feedback pupil response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Apart from the points raised in the public review above, I'd like to use the opportunity here to provide a more detailed review of potential issues, questions, and queries I have:

      (1) Constriction vs. Dilation Effects:

      The study observes a context-dependent relationship between KL divergence and pupil responses, where pupil dilation and constriction appear to exhibit opposing effects. However, this phenomenon raises a critical concern: Could the initial pupil constriction to visual stimuli (e.g., in the cue-target task) confound correlations with KL divergence? This potential confound warrants further clarification or control analyses to ensure that the observed effects genuinely reflect prediction error signals and are not merely a result of low-level stimulus-driven responses.

      We agree with the reviewer’s concern and have added the following information to the limitations section in the Discussion (changes in italics below; p. 32-33).

      “First, the two associative learning paradigms differed in many ways and were not directly comparable. For instance, the shape of the mean pupil response function differed across the two tasks in accordance with a visual or auditory feedback stimulus (compare Supplementary Figure 3A with Supplementary Figure 3D), and it is unclear whether these overall response differences contributed to any differences obtained between task conditions within each task. We are unable to rule out whether so-called “low level” effects such as the initial constriction to visual stimuli in the cue-target 2AFC task as compared with the dilation in response to auditory stimuli in the letter-color 2AFC task could confound correlations with information gain. Future work should strive to disentangle how the specific aspects of the associative learning paradigms relate to prediction errors in pupil dilation by systematically manipulating design elements within each task.”

      Here, I also was curious about Supplementary Figure 1, showing 'no difference' between the two tones (indicating 'error' or 'correct'). Was this the case for FDR-corrected or uncorrected cluster statistics? Especially since the main results also showed sig. differences only for uncorrected cluster statistics (Figure 2), but were n.s. for FDR corrected. I.e. can we be sure to rule out a confound of the tones here after all?

      As per the reviewer’s suggestion, we verified that there were also no significant clusters after feedback onset before applying the correction for multiple comparisons. We have added this information to Supplementary section 1.2 as follows:

      “Results showed that the auditory tone dilated pupils on average (Supplementary Figure 1C). Crucially, however, the two tones did not differ from one another in either of the time windows of interest (Supplementary Figure 1D; no significant time points after feedback onset were obtained either before or after correcting for multiple comparisons using cluster-based permutation methods; see Section 2.5).”

      Supplementary Figure 1 is showing effects cluster-corrected for multiple comparisons using cluster-based permutation tests from the MNE software package in Python (see Methods section 2.5). We have clarified that the cluster-correction was based on permutation testing in the figure legend. 

      (2) Participant-Specific Priors:

      The ideal learner models do not account for individualised priors, assuming homogeneous learning behaviour across participants. Could incorporating participant-specific priors better reflect variability in how individuals update their beliefs during associative learning?

      We have clarified in the Methods (see section 2.4) that the ideal learner models did account for participant-specific stimuli including participant-specific priors in the letter-color 2AFC task. We have added the following texts: 

      “We also note that while the ideal learner model for the cue-target 2AFC task used a uniform (flat) prior distribution for all participants, the model parameters were based on the participant-specific cue-target counterbalancing conditions and randomized trial order.” (p. 13)

      “The prior distributions used for the letter-color 2AFC task were estimated from the randomized letter-color pairs and randomized trial order presentation in the preceding odd-ball task; this resulted in participant-specific prior distributions for the ideal learner model of the letter-color 2AFC task. The model parameters were likewise estimated from the (participant-specific) randomized trial order presented in the letter-color 2AFC task.” (p. 13)

      (3) Trial-by-Trial Variability:

      The analysis does not account for random effects or inter-trial variability using mixed-effects models. Including such models could provide a more robust statistical framework and ensure the observed relationships are not influenced by unaccounted participant- or trial-specific factors.

      We have included a complementary linear mixed model analysis in which “subject” was modeled as a random effect on the post-feedback pupil response in each time window of interest and for each task. Across all trials, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences (see section 3.3, Tables 3 & 4).

      (4) Preprocessing/Analysis choices:

      Before anything else, I'd like to highlight the authors' effort in providing public code (and data) in a very readable and detailed format!

      We appreciate the compliment - thank you for taking the time to look at the data and code provided.

      I found the idea of regressing the effect of Blinks/Saccades on the pupil trace intriguing. However, I miss a complete picture here to understand how well this actually worked, especially since it seems to be performed on already interpolated data. My main points here are:

      (4.1) Why is the deconvolution performed on already interpolated data and not on 'raw' data where there are actually peaks of information to fit?

      To our understanding, at least one critical reason for interpolating the data before proceeding with the deconvolution analysis is that the raw data contain many missing values (i.e., NaNs) due to the presence of blinks. Interpolating over the missing data first ensures that there are valid numerical elements in the linear algebra equations. We refer the reviewer to the methods detailed in Knapen et al. (2016) for more details on this pre-processing method. 
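
      As a rough illustration of why interpolation comes first, the sketch below fills NaN runs by linear interpolation so that every sample entering a later regression is a valid number. It is a simplified stand-in, not the pipeline of Knapen et al. (2016): the function name is hypothetical, and padding around blink onsets and offsets is not modeled.

```python
import math


def interpolate_missing(samples):
    """Linearly interpolate NaN runs; runs at the edges take the nearest valid sample."""
    out = list(samples)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    if not valid:
        raise ValueError("no valid samples to interpolate from")

    first, last = valid[0], valid[-1]
    for i in range(first):                 # leading gap
        out[i] = out[first]
    for i in range(last + 1, len(out)):    # trailing gap
        out[i] = out[last]

    # Fill each interior gap between two neighbouring valid samples.
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            out[i] = out[left] + (out[right] - out[left]) * (i - left) / span
    return out


nan = float("nan")
filled = interpolate_missing([1.0, nan, nan, 4.0, nan])
print(filled)
```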

      (4.2) What is the model fit (e.g. R-squared)? If this was a poor fit for the regressors in the first place, can we trust the residuals (i.e. clean pupil trace)? Is it possible to plot the same Pupil trace of Figure 1D with a) the 'raw' pupil time-series, b) after interpolation only (both of course also mean-centered for comparison), on top of the residuals after deconvolution (already presented), so we can be sure that this is not driving the effects in a 'bad' way? I'd just like to make sure that this approach did not lead to artefacts in the residuals rather than removing them.

      We thank the reviewer for this suggestion. In the Supplementary Materials, we have included a new figure (Supplementary Figure 2, copied below for convenience), which illustrates the same conditions as in Figure 1D and Figure 2D, with 1) the raw data, and 2) the interpolated data before the nuisance regression. Both the raw data and interpolated data have been band-pass filtered as was done in the original pre-processing pipeline and converted to percent signal change. These figures can be compared directly to Figure 1D and Figure 2D, for the two tasks, respectively.

      Of note is that the raw data seem to be dominated by responses to blinks (and/or saccades). Crucially, the pattern of results remains overall unchanged between the interpolated-only and fully pre-processed version of the data for both tasks.

      In the Supplementary Materials (see Supplementary section 2), we have added the descriptives of the model fits from the deconvolution method. Model fits (R²) for the nuisance regression were generally low: cue-target 2AFC task, M = 0.03, SD = 0.02, range = [0.00, 0.07]; letter-color visual 2AFC, M = 0.08, SD = 0.04, range = [0.02, 0.16].

      Furthermore, a Pearson correlation analysis between the interpolated and fully pre-processed data within the time windows of interest for both tasks indicated high correspondence:

      Cue-target 2AFC task

      Early time window: M = 0.99, SD = 0.01, range = [0.955, 1.000]

      Late time window: M = 0.99, SD = 0.01, range = [0.971, 1.000]

      Letter-color visual 2AFC

      Early time window: M = 0.95, SD = 0.04, range = [0.803, 0.998]

      Late time window: M = 0.97, SD = 0.02, range = [0.908, 0.999]
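
      A correspondence check of this kind can be sketched as below. This is an illustration only: the helper is a plain Pearson correlation, and the two short series are made-up numbers standing in for one participant's interpolated-only and fully pre-processed pupil responses in a time window.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
interpolated_only = [0.2, 0.5, 0.9, 1.4, 1.1]
fully_preprocessed = [0.1, 0.5, 1.0, 1.3, 1.0]
r = pearson_r(interpolated_only, fully_preprocessed)
print(round(r, 3))
```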

      In hindsight, including the deconvolution (nuisance regression) method may not have changed the pattern of results much. However, the decision to include this deconvolution method was not data-driven; instead, it was based on the literature establishing the importance of removing variance (up to 5 s) of these blinks and saccades from cognitive effects of interest in pupil dilation (Knapen et al., 2016). 

      (4.3) Since this should also lead to predicted time series for the nuisance-regressors, can we see a similar effect (of what is reported for the pupil dilation) based on the blink/saccade traces of a) their predicted time series based on the deconvolution, which could indicate a problem with the interpretation of the pupil dilation effects, and b) the 'raw' blink/saccade events from the eye-tracker? I understand that this is a very exhaustive analysis so I would actually just be interested here in an averaged time-course / blink&saccade frequency of the same time-window in Figure 1D to complement the PD analysis as a sanity check.

      Also included in the Supplementary Figure 2 is the data averaged as in Figure 1D and Figure 2D for the raw data and nuisance-predictor time courses (please refer to the bottom row of the sub-plots). No pattern was observed in either the raw data or the nuisance predictors as was shown in the residual time courses. 

      (4.4) How many samples were removed from the time series due to blinks/saccades in the first place? 150ms for both events in both directions is quite a long bit of time so I wonder how much 'original' information of the pupil was actually left in the time windows of interest that were used for subsequent interpretations.

      We thank the reviewer for bringing this issue to our attention. The size of the interpolation window was based on previous literature, indicating a range of 100-200 ms as acceptable (Urai et al., 2017; Knapen et al., 2016; Winn et al., 2018). The ratio of interpolated-to-original data (across the entire trial) varied greatly between participants and between trials: cue-target 2AFC task, M = 0.262, SD = 0.242, range = [0,1]; letter-color 2AFC task, M = 0.194, SD = 0.199, range = [0,1]. 

      We have now included a conservative analysis in which only trials with more than half (threshold = 60%) of original data are included in the analyses. Crucially, we still observe the same pattern of effects as when all data are considered across both tasks (compare the second to last row in the Supplementary Figure 2 to Figure 1D and Figure 2D).
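
      The trial-selection step can be sketched as follows, assuming each trial comes with a boolean mask marking which samples were interpolated. The function names and the mask representation are hypothetical; the 0.6 default mirrors the 60% threshold quoted above.

```python
def original_fraction(interpolated_mask):
    """Fraction of samples in a trial that were NOT interpolated."""
    return 1.0 - sum(interpolated_mask) / len(interpolated_mask)


def select_trials(trial_masks, min_original_fraction=0.6):
    """Indices of trials whose share of original data exceeds the threshold."""
    return [
        index
        for index, mask in enumerate(trial_masks)
        if original_fraction(mask) > min_original_fraction
    ]


# Three toy trials of five samples each (True = interpolated sample).
trial_masks = [
    [False, False, False, False, False],  # 100% original data
    [True, True, True, False, False],     # 40% original data
    [True, False, False, False, False],   # 80% original data
]
kept = select_trials(trial_masks)
print(kept)
```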

      (4.5) Was the baseline correction performed on the percentage change unit?

      Yes, the baseline correction was performed on the pupil time series after converting to percent signal change. We have added that information to the Methods (section 2.3).
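
      The order of operations can be sketched as below. Two assumptions are made for illustration: percent signal change is taken relative to the mean of the whole series, and baseline correction subtracts the mean of a pre-feedback window. The definitions actually used are those in Methods section 2.3.

```python
def percent_signal_change(samples):
    """Express samples as percent change relative to the series mean (assumed definition)."""
    mean = sum(samples) / len(samples)
    return [100.0 * (value - mean) / mean for value in samples]


def baseline_correct(trial, baseline_window):
    """Subtract the mean of the baseline window (a slice) from every sample in the trial."""
    window = trial[baseline_window]
    baseline = sum(window) / len(window)
    return [value - baseline for value in trial]


raw = [90.0, 100.0, 110.0]
psc = percent_signal_change(raw)                 # step 1: convert units
corrected = baseline_correct(psc, slice(0, 2))   # step 2: baseline-correct in those units
print(psc, corrected)
```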

      (4.6) What metric was used to define events in the derivative as 'peaks'? I assume some sort of threshold? How was this chosen?

      The threshold was chosen in a data-driven manner and was kept consistent across both tasks. The following details have been added to the Methods:

      “The size of the interpolation window preceding nuisance events was based on previous literature [13,39,99]. After interpolation based on data-markers and/or missing values, remaining blinks and saccades were estimated by testing the first derivative of the pupil dilation time series against a threshold rate of change. The threshold for identifying peaks in the temporal derivative is data-driven, partially based on past work [10,14,33]. The output of each participant’s pre-processing pipeline was checked visually. Once an appropriate threshold was established at the group level, it remained the same for all participants (minimum peak height of 10 units).” (p. 8 & 11).
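
      A simplified sketch of derivative thresholding is given below. It assumes that peaks are local maxima of the absolute first difference that reach a minimum height; the function name is hypothetical and this is not the authors' implementation, although the default of 10 units follows the value quoted above.

```python
def detect_fast_events(pupil, min_peak_height=10.0):
    """Indices i where |pupil[i + 1] - pupil[i]| is a local peak of at least min_peak_height."""
    speed = [abs(b - a) for a, b in zip(pupil, pupil[1:])]  # absolute first difference
    peaks = []
    for i in range(1, len(speed) - 1):
        is_local_max = speed[i] > speed[i - 1] and speed[i] >= speed[i + 1]
        if is_local_max and speed[i] >= min_peak_height:
            peaks.append(i)
    return peaks


# Toy trace with two abrupt jumps of the kind left behind by residual blinks or saccades.
trace = [100.0, 101.0, 102.0, 130.0, 131.0, 132.0, 100.0, 101.0]
events = detect_fast_events(trace)
print(events)
```

      Keeping the threshold as a single parameter makes it straightforward to apply the same value to every participant, as described in the quoted Methods text.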

      (5) Multicollinearity Between Variables:

      Lastly, the authors state on page 13: "Furthermore, it is expected that these explanatory variables will be correlated with one another. For this reason, we did not adopt a multiple regression approach to test the relationship between the information-theoretic variables and pupil response in a single model". However, the very purpose of multiple regression is to account for and disentangle the contributions of correlated predictors, no? I might have missed something here.

      We apologize for the ambiguity of our explanation in the Methods section. We originally sought to assess the overall relationship between the post-feedback response and information gain (primarily), but also surprise and entropy. Our reasoning was that these variables are often investigated in isolation across different experiments (i.e., only investigating Shannon surprise), and we would like to know what the pattern of results would look like when comparing a single information-theoretic variable to the pupil response (one-by-one). We assumed that including additional explanatory variables (that we expected to show some degree of collinearity with each other) in a regression model would affect variance attributed to them as compared with the one-on-one relationships observed with the pupil response (Morrissey & Ruxton 2018). We also acknowledge the value of a multiple regression approach on our data. Based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.  

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.

      Reviewer #2 (Recommendations for the authors):

      (1) Given the inherent temporal dependencies in pupil dynamics, characterising later pupil responses as independent of earlier ones in a three-way repeated measures ANOVA may not be appropriate. A more suitable approach might involve incorporating the earlier pupil response as a covariate in the model.

      We thank the reviewer for bringing this issue to our attention. From our understanding, a repeated-measures ANOVA with factor “time window” would be appropriate in the current context for the following reasons. First, autocorrelation (closely tied to sphericity) is generally not considered a problem when only two timepoints are compared from time series data (Field, 2013; Tabachnick & Fidell, 2019). Second, the repeated-measures component of the ANOVA takes the correlated variance between time points into account in the statistical inference. Finally, as a complementary analysis, we present the results testing the interaction between the frequency and accuracy conditions across the full time courses (see Figures 1D and 2D); in these pupil time courses, any difference between the early and late time windows can be judged by the reader visually and qualitatively. 

      (2) Please clarify the correlations between KL divergence, surprise, entropy, and pupil response time series. Specifically, state whether these correlations account for the interrelationships between these information-theoretic measures. Given their strong correlations, partialing out these effects is crucial for accurate interpretation.

      As mentioned above, based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.  

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.

      (3) The effects observed in the late time windows appear weak (e.g., Figure 2E vs. 2F, and the generally low correlation coefficients in Figure 3). Please elaborate on the reliability and potential implications of these findings.

      We have now included a complementary linear mixed model analysis, which also provides insight into the amount of variance in the post-feedback pupil response explained by these information-theoretic predictors, while also including the pre-feedback baseline pupil and reaction time differences (see Section 3.3, Tables 3 and 4). The R² values ranged from 0.16 to 0.50 across all conditions tested. Including the pre-feedback baseline pupil dilation as a predictor in the linear mixed model analysis consistently led to more explained variance in the post-feedback pupil response, as expected.

      (4) In Figure 3 (C-J), please clarify how the trial-by-trial correlations were computed (averaged across trials or subjects). Also, specify how the standard error of the mean (SEM) was calculated (using the number of participants or trials).

      The trial-by-trial correlations between the pupil signal and model parameters were computed for each participant, then the coefficients were averaged across participants for statistical inference. We have added several clarifications in the text (see section 2.5 and legends of Figure 3 and Supplementary Figure 4).

      We have added “the standard error of the mean across participants” to all figure labels.
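
      The two-step procedure described in this response (one correlation coefficient per participant, then a mean and standard error across participants) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the Pearson coefficient, the function names, and the toy data are all assumptions, and the response does not say whether the coefficients were transformed (for example with Fisher's z) before averaging.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of trial values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def group_correlation(trials_by_participant):
    """Correlate within each participant, then summarise across participants.

    trials_by_participant maps a participant label to a pair
    (pupil_values, predictor_values), one entry per trial.
    Returns (per-participant coefficients, their mean, SEM across participants).
    """
    coeffs = [pearson_r(pupil, pred) for pupil, pred in trials_by_participant.values()]
    mean_r = statistics.fmean(coeffs)
    # SEM uses the number of participants, not the number of trials.
    sem = statistics.stdev(coeffs) / math.sqrt(len(coeffs))
    return coeffs, mean_r, sem


# Hypothetical toy data: three participants, four trials each.
toy = {
    "p01": ([1, 2, 3, 4], [1, 2, 3, 4]),
    "p02": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "p03": ([1, 2, 3, 4], [1, 3, 2, 4]),
}
coeffs, mean_r, sem = group_correlation(toy)
print(f"mean r = {mean_r:.3f}, SEM = {sem:.3f} (n = {len(coeffs)} participants)")
```

      In this sketch the participant is the unit of inference, so the number of trials per participant affects how precisely each coefficient is estimated but does not enter the SEM.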

      (5) For all time axes (e.g., Figure 2D), please label the ticks at 0, 0.5, 1, 1.5, 2, 2.5, and 3 seconds. Clearly indicate the duration of the feedback on the time axes. This is particularly important for interpreting the pupil dilation responses evoked by auditory feedback.

      We have labeled the x-ticks every 0.5 seconds in all figures and indicated the duration of the auditory feedback in the letter-color decision task, as well as the stimuli presented in the control tasks in the Supplementary Materials.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction page 3: "In information theory, information gain quantifies the reduction of uncertainty about a random variable given the knowledge of another variable. In other words, information gain measures how much knowing about one variable improves the prediction or understanding of another variable."

      (2) In my opinion, the description of information gain can be clarified. Currently, it is not very concrete and quite abstract. I would recommend explaining it in the context of belief updating.

      We have removed these unclear statements in the Introduction. We now clearly state the following:

      “Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80].” (p. 4)
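
      As an illustration of the definition quoted above, the sketch below computes the KL divergence between a posterior and a prior over a discrete set of outcomes, alongside Shannon surprise and entropy for comparison. It is a minimal example under stated assumptions, not the authors' model: the function names, the choice of bits (log base 2), and the two-outcome belief values are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def shannon_surprise(prior, outcome):
    """Surprise (bits) of observing `outcome` under the prior belief."""
    p = prior[outcome]
    return math.inf if p == 0 else -math.log2(p)


def kl_divergence(posterior, prior):
    """Information gain: D_KL(posterior || prior) in bits."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for q, p in zip(posterior, prior):
        if q == 0:
            continue  # terms with zero posterior mass contribute nothing
        if p == 0:
            return math.inf  # posterior mass where the prior had none
        total += q * math.log2(q / p)
    return total


# Hypothetical belief over two possible targets, before and after feedback.
prior = [0.5, 0.5]
posterior = [0.8, 0.2]
info_gain = kl_divergence(posterior, prior)
print(f"information gain = {info_gain:.3f} bits")
print(f"prior entropy    = {entropy(prior):.3f} bits")
print(f"surprise of outcome 0 under the prior = {shannon_surprise(prior, 0):.3f} bits")
```

      The three quantities are computed from the same belief distributions, which is one reason they tend to be correlated across trials and why partialing out their shared variance matters for interpretation.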

      (3) Page 4: The inconsistencies across studies are described in extreme detail. I recommend shortening this part and summarizing the inconsistencies instead of listing all of the findings separately.

      As per the reviewer’s recommendation, we have shortened this part of the introduction to summarize the inconsistencies in a more concise manner as follows: 

      “Previous studies have shown different temporal response dynamics of prediction error signals in pupil dilation following feedback on decision outcome: While some studies suggest that the prediction error signals arise around the peak (~1 s) of the canonical impulse response function of the pupil [11,30,41,61,62,90], other studies have shown evidence that prediction error signals (also) arise considerably later with respect to feedback on choice outcome [10,25,32,41,62]. A relatively slower prediction error signal following feedback presentation may suggest deeper cognitive processing, increased cognitive load from sustained attention or ongoing uncertainty, or that the brain is integrating multiple sources of information before updating its internal model. Taken together, the literature on prediction error signals in pupil dilation following feedback on decision outcome does not converge to produce a consistent temporal signature.” (p. 5)

      We would like to note some additional minor corrections to the preprint:

      We have clarified the direction of the effect in Supplementary Figure 3 with the following: 

      “Participants who showed a larger mean difference between the 80% as compared with the 20% frequency conditions in accuracy also showed smaller differences (a larger mean difference in magnitude in the negative direction) in pupil responses between frequency conditions (see Supplementary Figure 4).”

      The y-axis labels in Supplementary Figure 3 were incorrect and have been corrected to the following: “Pupil responses (80-20%)”.

      We corrected typos, formatting and grammatical mistakes when discovered during the revision process. Some minor changes were made to improve clarity. Of course, we include a version of the manuscript with Tracked Changes as instructed for consideration.

    1. eLife Assessment

      This study identifies 53BP1 as an interaction partner of GMCL1 (a likely CUL3 substrate receptor). The study proposes a novel mechanism by which cancer cells evade the mitotic surveillance pathway through GMCL1-mediated degradation of 53BP1, leading to reduced p53 activation and paclitaxel resistance. These data are the most useful aspect of the study, but the data supporting the authors' conclusions as to the clinical relevance of the study are inadequate. The authors have not taken relevant data about the clinical mechanism of taxanes into account.

    2. Reviewer #2 (Public review):

      Summary

      This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.

      Strengths

      This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and show that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.

      Weaknesses

      A major limitation of the original manuscript was that the functional relevance of GMCL1 in regulating 53BP1 within an appropriate model system was not clearly demonstrated. In the revised version, the authors attempt to address this point. However, the new experiment is insufficiently controlled, making it difficult to interpret the results. State-of-the-art approaches would typically rely on single-cell tracking to monitor cell fate following release from a moderately prolonged mitosis.

      In contrast, the authors use a population-based assay, but the reported rescue from arrest is minimal. If the assay were functioning robustly, one would expect that nearly all cells depleted of USP28 or 53BP1 should have entered S-phase at a defined time after release. Thus, the very small rescue effect of siTP53BP1 suggests that the current assay is not suitable. It is also likely that release from a 16-hour mitotic arrest induces defects independent of the 53BP1-dependent p53 response.

      Furthermore, the cell-cycle duration of RPE1 cells is less than 20 hours. It is therefore unclear why cells are released for 30 hours before analysis. At this time point, many cells are likely to have progressed into the next cell cycle, making it impossible to draw conclusions regarding the immediate consequences of prolonged mitosis. As a result, the experiment cannot be evaluated due to inadequate controls.

      To strengthen this part of the study, I recommend that the authors first establish an assay that reliably rescues the mitotic-arrest-induced G1 block upon depletion of p53, 53BP1, or USP28. Once this baseline is validated, GMCL1 knockout can then be introduced to quantify its contribution to the response.

      A broader conceptual issue is that the evidence presented does not form a continuous line of reasoning. For example, it is not demonstrated that GMCL1 interacts with or regulates 53BP1 in RPE1 cells, the system in which the limited functional experiments are conducted.

      There are also a number of inconsistencies and issues with data presentation that need to be addressed:

      (1) Figure 2C: p21 levels appear identical between GMCL1 KO and WT rescue. If GMCL1 regulates p53 through 53BP1, p21 should be upregulated in the KO.

      (2) Figure 2A vs. 2C: GMCL1 KO affects chromatin-bound 53BP1 in Figure 2A, yet in Figure 2C it affects 53BP1 levels specifically in G1-phase cells. This discrepancy requires clarification.

      (3) Figure 2C quantification: The three biological repeats show an unusual pattern, with one repeat's data points lying exactly between the other two. It is unclear what the line represents; please clarify.

      (4) Figure nomenclature: Some abbreviations (e.g., FLAG-KI in Fig. 1F, WKE in Fig. 1C-D, ΔMFF in Fig. 1E) are not defined in the figure legends. All abbreviations must be explained.

      (5) Figure 2D: Please indicate how many times the experiment was reproduced. Quantification with statistical testing would strengthen the result. Pull-downs of 53BP1 with calculation of the ubiquitinated/total ratio could also support the conclusion.

      (6) Figures 3A and 3C: The G1 bars share the same color as the error bars, making the graphs difficult to interpret. Please adjust the color scheme.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, Kito et al follow up on previous work that identified Drosophila GCL as a mitotic substrate recognition subunit of a CUL3-RING ubiquitin ligase (CRL3) complex. Here they identified mutants of the human ortholog of GCL, GMCL1, that disrupt the interaction with CUL3 (GMCL1E142K) and that lack the substrate interaction domain (GMCL1 BBO). Immunoprecipitation followed by mass spectrometry identified 9 proteins that interacted with wild type FLAG-GMCL1 but not GMCL1 EK or GMCL1 BBO. These proteins included 53BP1, which plays a well characterized role in double strand break repair but also functions in a USP28-p53-53BP1 "mitotic stopwatch" complex that arrests the cell cycle after a substantially prolonged mitosis. Consistent with the IP-MS results, FLAG-GMCL1 immunoprecipitated 53BP1. Depletion of GMCL1 during mitotic arrest increased protein levels of 53BP1, and this could be rescued by wild type GMCL1 but not the E142K mutant or a R433A mutant that failed to immunoprecipitate 53BP1. Using a publicly available dataset, the authors identified a relatively small subset of cell lines with high levels of GMCL1 mRNA that were resistant to the taxanes paclitaxel, cabazitaxel, and/or docetaxel. This type of analysis is confounded by the fact that paclitaxel and other microtubule poisons accumulate to substantially different levels in various cell lines (PMID: 8105478, PMID: 10198049) so careful follow up experiments are required to validate results. The correlation between increased GMCL1 mRNA and taxane resistance was not observed in lung cancer cell lines. The authors propose this was because nearly half of lung cancers harbor p53 mutations, and lung cancer cell lines with wild type but not mutant p53 showed the correlation between increased GMCL1 mRNA and taxane resistance. However, the other cancer cell types in which they report increased GMCL1 expression correlates with taxane sensitivity also have high rates of p53 mutation. 
      Furthermore, p53 status does not predict taxane response in patients (PMID: 10951339, PMID: 8826941, PMID: 10955790). The authors then depleted GMCL1 and reported that it increased apoptosis in two cell lines with wild type p53 (MCF7 and U2OS) due to activation of the mitotic stopwatch. This is surprising because the mitotic stopwatch paper cited (PMID: 38547292) reported that U2OS cells have an inactive stopwatch. Though it can be partially restored by treatment with an inhibitor of WIP1, the stopwatch was reported to be substantially impaired in U2OS cells, in contrast to what is reported here. Additionally, activation of the stopwatch results in cell cycle arrest rather than apoptosis in most cell types, including MCF7. Beyond this, it has recently been shown that the level of taxanes and other microtubule poisons achieved in patient tumors is too low to induce mitotic arrest (PMID: 24670687, PMID: 34516829, PMID: 37883329). Physiologically relevant concentrations are achieved with approximately 5-10 nM paclitaxel, rather than the 100 nM used here. The findings here demonstrating that GMCL1 mediates chromatin localization of 53BP1 during mitotic arrest are solid and of interest to cell biologists, but it is unlikely that these findings are relevant to paclitaxel response in patients.

      Strengths:

      This study identified 53BP1 as a target of CRL3-GMCL1-mediated degradation during mitotic arrest. AlphaFold3 predictions of the binding interface followed by mutational analysis identified mutants of each protein (GMCL1 R433A and 53BP1 IEDI1422-1425AAAA) that disrupted their interaction. Knock-in of a FLAG tag into the C-terminus of GMCL1 in HCT116 cells followed by FLAG immunoprecipitation confirmed that endogenous GMCL1 interacts with endogenous CUL3 and 53BP1 during mitotic arrest.

      Weaknesses:

      The clinical relevance of the study is overinterpreted. The authors have not taken relevant data about the clinical mechanism of taxanes into account. Supraphysiologic doses of microtubule poisons cause mitotic arrest and can activate the mitotic stopwatch. However, in physiologic concentrations of clinically useful microtubule poisons, cells proceed through mitosis and divide their chromosomes on mitotic spindles that are at least transiently multipolar. Though these low concentrations may result in a brief mitotic delay, it is substantially shorter than the arrest caused by high concentrations of microtubule poisons, and the one mimicked here by 16 hours of 0.4 mg/mL nocodazole or 48 hours of 100 nM paclitaxel. Resistance to mitotic arrest occurs through different mechanisms than resistance to multipolar spindles, raising concerns about the relevance of prolonged mitosis to paclitaxel response in cancer. Nocodazole is a microtubule poison that is not used clinically and does not induce multipolar spindles, so a similar apoptotic response to both drugs increases concern about a lack of physiological relevance. Moreover, clinical response to paclitaxel does not correlate with p53 status (PMID: 10951339, PMID: 8826941, PMID: 10955790). No evidence is presented that GMCL1 affects cellular response to clinically relevant doses of paclitaxel.

      Comments on revisions:

      (1) The claim that GMCL1 modulates paclitaxel sensitivity in cancer should be toned down. Inaccurate statements based on an outdated understanding of the anti-cancer mechanism of paclitaxel should be removed (e.g., lines 42-44: "In cancers that are resistant to paclitaxel, a microtubule-targeting agent, cells bypass mitotic surveillance activation, allowing unchecked proliferation...", lines 73-75: "Proper mitotic arrest is critical for the efficacy of microtubule-targeting therapies...", lines 78-79: "This resistance is frequently associated with loss of MSP activity, for example due to defective p53 signaling". As cited in the public review, p53 status does not correlate with paclitaxel response in cancer.)

      (2) Perform timelapse experiments +/- GMCL1 siRNA in the absence of drug and in the presence of low, physiologically relevant concentrations of paclitaxel (5-10 nM), as well as supraphysiologic concentrations (100 nM) and correlate mitotic duration with cell cycle arrest. Test if co-depletion of 53BP1 with GMCL1 rescues cell cycle arrest after a substantially prolonged mitosis. Perform these experiments in a cell line with an intact mitotic stopwatch.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Pagano and colleagues test the idea that the protein GMCL1 functions as a substrate receptor for a Cullin RING 3 E3 ubiquitin ligase (CUL3) complex. Using a pulldown approach, they identify GMCL1 binding proteins, including the DNA damage scaffolding protein 53BP1. They then focus on the idea that GMCL1 recruits 53BP1 for CUL3-dependent ubiquitination, triggering subsequent proteasomal degradation of ubiquitinated 53BP1.

      In addition to its DNA damage signalling function, in mitosis, 53BP1 is reported to form a stopwatch complex with the deubiquitinating enzyme USP28 and the transcription factor p53 (PMID: 38547292). These 53BP1-stopwatch complexes generated in mitosis are inherited by G1 daughter cells and help promote p53-dependent cell cycle arrest independent from DNA damage (PMID: 38547292). Several studies show that knockout of 53BP1 overcomes G1 cell cycle arrest after mitotic delays caused by anti-mitotic drugs or centrosome ablation (PMID: 27432897, 27432896). In this model, it is crucial that 53BP1 remains stable in mitosis and more stopwatch complex is formed after delayed mitosis.

      Major concerns:

      Pagano and coworkers suggest that 53BP1 levels can sometimes be suppressed in mitosis if the cells overexpress GMCL1. They carry out a bioinformatic analysis of available public data for p53 wild-type cancer cell lines resistant to the anti-mitotic drug paclitaxel and related compounds. Stratifying GMCL1 into low and high expression groups reveals a weak (p = 0.05 or ns) correlation with sensitivity to taxanes. It is unclear on what basis the authors claim paclitaxel-resistant and p53 wild-type cancer cell lines bypass the mitotic surveillance/timer pathway. They have not tested this. Figure 3 is a correlation assembled from public databases but has no experimental tests. Figure 4 looks at proliferation but not cell cycle progression or the length of mitosis. The main conclusions relating to cell cycle progression and specifically the link to mitotic delays are therefore not supported by experimental data. There is no imaging of the cell cycle or cell fate after mitotic delays, or analysis of where the cells arrest in the cell cycle. Most of the cell lines used have been reported to lack a functional mitotic surveillance pathway in the recent work by Meitinger. To support these conclusions, the stability of endogenous 53BP1 under different conditions in cells known to have a functional mitotic surveillance pathway needs to be examined. A key suggestion in the work is that the level of GMCL1 expression correlates with resistance to taxanes. For the mitotic surveillance pathway, the type of drug (nocodazole, taxol, etc) used to induce a delay isn't thought to be relevant, only the length of the delay. Do GMCL1-overexpressing cells show resistance to anti-mitotics in general?

      We thank the reviewer for this insightful comment. We propose that GMCL1 promotes CUL3-dependent ubiquitination of 53BP1 during prolonged mitotic arrest, thereby facilitating its proteasome-dependent degradation. To evaluate the potential clinical relevance of this mechanism, we stratified cancer cell lines based on GMCL1 mRNA expression using publicly available datasets from DepMap (PMID: 39468210). We observed correlations between GMCL1 expression levels and taxane sensitivity that appear to reflect specific cancer type-drug combinations. To experimentally evaluate this correlation and obtain mechanistic insights, we performed knockdown experiments in hTERT-RPE1 cells, which are known to possess an intact mitotic surveillance pathway. Silencing of GMCL1 alone inhibited cell proliferation and induced apoptosis, while co-depletion of either TP53BP1 or USP28 significantly rescued these effects. These results suggest that GMCL1 modulates the stability of 53BP1 and therefore the availability of the 53BP1-USP28-p53 ternary complex in cells with a functional mitotic surveillance pathway (MSP) (new Figure 5I,J), directly linking GMCL1 to the regulation of the MSP complex. Moreover, to further support our mechanism, we assessed the effect of GMCL1 levels on cell cycle progression. Briefly, following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone delayed cell cycle progression, but co-depletion of either TP53BP1 or USP28 reversed this phenotype (new Figure 3A and new Supplementary Figure 3A-C). These results are consistent with our proliferation data and suggest that the observed effects of GMCL1 are specific to mitotic exit. Finally, overexpression of GMCL1 accelerated cell cycle progression (as assessed by FACS analyses) upon release from prolonged mitotic arrest (new Figure 3B and new Supplementary Figure 3D-E).

      Importantly, if GMCL1 specifically degrades 53BP1 during prolonged mitotic arrests, the authors should show what happens during normal cell divisions without any delays or drug treatments. How much 53BP1 is destroyed in mitosis under those conditions? Does 53BP1 destruction depend on the length of mitosis, drug treatment, or does 53BP1 get degraded every mitosis regardless of length? Testing the contribution of key mitotic E3 ligase activities on mitotic 53BP1 stability, such as the anaphase-promoting complex/cyclosome (APC/C) is important in this regard. One previous study reported an analysis of putative APC/C KEN-box degron motifs in 53BP1 and concluded these play a role in 53BP1 stability in anaphase (PMID: 28228263).

      Physiological mitosis under unperturbed conditions is typically brief (approximately 30 minutes), making protein quantification during this window challenging. Despite this, we attempted the experiment by synchronizing cells using RO-3306 and releasing them into drug-free medium to assess GMCL1 dynamics during normal mitosis. Under these conditions, GMCL1 expression was similar to that in asynchronous cells and higher than the levels upon extended mitosis. However, when we attempted to measure the half-life of proteins using cycloheximide, most cells died, likely due to the toxic effect of cycloheximide in cells subjected to co-treatment with RO-3306 or nocodazole. This is the same reason why, in Figure 2C, we assessed 53BP1 in daughter cells rather than mitotic cells.

      There is no direct test of the proposed mechanism, and it is therefore unclear if 53BP1 is ubiquitinated by a GMCL1-CUL3 ligase in cells, and how efficient this process would be at different cell cycle stages. A key issue is the lack of experimental data explaining why the proposed mechanism would be restricted to mitosis. Indirect effects, such as loss of 53BP1 from the chromatin fraction during M phase upon GMCL1 overexpression, do not necessarily mean that 53BP1 is degraded. PLK1-dependent chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays has been described previously (PMID: 38547292, 37888778). These papers are cited in the text, but the main conclusions of those papers on 53BP1 incorporation into a stopwatch complex during mitotic delays have been ignored. Are the authors sure that 53BP1 is destroyed in mitosis and not simply re-localised between chromatin and non-chromatin fractions? At the very least, these reported findings should be discussed in the text.

      To examine whether GMCL1 promotes 53BP1 ubiquitination in cells, we expressed Trypsin-Resistant Tandem Ubiquitin-Binding Entity (TR-TUBE), a protein that binds polyubiquitin chains. Abundant endogenous ubiquitinated 53BP1 co-precipitated with TR-TUBE constructs only when wild-type GMCL1, but not the E142K GMCL1 mutant, was expressed (new Figure 2D). The PLK1-dependent incorporation of 53BP1 into the stopwatch complex and the chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays are now discussed in the text. That said, compared to parental cells, 53BP1 levels in the chromatin fraction are high in two different GMCL1 KO clones in M phase-arrested cells (Figure 2A-B). This increase does not correspond to a decrease in the soluble 53BP1 fraction (Figure 2A and new Supplementary Figure 2D), suggesting that the lower chromatin-bound 53BP1 level in parental cells is not due to re-localization. The increased half-life of 53BP1 in daughter cells (Figure 2C) also supports this hypothesis.

      The authors use a variety of cancer cell line models throughout their study, most of which have been reported to lack a functional mitotic surveillance pathway. U2OS and HCT116 cells do not respond normally to mitotic delays, despite being annotated as p53 WT. Other studies have used p53 wild-type hTERT RPE-1 cells to study the mitotic surveillance pathway. If the model is correct, then over-expressing GMCL1 in hTERT-RPE1 cells should suppress cell cycle arrest after mitotic delays, and GMCL1 KO should make the cells more sensitive to delays. These experiments are needed to provide an adequate test of the proposed model.

      We greatly appreciate the reviewer’s suggestion regarding overexpression of GMCL1 in hTERT-RPE1 cells. To address this, we generated stable RPE1 cells expressing V5-tagged GMCL1 and conducted EdU incorporation assays following nocodazole synchronization and release. Overexpression of GMCL1 enhanced cell cycle progression compared to control cells (new Figure 3B and new Supplementary Figure 3D-E) after mitotic arrest, consistent with our model. We, therefore, propose that GMCL1 controls 53BP1 stability to suppress p53-dependent cell cycle arrest.

      We also want to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functioning in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses. 

      To conclude, while the authors propose a potentially interesting model on how GMCL1 overexpression could regulate 53BP1 stability to limit p53-dependent cell cycle arrest, it is unclear what triggers this pathway or when it is relevant. 53BP1 is known to function in DNA damage signalling, and GMCL1 might be relevant in that context. The manuscript contains the initial description of GMCL1-53BP1 interaction but lacks a proper analysis of the function of this interaction and is therefore a preliminary report.

      We hope that the new experiments, along with the clarifications provided in this response letter and revised manuscript, offer the reviewer increased confidence in the robustness and validity of our proposed model.

      Reviewer #2 (Public review):

      This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.

      Strengths:

      This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and showed that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.

      Weaknesses:

      However, the manuscript is significantly weakened by unsubstantiated mechanistic claims, overreliance on a non-functional model system (U2OS), and overinterpretation of correlative data. To support the conclusions of the manuscript, the authors must show that the GMCL1-dependent sensitivity to Taxol depends on the mitotic surveillance pathway.

      To demonstrate that GMCL1-dependent taxane sensitivity is mediated through the mitotic surveillance pathway (MSP), we have now performed experiments using hTERT-RPE1 (RPE1) cells, a widely used, non-transformed cell line known to possess a functional MSP. We compared RPE1 cells with knockdown of GMCL1 alone to those with simultaneous knockdown of GMCL1 and either TP53BP1 or USP28. Upon paclitaxel (Taxol) treatment, cells with GMCL1 knockdown exhibited suppressed proliferation and increased apoptosis. Notably, these phenotypes were rescued by co-depletion of TP53BP1 or USP28 (new Figure 5I,J). These results support the notion that GMCL1 contributes to MSP activity, at least in part, through its regulation of 53BP1.

      To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. Following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone led to a delay in cell cycle progression, but co-depletion of either TP53BP1 or USP28 alleviated this phenotype (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data.

      Reviewer #3 (Public review):

      Summary:

      In this study, Kito et al follow up on previous work that identified Drosophila GCL as a mitotic substrate recognition subunit of a CUL3-RING ubiquitin ligase (CRL3) complex.

      Here they characterize mutants of the human ortholog of GCL, GMCL1, that disrupt the interaction with CUL3 (GMCL1 E142K) and that lack the substrate interaction domain (GMCL1 BBO). Immunoprecipitation followed by mass spectrometry identified 9 proteins that interacted with wild-type FLAG-GMCL1 and GMCL1 EK but not GMCL1 BBO. These proteins included 53BP1, which plays a well-characterized role in double-strand break repair but also functions in a USP28-p53-53BP1 "mitotic stopwatch" complex that arrests the cell cycle after a substantially prolonged mitosis. Consistent with the IP-MS results, FLAG-GMCL1 immunoprecipitated 53BP1. Depletion of GMCL1 during mitotic arrest increased protein levels of 53BP1, and this could be rescued by wild-type GMCL1 but not the E142K mutant or a R433A mutant that failed to immunoprecipitate 53BP1.

      Using a publicly available dataset, the authors identified a relatively small subset of cell lines with high levels of GMCL1 mRNA that were resistant to the taxanes paclitaxel, cabazitaxel, and docetaxel. This type of analysis is confounded by the fact that paclitaxel and other microtubule poisons accumulate to substantially different levels in various cell lines (DOI: 10.1073/pnas.90.20.9552 , DOI: 10.1091/mbc.10.4.947 ), so careful follow-up experiments are required to validate results. The correlation between increased GMCL1 mRNA and taxane resistance was not observed in lung cancer cell lines. The authors propose this was because nearly half of lung cancers harbor p53 mutations, and lung cancer cell lines with wild-type but not mutant p53 showed the correlation between increased GMCL1 mRNA and taxane resistance. However, the other cancer cell types in which they report increased GMCL1 expression correlates with taxane sensitivity also have high rates of p53 mutation. Furthermore, p53 status does not predict taxane response in patients (DOI: 10.1002/1097-0142(20000815)89:4<769::aid-cncr8>3.0.co;2-6 , DOI: 10.1002/(SICI)1097-0142(19960915)78:6<1203::AID-CNCR6>3.0.CO;2-A , PMID: 10955790).

      The authors then depleted GMCL1 and reported that it increased apoptosis in two cell lines with wild-type p53 (MCF7 and U2OS) due to activation of the mitotic stopwatch. This is surprising because the mitotic stopwatch paper they cite (DOI: 10.1126/science.add9528) reported that U2OS cells have an inactive stopwatch and that activation of the stopwatch results in cell cycle arrest rather than apoptosis in most cell types, including MCF7. Beyond this, it has recently been shown that the level of taxanes and other microtubule poisons achieved in patient tumors is too low to induce mitotic arrest (DOI: 10.1126/scitranslmed.3007965, DOI: 10.1126/scitranslmed.abd4811, DOI: 10.1371/journal.pbio.3002339), raising concerns about the relevance of prolonged mitosis to paclitaxel response in cancer. The findings here demonstrating that GMCL1 mediates degradation of 53BP1 during mitotic arrest are solid and of interest to cell biologists, but it is unclear that these findings are relevant to paclitaxel response in patients.

      Strengths:

      This study identified 53BP1 as a target of CRL3-GMCL1-mediated degradation during mitotic arrest. AlphaFold3 predictions of the binding interface, followed by mutational analysis, identified mutants of each protein (GMCL1 R433A and 53BP1 IEDI1422-1425AAAA) that disrupted their interaction. Knock-in of a FLAG tag into the C-terminus of GMCL1 in HCT116 cells, followed by FLAG immunoprecipitation, confirmed that endogenous GMCL1 interacts with endogenous CUL3 and 53BP1 during mitotic arrest.

      Weaknesses:

      The clinical relevance of the study is overinterpreted. The authors have not taken relevant data about the clinical mechanism of taxanes into account. Supraphysiologic doses of microtubule poisons cause mitotic arrest and can activate the mitotic stopwatch. However, in physiologic concentrations of clinically useful microtubule poisons, cells proceed through mitosis and divide their chromosomes on mitotic spindles that are at least transiently multipolar. Though these low concentrations may result in a brief mitotic delay, it is substantially shorter than the arrest caused by high concentrations of microtubule poisons, and the one mimicked here by 16 hours of 0.4 mg/mL nocodazole, which is not used clinically and does not induce multipolar spindles. Resistance to mitotic arrest occurs through different mechanisms than resistance to multipolar spindles. No evidence is presented in the current version of the manuscript that GMCL1 affects cellular response to clinically relevant doses of paclitaxel.

      We agree that it would be an overstatement to claim that GMCL1 and p53 regulate paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available cancer cell lines from datasets catalogued in CCLE and DepMap, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised the text accordingly.

      In the experiments shown in former Figure 4A-H (now Figure 5A-H) and in those shown in the new Figure 5I-J, we used 100 nM paclitaxel to test the hypothesis that low GMCL1 levels sensitize cancer cells in a p53-dependent manner. Here, paclitaxel was chosen to mimic the conditions reported in the PRISM dataset (PMID: 32613204), which compiles the proliferation inhibitory activity of 4,518 compounds tested across 578 cancer cell lines. Consistent with our cell cycle findings, the paclitaxel sensitivity caused by GMCL1 depletion was reversed by silencing 53BP1 or USP28 (new Figure 5I-J), again supporting the involvement of the stopwatch complex. We are unsure about how to model the “physiologic concentrations of clinically useful microtubule poisons” in cell-based studies. A recent review notes that “The time above a threshold paclitaxel plasma concentration (0.05 μmol/L) is important for the efficacy and toxicity of the drug” (PMID: 28612269). Two other reviews mention that the clinically relevant concentration of paclitaxel is considered to be plasma levels between 0.05–0.1 μmol/L (approximately 50–100 nM) and that in clinical dosing, typical patient plasma concentrations after paclitaxel infusion range from 80–280 nM, with corresponding intratumoral concentrations between 1.1–9.0 μM, due to drug accumulation in tumor tissue (PMIDs: 24670687 and 29703818). We have now emphasized in the revised text the rationale for using 100 nM paclitaxel in our experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General comments on the Figures:

      (1) Western blots lack molecular weight markers on most panels and are often over-exposed and over-contrasted, rendering them hard to interpret.

      We have now included molecular weight markers in all Western blot panels. We have also reprocessed the images to avoid overexposure and excessive contrast, ensuring that the bands are clearly visible and interpretable.

      (2) Input and IP samples do not show percentage loading, so it is hard to interpret relative enrichments.

      In the revised figures, we have indicated what % of the input was loaded.

      (3) The authors change between cell line models for their experiments, and this is not clear in the figures. These are important details for interpreting the data, as many of the cell lines used are not functional for the mitotic surveillance pathway.

      In the revised manuscript, we have clearly indicated the specific cell lines used in each experiment in the figure legends. Additionally, to address concerns regarding the mitotic surveillance pathway, we have included new experiments using hTERT-RPE1 cells, which have been reported to possess a functional mitotic surveillance pathway (MSP) (Figure 4I-J).

      (4) No n-numbers are provided in the figure legends. Are the Western blots provided done once, or are they reproducible? Many of the blots would benefit from quantification and presentation via graphs to test for reproducible changes to 53BP1 levels under the different conditions.

      As now indicated in the methods section, we have conducted each Western blot no less than three times, yielding results that exhibit a high degree of reproducibility. A representative Western blot has been selected for each figure. We did not include densitometric quantification of immunoblots, given that the semi-quantitative nature of this technique would lead to an overinterpretation of our data; unfortunately, this is a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blots. One exception to this norm is protein half-life studies, which are quantified to measure the kinetics of decay rates and their internal comparisons. Accordingly, the experiments in Figure 2C were quantified.

      (5) Graphs displayed in the supplementary figures are blacked out, and individual data points cannot be visualised. All graphs should have individual data points clearly visible.

      We revised the quantified graphs and replaced them with scatter plots to clearly display individual data points, showing sample distribution.

      Additional experiments with specific comments on Figures:

      (1) Figure 1C-D: the relative amount of 53BP1 co-precipitating with FLAG-tagged GMCL1 WT appears very different between the two experiments. If the idea is that MLN4924 (Cullin neddylation inhibitor) makes the interaction easier to capture, then this should be explained in the text, and ideally shown on the same gel/blot -/+ MLN4924.

      We now present the samples treated with and without MLN4924 on the same gel/blot to allow direct comparison (new Figure 1D) and clarified this point in the text.

      (2) Figure 1E: The figure legend states that GMCL1 was immunoprecipitated, but the Figure looks as though FLAG-tagged 53BP1 was the bait protein being immunoprecipitated? Can the authors clarify?

      We thank the reviewer for pointing out the discrepancy between the figure and the figure legend in Figure 1E. The immunoprecipitation was indeed performed using FLAG-tagged 53BP1, and we have now rectified the figure legend accordingly. 

      (3) Figure 1F: Rather than parental cell lysate, the better control would be to IP FLAG from another FLAG-tagged expressing cell line, to rule out non-specific binding with the FLAG tag at the non-overexpressed level. 

      Figure 1F shows interaction at the endogenous level. The specificity of binding with overexpressed proteins is shown in Figures 1C and 1D.

      The USP28 blot is over-exposed and makes it hard to see any changes in electrophoretic mobility - it looks as though there is a change between the parental and the KI cell line? It is surprising that USP28 would co-IP with GMCL1 (presumably because USP28 is bound to 53BP1) if the function of GMCL1-53BP1 interaction is to promote 53BP1 degradation. Can the authors reconcile this? Crucially, if the authors claim that the 53BP1-GMCL1 interaction is specific to prolonged mitosis, then this experiment should be repeated and performed with asynchronous, normal-length mitosis, and prolonged mitosis conditions. This is vital for supporting the claim that this interaction only occurs during prolonged mitoses and does not occur in every mitosis regardless of length.

      This is a good point. Unfortunately, many of the protein-protein interactions occur post lysis. Therefore, we could not observe differences in asynchronous vs. mitotic cells.

      (4) Figure S1F: Label on blot should be CUL3 not CUI3.

      We thank the reviewer for pointing this out and we have corrected the typo.

      (5) Figure 2A: The authors suggest an increase in chromatin-bound 53BP1 in GMCL1 KO U2OS cells, specifically in M phase. Again, is this time in mitosis dependent, or would this be evident in every mitosis, regardless of length? Such an experiment would benefit from repetition and quantification to test whether the observed effect is reproducibly consistent. If the authors' model is correct, simply treating U2OS WT mitotic cells with MG132 during the mitotic arrest and performing the same fractionation should bring 53BP1 levels up to that seen in GMCL1 KO cells under the same conditions.

      The reviewer’s suggestion to assess 53BP1 accumulation in wild-type U2OS cells treated with MG132 during mitotic arrest is indeed highly relevant. However, treatment with MG132 during prolonged mitosis consistently led to significant cell death, making it technically challenging to evaluate 53BP1 levels under these conditions.

      (6) Figure 2B: The authors restore GMCL1 expression in the KO U2OS cells using WT and 2 distinct mutant cDNAs. However, the expression of these constructs is not equivalent, and thus their effects cannot be directly compared. It is also surprising that GMCL1 is much higher in M phase samples in this experiment (shouldn't it be destroyed?), when no such behaviour has been observed in the other figures.

      There is no evidence in our study or others that GMCL1 should be destroyed in M phase. We show that the R433A mutant is expressed at a level very similar to the WT protein, yet it doesn’t promote the degradation of 53BP1. It is true that the E142K mutant is expressed less in mitotic cells, whereas it is the most highly expressed in asynchronous cells. For some reason, this mutant has an inverse behavior compared to the WT, limiting the interpretation of this result. We now mention this in the text.

      (7) Figure 2C: The CHX experiment would benefit from inclusion of a control protein known to have a short half-life (e.g. c-myc, p53). Is GMCL1 known to have a relatively short half-life? It looks as though GMCL1 disappears after 1 h CHX treatment (although hard to definitively tell in the absence of molecular weight markers). 53BP1 appears to continue declining in the absence of GMCL1, which is surprising if 53BP1 degradation requires GMCL1. How can the authors reconcile this?

      As a control for the CHX chase experiments, we included p21, whose protein levels decreased in a CHX-dependent manner. GMCL1 itself also appeared to undergo degradation upon CHX treatment, but it doesn’t disappear completely.

      (8) Supplemental Figure 2:

      Transcription is largely inhibited in M phase, so the p53 target gene transcripts present in M phase are inherited from the preceding G2 phase. The qPCRs thus need a reference sample to compare against, i.e., was p21/PUMA/NOXA mRNA already low in G2 in the GMCL1 KO + WT cells before they entered mitosis? Or is the mRNA stability affected during M phase specifically? Is this effect on the mRNA dependent on the time in mitosis?

      It is well established that transcription is not entirely shut down during mitosis, particularly for a subset of genes involved in cell cycle regulation. For example, p21, PUMA, NOXA, and p53 mRNAs have been shown to remain actively transcribed during mitosis (see Table S5 in PMID: 28912132). However, we currently lack direct evidence that p53 activation during mitosis, specifically through the mitotic surveillance pathway, drives the transcription of p21, PUMA, or NOXA mRNAs during M phase. In the absence of such mechanistic data, we opted to exclude these analyses from the final figures.

      Panel B: blots are too over-exposed to see differences in p53 stability under the different conditions. Mitotic samples should be included to show how these differ from the G1 samples.

      The background of all blot images has been adjusted to ensure clarity and consistency.

      Panel D: The authors show no significant difference in the cell cycle profiles of the GMCL1 KO and reconstituted cells compared to parental U2OS cells. This should also be performed in the G1 daughter cells following a prolonged mitosis, to test the effect of the different GMCL1 constructs on G1 cell cycle arrest. U2OS cells have been reported not to have a functional mitotic surveillance pathway (Meitinger et al, Science, 2024), so U2OS cells are perhaps not a good model for testing this.

      We performed cell cycle profiling using EdU incorporation in hTERT-RPE1 cells, which possess a functional MSP, to evaluate cell cycle progression in daughter cells following prolonged mitosis. We observed that GMCL1 knockdown alone leads to G1-phase arrest. In contrast, co-depletion of GMCL1 with either 53BP1 or USP28 bypasses this arrest, indicating that GMCL1 regulates cell cycle progression in an MSP-dependent manner. Please see also the answer to the public review above. 

      (9) Figure 3:

      The authors show expression data for GMCL1 in the different cancer cell lines. This should be validated for a subset of cancer cell lines at the GMCL1 protein level, and cross-correlated to their MSP/mitotic timer status. Does GMCL1 depletion or knockout in p53 wild-type cancer cell lines overexpressing GMCL1 protein restore mitotic surveillance function?

      We were unable to assess GMCL1 protein levels using publicly available proteomics datasets, as GMCL1 expression was not detected. In p53 wild-type hTERT-RPE1 cells, GMCL1 knockdown impaired the mitotic surveillance pathway, as evidenced by G1-phase arrest following prolonged mitosis (new Figure 3A and new Supplementary Figure 3A, B). This arrest was rescued by co-depletion of either TP53BP1 or USP28, indicating that GMCL1 acts upstream of the MSP.

      (10) Figure 4:

      The authors show siRNA experiments depleting GMCL1 and testing the effects of GMCL1 loss on cell viability and apoptosis induction. This is performed in different cell line backgrounds. However, there is no demonstration that any of the observed effects are due to a lack of GMCL1 activity on 53BP1. These experiments need to be repeated in 53BP1 co-depleted cells to test for rescue. Without this, the interpretation is purely correlative.

      We assessed the effects of GMCL1 knockdown, alone or in combination with TP53BP1 or USP28 knockdown, on cell viability and apoptosis in hTERT-RPE1 cells using siRNA. Knockdown of GMCL1 alone led to a significant reduction in cell viability and an increase in apoptosis. However, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both cell viability and apoptosis levels to those observed in control cells (new Figure 5I,J).

      (11) Text comments:

      Line 257: HeLa cells suppress p53 through the E6 viral protein and are not "mutant" for p53.

      The authors should cite early work by Uetake and Sluder describing the effects of spindle poisons on the mitotic surveillance pathway.

      We appreciate the reviewer’s comments and have now made the necessary corrections.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      (1) Unsubstantiated Mechanistic Claims:

      In Figures 3 and 4, the authors show correlations between GMCL1 expression and sensitivity to Taxol. However, they fail to demonstrate that the mitotic stopwatch is mechanistically involved. To support this conclusion, the authors must test whether deletion of 53BP1, USP28, or disruption of their interaction rescues Taxol sensitivity in GMCL1-depleted cells. Since 53BP1 also plays a role in DNA damage response, such rescue experiments are necessary to distinguish between mitotic surveillance-specific and broader stress-response effects. Deletion of USP28 would be particularly informative.

      We sought to experimentally determine whether GMCL1 is involved in regulating the mitotic stopwatch. Knockdown of GMCL1 alone resulted in reduced cell proliferation and increased apoptosis. In contrast, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both proliferation and apoptosis levels to those observed in control cells (new Figure 5I, J). To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. We conducted EdU incorporation assays following nocodazole synchronization and release. Knockdown of GMCL1 alone led to a delay in G1 progression, whereas co-depletion of either TP53BP1 or USP28 rescued normal cell cycle progression (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data and suggest that GMCL1 functions upstream of the ternary complex, likely by regulating 53BP1 protein levels.

      (2) Model System Limitations (U2OS Cells):

      The use of U2OS cells is highly problematic for investigating the mitotic surveillance pathway. U2OS cells lack a functional mitotic stopwatch and do not arrest following prolonged mitosis in a 53BP1/USP28-dependent manner (PMID: 38547292). Therefore, conclusions drawn from this model system about the function of the mitotic surveillance pathway are not substantiated. Key experiments should be repeated in a cell line with an intact pathway, such as RPE1.

      We have now also performed all key experiments in hTERT-RPE1 cells (see above). We would also like to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functioning in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses.

      (3) Misinterpretation of p53 Activity Timing:

      The manuscript states that "GMCL1 KO cells led to decreased mRNA levels of p21 and NOXA during mitosis" (line 194). However, it is well established that the mitotic surveillance pathway activates p53 in the G1 phase following prolonged mitosis, not during mitosis itself (PMID: 38547292). Therefore, the observed changes in mRNA levels during mitosis are unlikely to be relevant to this pathway.

      We currently lack direct evidence that p53 activated during mitosis through the mitotic surveillance pathway directly influences the transcription of p21, PUMA, or NOXA mRNAs during M phase. Therefore, we have chosen to exclude these data from the final figures.

      (4) Incorrect Interpretation of 53BP1 Chromatin Binding:

      The authors claim that 53BP1 remains associated with chromatin during mitosis, which contradicts established literature. It is known that 53BP1 is released from chromatin during mitosis via mitosis-specific phosphorylation (PMID: 24703952), and this is supported by more recent findings (PMID: 38547292). A likely explanation for the discrepancy may be contamination of mitotic fractions with interphase cells. The chromatin fraction data in Figure 2C must be interpreted with caution.

      Our method to synchronize in M phase is rather stringent (see Supplementary Figure 3D as an example). The literature indicates that the bulk of 53BP1 is released from chromatin during mitosis. Yet, even in the two publications mentioned by the reviewer, there is a difference in the observable amount of 53BP1 bound to chromatin (compare Figure 2B in PMID: 38547292 and Figure 5A in PMID: 24703952). The difference is likely due to the different biochemical approaches used to purify chromatin-bound proteins (salt and detergent concentrations, sonication, etc.). Using our fractionation approach, we can reliably separate the soluble fraction (which also contains the nucleoplasmic fraction) from chromatin-associated proteins, as indicated by controls such as α-Tubulin and Histone H3. We have now mentioned these limitations when comparing different fractionation methods in our discussion section.

      (5) Inadequate Citation of Foundational Literature:

      The literature on the mitotic surveillance pathway is relatively limited, and it is essential that the authors provide a comprehensive and accurate account of its development. The foundational work by the Sluder lab (PMID: 20832310), demonstrating a p53-dependent arrest following prolonged mitosis, must be cited. Furthermore, the three key 2016 papers (PMID: 27432896, 27432897, 27432896) that identified the involvement of USP28 and 53BP1 in this pathway are critical and should be cited as the basis of the mitotic surveillance pathway.

      In contrast, the manuscript currently emphasizes publications that either contribute minimally or have been contradicted by prior and subsequent work. For example: PMID: 31699974, which proposes Ser15 phosphorylation of p53 as critical, has been contradicted by multiple groups (e.g., Holland, Oegema, and Tsou labs).

      PMID: 37888778, which suggests that 53BP1 must be released from kinetochores, is inconsistent with findings that indicate kinetochore localization is not relevant.

      The authors should thoroughly revise the Introduction to reflect what this reviewer would describe as a more accurate and scholarly approach to the literature.

      We have substantially revised both the Introduction and Discussion sections to incorporate important references kindly suggested by the reviewer.

      Minor Points:

      (1) Overexposed Western Blots:

      The Western blots throughout the manuscript are heavily overexposed and saturated, obscuring differences in protein levels and hindering data interpretation. The authors should provide properly exposed blots with quantification where appropriate.

      We have provided Western blot images with appropriate exposure levels and included quantification where appropriate (i.e., to measure the kinetics of decay rates as in Figure 2C). For all the other immunoblots, we did not include densitometric quantification, given that the semi-quantitative nature of this technique would lead to overinterpretation of our data. This is, unfortunately, a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blot analyses.

      (2) Missing information in the graphs in Figure 2C and 4; S2? How many repeats? What are the asterisks?

      Panels referenced above have been repeated several times, and further details are now provided in the figure legends.

      Reviewer #3 (Recommendations for the authors):

      (1) The claim that GMCL1 modulates paclitaxel sensitivity in cancer should be toned down.

      We agree that it would be an overstatement to claim that GMCL1 regulates paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available, cell line–based datasets, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised our statements and corresponding text accordingly. We have now placed greater emphasis on our molecular and cell biology studies.

      (2) Additional experiments in low, physiologically relevant concentrations of paclitaxel would be interesting. It is possible that these concentrations activate the mitotic stopwatch in a portion of cells, in addition to inducing cell death due to chromosome loss, activation of an immune response, and chromothripsis. Results should be interpreted in the context of this complexity.

      Please see the response to the public review. 

      (3) It would be helpful to show that CUL3 interacts with 53BP1 only in the presence of GMCL1.

      We show that the binding of 53BP1 to GMCL1 is independent of the ability of GMCL1 to bind CUL3 (Figure 1C, D). The binding between 53BP1 and CUL3 is difficult to detect (Figure 1F) likely because it’s not direct but mediated by GMCL1.

      (4) The GMCL1 "KO" lines appear to still express a low level of GMCL1 (Figure 2A), which should be acknowledged

      We have included the GMCL1 mRNA expression data, as measured by RT-PCR, in Supplementary Figure 1G, demonstrating that GMCL1 expression was undetectable under the tested conditions.

      (5) Additional description of the methods is warranted. This is particularly true for the database analysis that forms the basis for the claim that GMCL1 overexpression causes resistance to paclitaxel and other taxanes presented in Figure 3, the methodology used to obtain M-phase cells, and the concentration and duration of taxol treatment.

      We have now extensively revised the Methods section.  

      (6) "Taxol" and "paclitaxel" are used interchangeably throughout the manuscript. Consistency would be preferable.

      We have revised the manuscript to maintain consistency in the use of the terms “Taxol” and “paclitaxel” and now refer to “paclitaxel” when discussing that individual compound; “taxanes” when referring collectively to cabazitaxel, docetaxel and paclitaxel; and “Taxol” has been removed entirely to avoid redundancy or confusion.    

      (7) It is unclear why it is claimed that GMCL1 interacts "specifically" with 53BP1 (line 176) since multiple interactors were identified in the IP-MS study

      We meant that the GMCL1 R433A mutant loses its ability to bind 53BP1, suggesting that the GMCL1-53BP1 interaction is not an artifact. We have now clarified the text. 

      (8) The bottom row in Figure S3 is misleading. Paclitaxel is not uniformly effective in every tumor of any given type, and so resistance occurs in every cancer type.

      We fully agree that cancer is highly heterogeneous and that paclitaxel efficacy varies across tumors, even within the same histological subtype. Our intention was not to suggest uniform sensitivity/resistance, but rather to provide a high-level overview using aggregated data. We acknowledge that this coarse-grained representation may unintentionally imply overly generalized conclusions. To avoid potential misinterpretation, we have removed the corresponding panel in the revised paper.

    1. eLife Assessment

      This study presents a valuable advance in reconstructing naturalistic speech from intracranial ECoG data using a dual-pathway model. The evidence supporting the claims of the authors is solid, although the rationale for employing a smaller language model rather than a large language model (LLM) should be further clarified. This work will be of interest to cognitive neuroscientists and computer scientists/engineers working on speech reconstruction from neural data.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.

      Strengths:

      (1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.

      (2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.

      (3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.

      Weaknesses:

      (1) It is unclear how much the acoustic pathway contributes to the final reconstruction results, based on Figures 3B-E and 4E. Including results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice could help clarify this contribution.

      (2) As noted in the limitations, the reconstruction results heavily rely on pre-trained generative models. However, no comparison is provided with state-of-the-art multimodal LLMs such as Qwen3-Omni, which can process auditory and textual information simultaneously. The rationale for using separate models (Wav2Vec for speech and TTS for text) instead of a single unified generative framework should be clearly justified. In addition, the adaptor employs an LSTM architecture for speech but a Transformer for text, which may introduce confounds in the performance comparison. Is there any theoretical or empirical motivation for adopting recurrent networks for auditory processing and Transformer-based models for textual processing?

      (3) The model is trained on approximately 20 minutes of data per participant, which raises concerns about potential overfitting. It would be helpful if the authors could analyze whether test sentences with higher or lower reconstruction performance include words that were also present in the training set.

      (4) The phoneme confusion matrix in Figure 4A does not appear to align with human phoneme confusion patterns. For instance, /s/ and /z/ differ only in voicing, yet the model does not seem to confuse these phonemes. Does this imply that the model and the human brain operate differently at the mechanistic level?

      (5) In general, is the motivation for adopting the dual-pathway model to better align with the organization of the human brain, or to achieve improved engineering performance? If the goal is primarily engineering-oriented, the authors should compare their approach with a pretrained multimodal LLM rather than relying on the dual-pathway architecture. Conversely, if the design aims to mirror human brain function, additional analysis, such as detailed comparisons of phoneme confusion matrices, should be included to demonstrate that the model exhibits brain-like performance patterns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Li et al. proposes a dual-path framework that concurrently decodes acoustic and linguistic representations from ECoG recordings. By integrating advanced pre-trained AI models, the approach preserves both acoustic richness and linguistic intelligibility, and achieves a WER of 18.9% with a short (~20-minute) recording.

      Overall, the study offers an advanced and promising framework for speech decoding. The method appears sound, and the results are clear and convincing. My main concerns are the need for additional control analyses and for more comparisons with existing models.

      Strengths:

      (1) This speech-decoding framework employs several advanced pre-trained DNN models, reaching superior performance (WER of 18.9%) with relatively short (~20-minute) neural recording.

      (2) The dual-pathway design is elegant, and the study clearly demonstrates its necessity: The acoustic pathway enhances spectral fidelity while the linguistic pathway improves linguistic intelligibility.

      Weaknesses:

      The DNNs used were pre-trained on large corpora, including TIMIT, which is also the source of the experimental stimuli. More generally, as DNNs are powerful at generating speech, additional evidence is needed to show that decoding performance is driven by neural signals rather than by the DNNs' generative capacity.

    4. Author response:

      Here we provide a provisional response addressing the public comments and outlining the revisions we are planning to make:

      (1) We will add additional baseline models to delineate the contributions of the acoustic and linguistic pathways.

      (2) We will show additional ablation analysis and other model comparison results, as suggested by the reviewers, to justify the choice of the DNN models.

      (3) We will clarify the use of the TIMIT dataset during pre-training. In fact, the TIMIT speech data (the speech corpus used in the test set) was not included or used when pre-training the acoustic or linguistic pathway. It was only used in fine-tuning the final speech synthesizer (the CosyVoice model). We will present results without this fine-tuning step, which will fully eliminate the use of the TIMIT data during model training.

      (4) We will further analyze the phoneme confusion matrices and/or other data to evaluate the model behavior.

      (5) We will analyze the test sentences with high and low accuracies. We will also include results with partial training data (e.g. using 25%, 50%, 75% of the training set) to further evaluate the impact of the total amount of training data.

    1. eLife Assessment

      This fundamental study provides compelling evidence for the functional segregation of the sensorimotor cortex into precisely delineated areas, and highlights a rapid transition in functional properties at the boundaries between these areas. This result further confirms and extends recent work on the diversity of neural response specificities across cortical areas in the context of complex behavioral tasks. This work will be of interest to neuroscientists studying sensory-motor functions.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors address the organization of reach-related activity in layer 2/3 across a broad swath of anterodorsal neocortex that included large subregions of M1, M2, and S1. In mice performing a novel variant water-reaching task, the authors measured activity using two-photon fluorescence imaging of a GECI expressed in excitatory projection neurons. The authors found a substantial diversity of response patterns using a number of metrics they developed for characterizing the PETHs of neurons across reach conditions (target locations). By mapping single-neuron properties across the cortex, the authors found substantial spatial variation, only some of which aligned with traditional boundaries between cortical regions. Using Gaussian mixture models, the authors found evidence of distinct response types in each region, with several types prominent in multiple cortical regions. Aggregating across regions, four primary subpopulations were apparent, each distinct in its average response properties. Strikingly, each subpopulation was observed in multiple regions, but subpopulation members from different regions exhibited largely similar response properties.

      Strengths:

      The work addresses a fundamental question in the field that has not previously been addressed at cellular resolution across such a broad cortical extent. I see this as truly foundational work that will support future investigation of how the rodent brain drives and controls reaching.

      The quantification is thoughtful and rigorous. It is great that the authors provide an explanation for and intuition behind their response metrics, rather than burying everything in the Methods.

      The Discussion and general contextualization of the results are thorough, thoughtful, and strong. It is great that the authors avoid the common over-interpretation of classical observations regarding cortical organization that are endemic in the field.

      All things considered, this is the best paper regarding spatial structure in the motor system I have ever read. The breadth of cellular resolution activity measurement, the rigor of the quantification, and the clear and open-minded interrogation of the data collectively have produced a very special piece of work.

      Weaknesses:

      The behavioral task is very impressive and an important contribution to the field in its own right. However, given that it appears substantially different from the one used in the previous paper, the characterization of the behavior provided in the Results is too brief. More illustration of the behavior would be helpful. For example, it is rather deep into the paper when the authors reveal that the mice can whisk to help localize the target location. That should be expressed at the outset when the behavior is first described. Other suggestions for elaborating the behavior description are included below.

      Statistical support for key claims is lacking. For example, "The five areas of interest varied in the fraction of neurons that were modulated: M2 had 14%, M1 had 23%, S1-fl had 30%, S1-hl had 25%, and S1-tr had 27%" - I cannot locate the statistical tests showing that these values are actually different. Another example is Figure 7, where a key observation is that distributions of PETH features are distinct across regions. It is clear that at least some distributions are not overlapping, but a clearer statistical basis for this key claim should be provided.

      I understand that the authors are planning a follow-up study that addresses the relation between activity patterns and kinematics. One question about interpreting the results here though, is how much the activity variation across target locations may relate to the kinematic differences across these different conditions, as opposed to true higher-order movement features like reach direction.

    3. Reviewer #2 (Public review):

      Summary:

      The functional parcellation of cortical areas is a critical question in neuroscience. This is particularly true in frontal areas in mice. While sensory areas are relatively well characterized by their tuning to sensory stimuli, the situation is much less clear for motor areas. This has become even more ambiguous since recent studies using large-scale neuronal recordings consistently report mixed sensory and motor-related activity throughout the brain, and motor mapping studies have shown that movements evoked by cortical stimulation are by no means limited to motor areas alone. Here, the authors use a correlation approach combining large-scale functional imaging at cellular resolution with movement-tracking in mice executing a reaching task. Across multiple recording sessions in the same animals, the authors have imaged a large portion of the sensorimotor cortex at cellular resolution in mice performing a reaching task, recording the activity of nearly 40,000 neurons. By aligning the calcium signal of each neuron to three task events (the Go cue triggering the reach, the onset of paw lift, and the contact between the paw and the target) for different target positions, the authors identified different response patterns distributed differently across cortical areas. They defined a set of features that describe the neurons' response pattern, representing the temporal dynamics and tuning properties for the different target positions. These features were used to construct cortical maps, and the authors show that, interestingly, gradient maps obtained from the first derivative of the feature maps reveal sharp discontinuities at the boundaries between anatomically defined cortical areas. Using dimensionality reduction of the neuronal response features, the authors found that, despite clear differences in their average response properties, individual neurons from the same cortical areas do not form distinct clusters in the reduced-dimensional space. In fact, most areas contain heterogeneous neuronal populations, and most neuronal populations are present in multiple areas, albeit in different proportions. Interestingly, the authors identified four neuronal subpopulations based on the distance between the components of the Gaussian mixture model used to model the distribution of neurons within each area. One of these subpopulations is almost exclusively represented in the anterior M2 cortex, while another is broadly distributed across the different areas.

      Strengths:

      This article is based on an impressive dataset of nearly 40,000 neurons covering a large portion of the sensorimotor cortex and on innovative analytical approaches. This study is likely the first to clearly demonstrate boundaries between cortical areas defined based on the responses of individual neurons. This innovative approach to functional mapping of cortical areas potentially opens up new perspectives for higher-resolution mapping of frontal cortical areas, using a broader repertoire of sensory and motor evoked responses.

      Weaknesses:

      The second part of the article, which presents multimodal responses in the cortical areas, seems to be a perhaps overly complicated way of showing what has already been demonstrated in numerous recent publications, but these new analyses expand upon these previous observations by revealing an interesting functional organization of the sensorimotor cortex, highlighting interesting similarities and differences between certain areas.

    1. eLife Assessment

      The authors present evidence for a WIPI2-Retriever complex (termed CROP2) that couples cargo selection to carrier fission at endosomes. CROP2 appears to function analogously to the previously described CROP1 complex, formed by WIPI1 and Retromer, with which it shares structural similarities. They provide convincing evidence that CROP1 and CROP2 regulate the trafficking of distinct subsets of cargoes; however, the cellular evidence for the existence of these distinct complexes remains incomplete. Overall, the findings are important and expand our understanding of how cargo selection by Retriever and Retromer is orchestrated at endosomes.

    2. Reviewer #1 (Public review):

      WIPI1 is a PROPPIN family protein that has been implicated in Retromer-mediated membrane fission events. Although the cargos that it has been tested to be important for are diverse, one of the cargos that is unaffected is Beta1-Integrin. This leads the authors to assess another PROPPIN family protein - WIPI2, which is a homolog of WIPI1. KD using siRNA is effective and had no consequences on LAMP1, EGFR trafficking or GLUT1 trafficking. Integrin-B1, however, had a large and significant defect in its recycling from the endosome, with a clear endosomal colocalisation. Complementation experiments with WT WIPI2 recovered the phenotype, but various mutant WIPI2 complements resulted in elongated tubules, and there was also a dominant negative effect of the mutant. Integrin is a classic retriever cargo, so the authors rationalise that WIPI2 may be playing a role with retriever that WIPI1 plays with retromer. To assess this, they perform a set of immunoprecipitations. SNX17, the retriever-associated sorting nexin, co-IPs with WIPI2 in a VPS26C-dependent manner. VPS26C but not VPS26 co-IPs with WIPI2, and the reciprocal with WIPI1. These interactions were not present for the FSSS mutation of WIPI2. WIPI2 localises to Rab11 endosomes mainly, as does retriever. Mutations of WIPI2 not only affected WIPI2 localisation, but also VPS35L mutations, indicating that there is a functional relationship between the two.

      On the whole, I find the manuscript compelling. The manuscript is very clearly written, the results are convincing and well performed. The flow of experiments is logical, and although not comprehensive in the subsequent mechanistic understanding, the fundamental findings are important and convincing. My comments below are, on the whole, minor and are intended to support the communication of the findings to the field.

      (1) The IP interaction data were convincing; however, for me and some others, an interaction is only convincing when performed in vitro, and understood at a structural level. I do not suggest the authors do that in this case; however, I think, at a minimum, some sensible moderation of claims would be useful here.

      (2) I found the final localisation data and its interpretation confusing. My interpretation of that data would not be that the retriever is relocalised, but rather that there is less of both recruited to the membrane and the remaining localisation distribution is shifted. In addition, I am not quite sure of the model here - is the idea that WIPI2 recruits retriever? If that is the case, I find it hard to resolve with its role as a mediator of fission. Clarity would be appreciated here.

      (3) I am concerned that the repeats being compared for statistical analysis are not biological repeats but technical repeats (cells in the same experiment). I should think the idea of the statistical comparison is to show experimental reproducibility and variability across biological repeats. Therefore, I would expect an appropriate number of biological repeats (3 or more minimum), to be the data compared in the statistical analysis and graphs. I think it is appropriate to average the technical repeats from each biological repeat. I find these to be useful resources https://doi.org/10.1083/jcb.202401074, https://doi.org/10.1083/jcb.200611141
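
      The analysis suggested in point (3) can be sketched as follows, assuming each biological repeat contributes a list of per-cell (technical) measurements. All names and numbers below are made up for illustration and are not data from the manuscript; Welch's t statistic is shown only as one possible way to compare the per-repeat means.

```python
from math import sqrt
from statistics import mean, stdev


def per_repeat_means(cells_by_repeat):
    """Collapse technical repeats (cells) to one mean per biological repeat."""
    return [mean(cells) for cells in cells_by_repeat]


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical colocalisation scores: three biological repeats per condition,
# each with a different number of imaged cells.
control = [[0.80, 0.82, 0.78], [0.75, 0.77], [0.85, 0.83, 0.84, 0.88]]
knockdown = [[0.40, 0.44], [0.35, 0.37, 0.36], [0.50, 0.46]]

control_means = per_repeat_means(control)      # n = 3, not n = 9 cells
knockdown_means = per_repeat_means(knockdown)  # n = 3, not n = 7 cells
t_statistic = welch_t(control_means, knockdown_means)
print(control_means, knockdown_means, round(t_statistic, 2))
```

      The point of the design is that the sample size entering the test equals the number of biological repeats, so imaging more cells within one experiment does not inflate the apparent statistical power.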

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from De Leo and Mayer presents evidence that the PROPPIN protein, WIPI2, associates with the Retriever complex, and is required for the proper transport of the SNX17-Retriever cargo, beta1-integrin. This finding fits with prior papers from the Mayer lab, which showed that a related PROPPIN, WIPI1, is required for the transport of some SNX27-Retromer cargo, including GLUT1. The retromer and retriever complexes are architecturally similar. Importantly, they act at the same endosomes, and each transports cargo from endosomes to the plasma membrane. Thus, the possibility that each also requires a structurally related PROPPIN is of interest. However, the manuscript is incomplete, and the main claims are only partially supported.

      Strengths:

      The topic that PROPPIN proteins are important for the function of the Retromer and Retriever complexes expands our view of the trafficking complex.

      Weaknesses:

      Many important controls are missing. Several points that are made in the manuscript are only supported through a single approach.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript of Mayer and colleagues analyzes the function of WIPI proteins in mammalian cells. The authors previously identified CROP as a complex consisting of WIPI1 and the retromer complex, primarily in yeast cells. In mammalian cells, both WIPI1 and WIPI2 exist, whereas retromer has a homologous complex termed retriever. They now find that WIPI2 can form a complex with retriever subunits. They named this complex CROP2. Their data further indicate that CROP2 and CROP1 have distinct substrate specificities as knockdown of CROP2 subunits affects beta1 integrin sorting, whereas knockdown of CROP1 affects EGFR and GLUT1. They further identify a similar sequence (FSSS) in both WIPI1 and WIPI2, which is required for their specific binding to retromer and retriever.

      Strengths:

      CROP1 and CROP2 seem to use similar features for their formation, and have different substrates, which is convincingly shown.

      Weaknesses:

      The analysis lacks information that this is a complex as claimed. It can be deduced from the interaction analysis, but was not shown.

    1. eLife Assessment

      This study reports a valuable method to predict the capacity of a candidate probiotic bacterium to metabolically outcompete a bacterial pathogen in the ecological niche of the murine respiratory tract (niche exclusion) based on the overlap of used carbon sources in vitro. The in vivo confirmation of the in vitro/in silico predicted efficacy is, at this stage, incomplete and would require more persuasive experimental evidence for the elimination of alternative mechanisms of action.

    2. Reviewer #1 (Public review):

      A summary of what the authors were trying to achieve:

      (1) Identify probiotic candidates based on the phylogenetic proximity and their presence in the lower respiratory tract based on phylogenetic analysis and on meta-analysis of 16S rRNA sequencing of mouse lung samples.

      (2) Predefine probiotic candidates with overlapping and competing metabolic profiles based on a simple and easy-to-apply score, taking carbon source use into consideration.

      (3) Confirm the functionality of these candidate probiotics in vitro and define their mechanism of action (niche exclusion by either metabolic competition or active antibacterial strategies).

      (4) Confirm the probiotic action in vivo.

      Strengths:

      The authors attempt to go the whole 9 yards from rational choice of phylogenetically close lower respiratory tract probiotics, over in silico modelling of niche index based on use of similar carbon sources with in vitro confirmation, to in vivo competition experiments in mice.

      Weaknesses:

      (1) The use of a carbon source is defined as growth to OD600 two SD above the blank level. While allowing a clear cutoff, this procedure does not take into account larger differences in the preferences of carbon sources between the pathogen and the probiotic candidate. If the pathogen is much better at taking up and processing a carbon source, the competition by the probiotic might be biologically irrelevant.

      (2) The authors do not take into account the growth of candidate probiotics in the presence of Bt. In monoculture, three of the four most potent candidate probiotics grow to comparable levels as Bt in LSM.

      (3) Niche exclusion in vivo is not shown. Mortality of hosts after infection with Bt is not a measure for competition of CP with the pathogen. Only Bt titers would prove a competitive effect. For CP17, less than half of the mice were actually colonized, but still, there is 100% protection. Activation of the host immune system would explain this and has to be excluded as an alternative reason for improved host survival.
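
      The cutoff discussed in point (1) can be sketched as follows, assuming that "two SD above the blank level" means the mean of the blank wells plus two sample standard deviations. The function name and the OD600 readings are hypothetical and are not values from the study.

```python
from statistics import mean, stdev


def uses_carbon_source(od600, blank_readings, n_sd=2.0):
    """Return True if growth exceeds mean(blank) + n_sd * SD(blank)."""
    threshold = mean(blank_readings) + n_sd * stdev(blank_readings)
    return od600 > threshold


# Hypothetical readings: blank wells, a weak grower and a strong grower
# on the same carbon source.
blank = [0.05, 0.06, 0.04]
weak_grower = 0.08
strong_grower = 0.90

weak_uses = uses_carbon_source(weak_grower, blank)
strong_uses = uses_carbon_source(strong_grower, blank)
print(weak_uses, strong_uses)
```

      Under this binary rule both strains are scored as users of the carbon source even though their growth differs more than tenfold, which is the loss of information the reviewer points to.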

      Appraisal:

      (1) Based on phylogenetic comparison and published resources on lower respiratory tract colonizing bacteria, the authors find a reasonably good number of candidate probiotics that grow in LSM and successfully compete with the pathogenic target bacterium Bt in vitro.

      (2) In vivo, only host survival was tested, and a direct competition of CP with Bt by testing for Bt titers was not shown.

      Impact:

      Niche exclusion based on competition for environmentally provided metabolites is not a new concept and was experimentally tested, e.g. in the intestine. The authors show here that this concept could be translated into the resource-poor environment of the respiratory tract. It remains to be tested if the LSM growth-based competition data in vitro can be translated into niche exclusion in vivo.

    3. Reviewer #2 (Public review):

      Summary:

      This study aims to establish a rational framework for designing bacterial probiotics against respiratory infections. The central hypothesis is that in vitro antagonism, particularly through metabolic niche overlap with a pathogen, predicts in vivo efficacy.

      Strengths:

      (1) Systematic pipeline: The study integrates bacterial isolation, in vitro characterization, model development, and in vivo validation into a cohesive workflow.

      (2) Quantitative model: The introduction of the Niche Index (NI) and Niche Index Fraction (NIF) provides a novel, quantitative tool for predicting probiotic efficacy based on ecological principles.

      (3) Mechanistic insight: The work dissects different modes of action, clearly demonstrating that inhibition can be driven by specialized metabolite production (CP8) or carbon resource competition (e.g., CP7), with lactate utilization identified as a key factor.

      Weaknesses:

      (1) Limited model generalizability: The predictive power of the NI model is not universal. It fails to account for the in vivo inefficacy of CP8 (a metabolite-dependent inhibitor) and cannot explain the short-term protection conferred by some non-inhibitory CPs in vivo, suggesting unmodeled mechanisms like immune priming are at play.

      (2) Preliminary nature of key findings: The emphasis on lactate consumption as a critical predictor, while interesting, is not sufficiently explored to establish its general importance beyond the specific strains and conditions tested.

      Appraisal:

      The authors successfully achieve their aim of establishing a rational probiotic-design pipeline. The data robustly support the conclusion that metabolic niche overlap predicts efficacy for many strains, while also clearly delineating the model's limitations, as acknowledged by the authors.

      Impact:

      This work provides a valuable methodological framework for hypothesis-driven probiotic discovery. The quantitative Niche Index offers immediate utility to the field and, with further refinement, has the potential to become a fundamental tool for developing respiratory therapeutics.

    1. eLife Assessment

      This is an overall compelling set of findings on the role of centrally produced estrogens in the control of behaviors in male medaka. The significance of the findings rests on the revealed potential mechanism by which brain-derived estrogens modulate social behaviors in males, supported by the analysis of multiple transgenic lines. The evidence for the broader claim is incomplete since it has not been extended to female medaka, and further experimentation would be necessary to fully validate the conclusions on the role of brain-derived estrogens. Nonetheless, the findings have led to important hypotheses on the hormonal control of behaviors in teleosts that can be tested further.

    2. Reviewer #1 (Public review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically-tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation

      Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'

      Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionary ancient

      Weaknesses:

      The experimental design for studying aggression in males has flaws, but it appears a typical resident-intruder type assay is not appropriate for medaka. It seems other species may be better for studying aggression in teleosts.

    3. Reviewer #3 (Public review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in male medaka. The constitutive deletion of Cyp19a1b, confirmed by the ablation of its transcript, markedly reduced brain estrogen content. This effect is accompanied by reduced sexual and aggressive behavior and reduced expression of the transcripts coding for androgen receptors (AR), ara and arb, in brain regions involved in social behavior regulation. Both AR expression and aspects of social behaviors were restored by adult treatment with estrogens, providing some support for a role of aromatization. Expression analysis of AR isoforms and behavior of mutants of estrogen receptors (ER) indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. Together, these results provide valuable insights into the role of brain-derived estrogens in social behavior in fish.

      Strengths:

      This study evaluates the role of brain "specific" Cyp19a1 in social behavior in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. The study suggests that, as opposed to mammals, the facilitatory role of brain-derived estrogens on mating and aggression is limited to adulthood.

      Results obtained from multiple mutant lines converge to show that estrogens most likely synthesized in the brain drive aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      Weaknesses:

      Most experiments are weakly powered (low sample size).

      The variability of the mRNA content for a same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      Conclusions:

      Overall, the present study provides convincing evidence for a facilitatory role of estrogens originating from the brain on sexual behavior and aggressive behavior in male medaka. The role of specific estrogen receptor isoforms underlying the expression of androgen receptors is supported by converging evidence from multiple mutant lines.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically-tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      We thank the reviewer for this positive evaluation of our work and for the helpful comments and suggestions. Regarding the concern that the term “neuroestrogens” may be misleading, we addressed this in the previous revision by consistently replacing it throughout the manuscript with “brain-derived estrogens” or “brain estrogens.”

      In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones 

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation 

      Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors' 

      Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal 

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionary ancient 

      We thank the reviewer again for their positive evaluation of our work.

      Weaknesses:

      As stated in the summary, the authors are attributing the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either

      As mentioned above, we addressed this in the previous revision by replacing “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript. In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”

      The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering

      This comment is the same as one raised in the first review (Reviewer #1’s comment 2 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      Line 300: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males exhibited significantly fewer fin displays (P = 0.0461 and 0.0293 for Δ8 and Δ4 lines, respectively) and circles (P = 0.0446 and 0.1512 for Δ8 and Δ4 lines, respectively) than their wild-type siblings (Fig. 5L; Fig. S8E), suggesting less aggression” was edited to read “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wild-type siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Figure 5L, Figure 5—figure supplement 3E), showing less aggression.”

      Lack of attribution of previous published work from other research groups that would provide the proper context of the present study

      This comment is also the same as one raised in the first review (Reviewer #1’s comment 3 on weaknesses). In our previous revision, in response to this comment, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015; Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023) in the appropriate sections. We also added the following new references and revised the Introduction and Discussion accordingly:

      (2) Alward BA, Laud VA, Skalnik CJ, York RA, Juntti SA, Fernald RD. 2020. Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences of the United States of America 117:28167–28174. DOI: https://doi.org/10.1073/pnas.2008925117

      (39) O’Connell LA, Hofmann HA. 2012. Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153:1341–1351. DOI: https://doi.org/10.1210/en.2011-1663

      (54) Yong L, Thet Z, Zhu Y. 2017. Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology 220:3017–3021. DOI: https://doi.org/10.1242/jeb.161596

      There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation"

      In our previous revision, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction. We also revised the text to remove phrases such as “contrary to expectation” and “unexpected.”

      The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used.

      Following this comment, we have attempted additional aggression assays using the resident-intruder paradigm. However, these experiments did not produce consistent or interpretable results. As noted in our previous revision, medaka naturally form shoals and exhibit weak territoriality, and even slight differences in dominance between a resident and an intruder can markedly increase variability, reducing data reliability. Therefore, we believe that the approach used in the present study provides a more suitable assessment of aggression in medaka, regardless of territorial tendencies. We will continue to explore potential refinements in future studies and respectfully ask the reviewer to evaluate the present work based on the assay used here.

      While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside

      While we did not adopt this comment in our previous revision, we have carefully reconsidered the reviewers’ feedback and have now decided to remove the female data. This change allows us to present a more focused and cohesive story centered on males. The specific revisions are outlined below:

      Abstract

      Line 25: The text “, thereby revealing a previously unappreciated mode of action of brain-derived estrogens. We additionally show that female fish lacking Cyp19a1b are less receptive to male courtship and conversely court other females, highlighting the significance of brain-derived estrogens in establishing sex-typical behaviors in both sexes.” has been revised to “. Taken together, these findings reveal a previously unappreciated mode of action of brain-derived estrogens in shaping male-typical behaviors.”

      Results

      Line 88: The text “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids. As expected, brain estradiol-17β (E2) in both male and female homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% and 50%, respectively, of the levels in their wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037, males; P = 0.0092, females) (Fig. 1, A and B). In males, brain E2 in heterozygotes (cyp19a1b<sup>−/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A), indicating a dosage effect of cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in both cyp19a1b<sup>−/−</sup> males and females (Fig. S1, C and D), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain levels of testosterone, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Fig. 1A). Similarly, brain 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 6.2- and 1.9-fold, respectively, versus wild-type siblings (P = 0.0007, males; P = 0.0316, females) (Fig. 1, A and B). These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain and half of those in the female brain are synthesized locally in the brain. In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.” has been revised to “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids in males. 
As expected, brain estradiol-17β (E2) in homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% of the levels in wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037) (Figure 1A). Brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of wild-type levels (P = 0.0284) (Figure 1A), indicating a dosage effect of the cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in cyp19a1b<sup>−/−</sup> males (Figure 1B), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain testosterone levels, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Figure 1A). Similarly, brain 11KT levels increased 6.2-fold (P = 0.0007) (Figure 1A). These results indicate that cyp19a1b-deficient males have reduced estrogen coupled with elevated androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain are synthesized locally in the brain. Peripheral 11KT levels also increased 3.7-fold in cyp19a1b<sup>−/−</sup> males (P = 0.0789) (Figure 1B), indicating peripheral influence in addition to central effects.”

      Line 211: “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040), a level comparable to that observed in females” has been revised to “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040).”

      The subsection entitled “cyp19a1b-deficient females are less receptive to males and instead court other females,” which followed line 311, has been removed.

      Discussion

      The two paragraphs between lines 373 and 374, which addressed the female data, have been removed.

      Materials and methods

      Line 433: “males and females” has been changed to “males”.

      Line 457: “focal fish” has been changed to “focal male”.

      Line 458: “stimulus fish” has been changed to “stimulus female”.

      Line 458: “Fig. 6, E and F, ” has been deleted.

      Line 460: “; wild-type males in Fig. 6, A to C” has been deleted.

      Line 466: The text “The period of interaction/recording was extended to 2 hours in tests of courtship displays received from the stimulus esr2b-deficient female and in tests of mating behavior between females, because they take longer to initiate courtship (12). In tests using an esr2b-deficient female as the stimulus fish, where the latency to spawn could not be calculated because these fish were unreceptive to males and did not spawn, the sexual motivation of the focal fish was instead assessed by counting the number of courtship displays and wrapping attempts in 30 min. The number of these mating acts was also counted in tests to evaluate the receptivity of females. In tests of mating behavior between two females, the stimulus female was marked with a small notch in the caudal fin to distinguish it from the focal female.” has been revised to “In tests using an esr2b-deficient female as the stimulus fish, the latency to spawn could not be calculated because the female was unreceptive to males and did not spawn. Therefore, the sexual motivation of the focal male was assessed by counting the number of courtship displays and wrapping attempts in 30 min. To evaluate courtship displays performed by stimulus esr2b-deficient females toward focal males, the recording period was extended to 2 hours, as these females take longer to initiate courtship (Nishiike et al., 2021). In all video analyses, the researcher was blind to the fish genotype and treatment.”

      Line 499: “brains dissected from males and females of the cyp19a1b-deficient line (analysis of ara, arb, vt, gal, npba, and esr2b) and males of the esr1-, esr2a-, and esr2b-deficient lines” has been revised to “male brains from the cyp19a1b-deficient line (analysis of ara, arb, vt, and gal) and from the esr1-, esr2a-, and esr2b-deficient lines.”

      Line 504: “After color development for 15 min (gal), 40 min (npba), 2 hours (vt), or overnight (ara, arb, and esr2b)” has been revised to “After color development for 15 min (gal), 2 hours (vt), or overnight (ara and arb).”

      Line 516: “Thermo Fisher Scientific, Waltham, MA” has been changed to “Thermo Fisher Scientific” to avoid redundancy.

      Line 565: The subsection entitled “Measurement of spatial distances between fish” has been removed.

      Line 585: “6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B;” has been deleted.

      References

      The following references have been removed:

      Capel B. 2017. Vertebrate sex determination: evolutionary plasticity of a fundamental switch. Nature Reviews Genetics 18:675–689. DOI: https://doi.org/10.1038/nrg.2017.60

      Hiraki T, Nakasone K, Hosono K, Kawabata Y, Nagahama Y, Okubo K. 2014. Neuropeptide B is female-specifically expressed in the telencephalic and preoptic nuclei of the medaka brain. Endocrinology 155:1021–1032. DOI: https://doi.org/10.1210/en.2013-1806

      Juntti SA, Hilliard AT, Kent KR, Kumar A, Nguyen A, Jimenez MA, Loveland JL, Mourrain P, Fernald RD. 2016. A neural basis for control of cichlid female reproductive behavior by prostaglandin F2α. Current Biology 26:943–949. DOI: https://doi.org/10.1016/j.cub.2016.01.067

      Kimchi T, Xu J, Dulac C. 2007. A functional circuit underlying male sexual behaviour in the female mouse brain. Nature 448:1009–1014. DOI: https://doi.org/10.1038/nature06089

      Kobayashi M, Stacey N. 1993. Prostaglandin-induced female spawning behavior in goldfish (Carassius auratus) appears independent of ovarian influence. Hormones and Behavior 27:38–55. DOI: https://doi.org/10.1006/hbeh.1993.1004

      Liu H, Todd EV, Lokman PM, Lamm MS, Godwin JR, Gemmell NJ. 2017. Sexual plasticity: a fishy tale. Molecular Reproduction and Development 84:171–194. DOI: https://doi.org/10.1002/mrd.22691

      Munakata A, Kobayashi M. 2010. Endocrine control of sexual behavior in teleost fish. General and Comparative Endocrinology 165:456–468. DOI: https://doi.org/10.1016/j.ygcen.2009.04.011

      Nugent BM, Wright CL, Shetty AC, Hodes GE, Lenz KM, Mahurkar A, Russo SJ, Devine SE, McCarthy MM. 2015. Brain feminization requires active repression of masculinization via DNA methylation. Nature Neuroscience 18:690–697. DOI: https://doi.org/10.1038/nn.3988

      Shaw K, Therrien M, Lu C, Liu X, Trudeau VL. 2023. Mutation of brain aromatase disrupts spawning behavior and reproductive health in female zebrafish. Frontiers in Endocrinology 14:1225199. DOI: https://doi.org/10.3389/fendo.2023.1225199

      Stacey NE. 1976. Effects of indomethacin and prostaglandins on the spawning behaviour of female goldfish. Prostaglandins 12:113–126. DOI: https://doi.org/10.1016/s0090-6980(76)80010-x

      Figure 1

      Panel B, which originally showed steroid levels in female brains, has been replaced with steroid levels in the periphery of males, originally presented in Figure S1, panel C. Accordingly, the legend “(A and B) Levels of E2, testosterone, and 11KT in the brain of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (A) and females (B) (n = 3 per genotype and sex).” has been revised to “(A, B) Levels of E2, testosterone, and 11KT in the brain (A) and periphery (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”

      Figure 3

      The female data have been deleted from Figure 3. The revised Figure 3 is presented.

      The corresponding legend text has been revised as follows:

      Line 862: “males and females (n = 4 and 5 per genotype for males and females, respectively)” has been changed to “males (n = 4 per genotype)”.

      Line 864: “males and females (n = 4 except for cyp19a1b<sup>+/+</sup> males, where n = 3)” has been changed to “males (n = 3 and 4, respectively)”.

      Figure 6

      Figure 6 and its legend have been removed.

      Figure 1—figure supplement 1

      Panel C, showing male data, has been moved to Figure 1B, as described above, while panel D, showing female data, has been deleted. The corresponding legend “(C and D) Levels of E2, testosterone, and 11KT in the periphery of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (C) and females (D) (n = 3 per genotype and sex). Statistical differences were assessed by Bonferroni’s post hoc test (C and D). Error bars represent SEM. *P < 0.05.” has also been removed.

      Line 804: Following this change, the figure title has been updated from “Generation of cyp19a1b-deficient medaka and evaluation of peripheral sex steroid levels” to “Generation of cyp19a1b-deficient medaka.”

      The statistics comparing "experimental to experimental" and "control to experimental" isn't appropriate 

      This comment is the same as one raised in the first review (Reviewer #1’s comment 7 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      The reviewer raised concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.

      Line 576: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and methods. We have revised it to “comparisons between untreated and E2-treated groups in Figure 4C and D” for clarity.
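
      To illustrate the distinction drawn above between many-to-one and all-pairwise comparisons, the following minimal sketch simply counts the comparisons in each family. It does not implement Dunnett's test itself, which relies on a multivariate t distribution; the group labels and the 0.05 significance level are hypothetical and are not taken from the manuscript.

```python
from itertools import combinations


def many_to_one(groups, control):
    """Pairs of (control, treated group): the family used in a Dunnett-style design."""
    return [(control, g) for g in groups if g != control]


def all_pairwise(groups):
    """Every pair of groups: the family an all-pairs Bonferroni analysis would cover."""
    return list(combinations(groups, 2))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical group labels: one untreated reference and three treated groups.
groups = ["untreated", "treated_1", "treated_2", "treated_3"]

vs_control = many_to_one(groups, "untreated")
pairs = all_pairwise(groups)

print(len(vs_control), "comparisons against the untreated group")
print(len(pairs), "comparisons if every pair were tested")
print("Bonferroni threshold for all pairs:", bonferroni_alpha(0.05, len(pairs)))
```

      With k groups, the many-to-one family contains k − 1 comparisons whereas the all-pairwise family contains k(k − 1)/2, which is why a procedure restricted to comparisons against a single reference group needs a less severe correction.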

      Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in medaka. The constitutive deletion of Cyp19a1b markedly reduced brain estrogen content in males and to a lesser extent in females. These effects are accompanied by reduced sexual and aggressive behavior in males and reduced preference for males in females. These effects are reversed by adult treatment with E2, supporting a role for estrogens. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of estrogens in social behavior in the most abundant vertebrate taxon; however, the conclusion of brain-derived estrogens awaits definitive confirmation.

      We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.

      Strength:

      Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      Results obtained from multiple mutant lines converge to show that estrogen signaling, likely synthesized in the brain drives aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      The authors have made important corrections to tone down some of the conclusions which are more in line with the results.

      We thank the reviewer again for their positive evaluation of our work and the revisions we have made.

      Weaknesses:

      No evaluation of the mRNA and protein products of Cyp19a1b and ESR2a is presented, such that there is no proper demonstration that the mutation indeed leads to aromatase reduction. The conclusion that these effects depend on brain-derived estrogens is therefore only supported by measures of E2 with an EIA kit that is not validated. No discussion of these shortcomings is provided in the discussion, thus further weakening the conclusions of the manuscript.

      In response to this and other comments, we have now provided direct validation that the cyp19a1b mutation in our medaka leads to loss of function. Real-time PCR analysis showed that cyp19a1b transcript levels in the brain were reduced by approximately half in cyp19a1b<sup>+/−</sup> males and were nearly absent in cyp19a1b<sup>−/−</sup> males, consistent with nonsense-mediated mRNA decay.

      In addition, AlphaFold 3-based structural modeling indicated that the mutant Cyp19a1b protein lacks essential motifs, including the aromatic region and heme-binding loop, and exhibits severe conformational distortion (see figure; key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange)). 

      Results:

      Line 101: The following text has been added: “Loss of cyp19a1b function was further confirmed by measuring cyp19a1b transcript levels in the brain and by predicting the three-dimensional structure of the mutant protein. Real-time PCR revealed that transcript levels were reduced by half in cyp19a1b<sup>+/−</sup> males and were nearly undetectable in cyp19a1b<sup>−/−</sup> males, presumably as a result of nonsense-mediated mRNA decay (Lindeboom et al., 2019) (Figure 1C). The wild-type protein, modeled by AlphaFold 3, exhibited a typical cytochrome P450 fold, including the membrane helix, aromatic region, and heme-binding loop, all arranged in the expected configuration (Figure 1—figure supplement 1C). The mutant protein, in contrast, was severely truncated, retaining only the membrane helix (Figure 1—figure supplement 1C). The absence of essential domains strongly indicates that the allele encodes a nonfunctional Cyp19a1b protein. Together, transcript and structural analyses consistently demonstrate that the mutation generated in this study causes a complete loss of cyp19a1b function.”

      Materials and methods

      Line 438: A subsection entitled “Real-time PCR” has been added. The text of this subsection is as follows: “Total RNA was isolated from the brains of cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males using the RNeasy Plus Universal Mini Kit (Qiagen, Hilden, Germany). cDNA was synthesized with the SuperScript VILO cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA). Real-time PCR was performed on the LightCycler 480 System II using the LightCycler 480 SYBR Green I Master (Roche Diagnostics). Melting curve analysis was conducted to verify that a single amplicon was obtained in each sample. The β-actin gene (actb; GenBank accession number NM_001104808) was used to normalize the levels of target transcripts. The primers used for real-time PCR are shown in Supplementary file 2.”

      Line 448: A subsection entitled “Protein structure prediction” has been added. The text of this subsection is as follows: “Structural predictions of Cyp19a1b proteins were conducted using AlphaFold 3 (Abramson et al., 2024). Amino acid sequences corresponding to the wild-type allele and the mutant allele generated in this study were submitted to the AlphaFold 3 prediction server. The resulting models were visualized with PyMOL (Schrödinger, New York, NY), and key structural features, including the membrane helix, aromatic region, and heme-binding loop, were annotated.”

      References

      The following two references have been added:

      Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500. DOI: https://doi.org/10.1038/s41586-024-07487-w

      Lindeboom RGH, Vermeulen M, Lehner B, Supek F. 2019. The impact of nonsense-mediated mRNA decay on genetic disease, gene editing and cancer immunotherapy. Nature Genetics 51:1645–1651. DOI: https://doi.org/10.1038/s41588-019-0517-5

      Figure 1

      The real-time PCR results described above have been incorporated in Figure 1, panel C, with the corresponding legend provided below (line 788).

      (C) Brain cyp19a1b transcript levels in cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 6 per genotype). Mean value for cyp19a1b<sup>+/+</sup> males was arbitrarily set to 1.

      The subsequent panels have been renumbered accordingly. The entirety of the revised Figure 1 is presented.

      Figure 1—figure supplement 1

      The AlphaFold 3-generated structural models described above have been incorporated in Figure 1—figure supplement 1, panel C, with the corresponding legend provided below (line 811).

      (C) Predicted three-dimensional structures of wild-type (left) and mutant (right) Cyp19a1b proteins. Key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange).

      The entirety of the revised Figure 1—figure supplement 1 is presented.

      The information on the primers used for real-time PCR has been included in Supplementary file 2.
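
      The normalization described above (target transcript normalized to actb, with the wild-type mean set to 1) can be sketched as follows. This is only an illustrative sketch: the response does not state how relative quantities were calculated, so the 2^(−ΔCt) formula and its assumption of perfect amplification efficiency are assumptions made here, and all Ct values are hypothetical rather than data from the study.

```python
from statistics import mean


def relative_level(ct_target, ct_reference):
    """Target abundance relative to the reference gene.

    Assumes 100% amplification efficiency (2^-dCt); this is an assumption of
    the sketch, not a method stated in the source.
    """
    return 2.0 ** (ct_reference - ct_target)


def scale_to_wild_type(levels_by_genotype, wild_type="+/+"):
    """Divide every value by the wild-type mean so that the wild-type mean equals 1."""
    wt_mean = mean(levels_by_genotype[wild_type])
    return {
        genotype: [value / wt_mean for value in values]
        for genotype, values in levels_by_genotype.items()
    }


# Hypothetical (target Ct, reference Ct) pairs for each genotype.
ct_pairs = {
    "+/+": [(24.0, 18.0), (24.0, 18.0)],
    "+/-": [(25.0, 18.0), (25.0, 18.0)],
    "-/-": [(30.0, 18.0), (30.0, 18.0)],
}

levels = {
    genotype: [relative_level(target, reference) for target, reference in pairs]
    for genotype, pairs in ct_pairs.items()
}
scaled = scale_to_wild_type(levels)

for genotype, values in scaled.items():
    print(genotype, round(mean(values), 4))
```

      Expressing each genotype relative to the wild-type mean makes the heterozygous and homozygous values directly readable as fractions of the wild-type transcript level.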

      The functional deficiency of esr2a was already addressed in the previous revision. For clarity, we have reproduced the relevant information here.

      A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and methods (line 423): “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (Kayo et al., 2019). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”

      Most experiments are weakly powered (low sample size).

      This comment is essentially the same as one raised in the first review (Reviewer #3’s comment 7 on weaknesses). We acknowledge the reviewer’s concern that the histological analyses were weakly powered due to the limited sample size. In our earlier revision, we responded as follows:

      Histological analyses were conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. Since significant differences were detected in many analyses, further increasing the sample size was deemed unnecessary.

      The variability of the mRNA content for a same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      This comment is the same as one raised in the first review (Reviewer #3’s comment 8 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      Conclusions:

      Overall, the claim regarding the role of estrogens originating in the brain in male sexual behavior is supported by converging evidence from multiple mutant lines. The evidence for a role of brain-derived estrogens in gene expression in the brain is weaker, as are the results in females.

      We appreciate the reviewer’s positive evaluation of our findings on male behavior. The concern regarding the role of brain-derived estrogens in gene expression has been addressed in our rebuttal, and the female data have been removed so that the analysis now focuses on males. The specific revisions for removing the female data are described in Response to reviewer #1’s comment 6 on weaknesses.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is improved slightly. I am thankful the authors addressed some concerns, but for several concerns the referees raised, the authors acknowledged them yet did not make corresponding changes to the manuscript or disagreed that they were issues at all without explanation. All reviewers had issues with the imbalanced focus on males versus females and the male aggression assay. Yet, they did not perform additional experiments or even make changes to the framing and scope of the manuscript. If the authors had removed the female data, they may have had a more cohesive story, but then they would still be left with inadequate behavior assays in the males. If the authors don't have the time or resources to perform the additional work, then they should have said so. However, the work would be incomplete relative to the claims. That is a key point here. If they change their scope and claims, the authors avoid overstating their findings. I want to see this work published because I believe it moves the field forward. But the authors need to be realistic in their interpretations of their data. 

      In response to this and related comments, we have removed the female data and focused the manuscript on analyses in males. The specific revisions are described in Response to reviewer #1’s comment 6 on weaknesses. Additionally, we have validated that the cyp19a1b mutation in our medaka leads to loss of function (see Response to reviewer #3’s comment 1 on weaknesses), which further strengthens the reliability of our conclusions regarding male behavior.

      I agree with the reviewer who said we need to see validation of the absence of functional cyp19a1b in the brain. However, the results from staining for the protein and performing in situ could be quizzical. Indeed, there aren't antibodies that could distinguish between aromatase a and b, and it is not uncommon for expression of a mutated gene to be normal. One approach they could do is measure aromatase activity, but they are *sort of* doing that by measuring brain E2. It's not perfect, but we teleost folks are limited in these areas. At the very least, they should show the predicted protein structure of the mutated aromatase alleles. It could show clearly that the tertiary structure is utterly absent, giving more support to the fact that their aromatase gene is non-functional.

      As noted above, we have further validated the loss of cyp19a1b function by measuring cyp19a1b transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions. For further details, please refer to Response to reviewer #3’s comment 1 on weaknesses.

      With all of this said, the work is important, and it is possible that with a reframing of the impact of their work in the context of their findings, I could consider the work complete. I think with a proper reframing, the work is still impactful. 

      In accordance with this feedback, and as described above, we have reframed the manuscript by removing the female data and focusing exclusively on males. This revision clarifies the scope of our study and reinforces the support for our conclusions. For further details, please refer to Response to reviewer #1’s comment 6 on weaknesses.

      (1) Clearly state in the Figure 1 legend that each data point for male aggressive behaviors represents the total # of behaviors calculated over the 4 males in each experimental tank.

      In response to this comment, we have revised the legend of Figure 1K (line 797). The original legend, “(K) Total number of each aggressive act observed among cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, or cyp19a1<sup>−/−</sup> males in the tank (n = 6, 7, and 5, respectively),” has been updated to “(K) Total number of each aggressive act performed by cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males. Each data point represents the sum of acts recorded for the 4 males of the same genotype in a single tank (n = 6, 7, and 5 tanks, respectively).” This clarifies that each data point reflects the total behaviors of the 4 males within each tank.

      (2) The authors wrote under "Response to reviewer #1's major comment "...the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry": "This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.".

      What is meant by the latter statement? What accounts for the lack of aggression? The lack of increase in esr2b? Please clarify. 

      Line 365: In response to this comment, “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” has been revised to “Considering this, the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study may be explained by the possibility that the E2 dose used was sufficient to induce not only ara and arb but also esr2b expression in aggression-relevant circuits, which potentially suppressed aggression.”

      This revision clarifies that, while moderate brain estrogen levels are sufficient to promote male behaviors via induction of ara and arb, the E2 dose used in this study may have additionally induced esr2b in circuits relevant to aggression, potentially underlying the lack of aggression recovery.

      (3) This is a continuation of my comment/concern directly above. If the induction of ara and arb aren't enough, then how can, as the authors state, androgen signaling be the primary driver of these behaviors? 

      In response to this follow-up comment, we would like to clarify that, as described above, the lack of aggression recovery in E2-treated cyp19a1b-deficient males is not due to insufficient induction of ara and arb, but instead is likely because esr2b was also induced in aggression-relevant circuits, which may have suppressed aggression. Therefore, the concern that androgen signaling cannot be the primary driver of these behaviors is not applicable.

      (4) The authors' point about sticking with the terminology for the ar genes as "ara" and "arb" is not convincing. The whole point of needing a change to match the field of neuroendocrinology as a whole (that is, across all vertebrates) is researchers, especially those with high standing like the Okubo group, adopt the new terminology. Indeed, the Okubo group is THE leader in medaka neuroendocrinology. It would go a long way if they began adopting the new terminology of "ar1" and "ar2". I understand this may be laborious to a degree, and each group can choose to use their terminology, but I'd be remiss if I didn't express my opinion that changing the terminology could help our field as a whole. 

      We sincerely appreciate the reviewer’s thoughtful comments regarding nomenclature consistency in vertebrate neuroendocrinology. We understand the motivation behind the suggestion to adopt ar1 and ar2. However, we consider the established nomenclature of ara and arb to be more appropriate for the following reasons.

      First, adopting the ar1/ar2 nomenclature would introduce a discrepancy between gene and protein symbols. According to the NCBI International Protein Nomenclature Guidelines (Section 2B. Abbreviations and symbols; https://www.ncbi.nlm.nih.gov/genbank/internatprot_nomenguide/), the ZFIN Zebrafish Nomenclature Conventions (Section 2. PROTEINS: https://zfin.atlassian.net/wiki/spaces/general/pages/1818394635/ZFIN+Zebrafish+Nomenclature+Conventions), and the author guidelines of many journals (e.g., https://academic.oup.com/molehr/pages/Gene_And_Protein_Nomenclature), gene and protein symbols should be identical (with proteins designated in non-italic font and with the first letter capitalized). Maintaining consistency between gene and protein symbols helps avoid unnecessary confusion. The ara/arb nomenclature allows this, whereas ar1/ar2 does not.

      Second, the two androgen receptor genes in teleosts are paralogs derived from the third round of whole-genome duplication that occurred early in teleost evolution. For such duplicated genes, the ZFIN Zebrafish Nomenclature Conventions (Section 1.2. Duplicated genes) recommend appending the suffixes “a” and “b” to the approved symbol of the human or mouse ortholog. This convention clearly indicates that these genes are whole-genome duplication paralogs and provides an intuitive way to represent orthologous and paralogous relationships between teleost genes and those of other vertebrates. As a result, it has been widely adopted, and we consider it logical and beneficial to apply the same principle to androgen receptors.

      In light of these considerations, we respectfully maintain that the ara/arb nomenclature is more suitable for the present manuscript than the alternative ar1/ar2 system.

      (5) In the discussion please discuss these potentially unexpected findings.

      (a) gal was unaffected in female cyp19a1 mutants, but they exhibit mating behaviors towards females. Given gal is higher in males and these females act like females, what does this mean about the function of gal/its utility in being a male-specific marker (is it one??)? 

      (b) esr2b expression is higher in female cyp19a1 mutants. This is unexpected as well, given that esr2b is required for female-typical mating, is higher in females than in males, and is increased by E2. Please explain what this means for our idea of what esr2b expression tells us.

      We thank the reviewer for the insightful comments. As the female data have been removed from the manuscript, discussion of these findings in female cyp19a1b mutants is no longer necessary.

      Reviewer #3 (Recommendations For The Authors):

      The authors have responded to a number of the reviewers' comments; notably, they provided missing methodological information and rephrased the text. However, the authors have not addressed the main issues raised by the reviewers. Notably, it is regrettable that the reduced amount of brain aromatase cannot be confirmed; this seems to be the primary step when validating a new mutant. Even if the protein products of the two genes cannot be discriminated (which I can understand), it should be possible to evaluate the expression of a common messenger and/or peptide and confirm that aromatase expression is reduced in the brain. Since Cyp19a1b is relatively more abundant in the brain than Cyp19a1a, this would strengthen the conclusion and provide confidence that the mutant indeed does silence aromatase expression in the brain. Although these shortcomings are acknowledged in the rebuttal letter, they are not mentioned in the discussion. Doing so would make the manuscript more transparent and clearer.

      As noted in Response to reviewer #3’s comment 1 on weaknesses, we have validated the loss of Cyp19a1b function by measuring its transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that Cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions.

      FigS1 - panels C&D please indicate in which tissue were hormones measured. Blood?

      We thank the reviewer for pointing this out. In our study, “peripheral” refers to the caudal half of the body excluding the head and visceral organs, not blood. Accordingly, we have revised the figure legend and the description in the Materials and Methods section as follows:

      Legend for Figure 1B (line 787) now reads: “Levels of E2, testosterone, and 11KT in the brain (A) and peripheral tissues (caudal half of the body) (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”

      Materials and methods (line 431): The sentence “Total lipids were extracted from the brain and peripheral tissues (from the caudal half) of” has been revised to “Total lipids were extracted from the brain and from peripheral tissues, specifically the caudal half of the body excluding the head and visceral organs, of.”

      Additional Alterations:

      We have reformatted the text and supporting materials to comply with the journal’s Author Guidelines. The following changes have been made:

      (1) Figures and supplementary files are now provided separately from the main text.

      (2) The title page has been reformatted without any changes to its content.

      (3) In-text citations have been changed from numerical references to the author–year format.

      (4) Figure labels have been revised from “Fig. 1,” “Fig. S1,” etc., to “Figure 1,” “Figure 1—figure supplement 1,” etc.

      (5) Table labels have been revised from “Table S1,” etc., to “Supplementary file 1,” etc.

      (6) Line 324: The typo “is” has been corrected to “are”.

      (7) Line 382: The section heading “Materials and Methods” has been changed to “Materials and methods” (lowercase “m”).

      (8) Line 383: The Key Resources Table has been placed at the beginning of the Materials and methods section.

      (9) Line 389: The sentence “Sexually mature adults (2–6 months) were used for experiments, and tissues were consistently sampled 1–5 hours after lights on.” has been revised to “Sexually mature adults (2–6 months) were used for experiments and assigned randomly to experimental groups. Tissues were consistently sampled 1–5 hours after lights on.”

      (10) Line 393: The sentence “All fish were handled in accordance with the guidelines of the Institutional Animal Care and Use Committee of the University of Tokyo.” has been removed.

      (11) Line 589: The following sentence has been added: “No power analysis was conducted due to the lack of relevant data; sample size was estimated based on previous studies reporting inter-individual variation in behavior and neural gene expression in medaka.”

      (12) Line 598: The reference list has been reordered from numerical sequence to alphabetical order by author.

      (13) In the figure legends, notations such as “A and B” have been revised to “A, B.”

    1. eLife Assessment

      This paper describes a useful Bayesian model to estimate the probabilities of access, use, and use given access of insecticide-treated bed nets (ITNs), by using sub-national cross-sectional survey data and the annual number of ITNs received at the country level. The authors provide convincing evidence to support their modeling approach, which could be enhanced by more validation and exploration of model assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received.

      Strengths:

      The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.

      Weaknesses:

      (1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.

      (2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.

      (3) On L236, the link to SNT is stated: "the models indicate trends that can support sub-national tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.

      (4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.

      (5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.

      (6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.

      (7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).

      (8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.

      (9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?

      (10) Was uncertainty from each estimated component integrated into the other components?

      (11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.

      (12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.

    3. Reviewer #2 (Public review):

      Summary:

      The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions, even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.

      Strengths:

      The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.

      Weaknesses:

      Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models assume many hypotheses, including:

      (1) Exponential decay of ITN use/access.

      (2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.

      (3) Given a time instant, all individuals in the same administrative unit have the same probability of using a net.

      (4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs triennial distribution).

      (5) The Bayesian model assumes some narrow prior distributions.

      The impact of these hypotheses on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.
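
      To illustrate why such a sensitivity analysis could matter, the following minimal sketch considers assumption (1) alone. It is not the authors' model: it assumes that coverage returns to a fixed peak at each mass campaign and then decays exponentially until the next one, and the function name `mean_coverage`, the peak value, and the mean durations are all hypothetical choices made for illustration only.

```python
import math


def mean_coverage(peak: float, mean_duration_years: float, interval_years: float) -> float:
    """Time-averaged coverage over one campaign interval.

    Assumes coverage(t) = peak * exp(-t / mean_duration_years) for
    0 <= t <= interval_years, i.e. a reset to `peak` at every campaign.
    The closed form is the integral of that curve divided by the interval.
    """
    rate = 1.0 / mean_duration_years
    return peak * (1.0 - math.exp(-rate * interval_years)) / (rate * interval_years)


if __name__ == "__main__":
    peak = 0.9  # hypothetical coverage immediately after a campaign
    for duration in (1.5, 2.0, 2.5):  # hypothetical mean durations in years
        triennial = mean_coverage(peak, duration, 3.0)
        biennial = mean_coverage(peak, duration, 2.0)
        print(
            f"mean duration {duration:.1f} y: "
            f"triennial {triennial:.3f}, biennial {biennial:.3f}, "
            f"gain {biennial - triennial:.3f}"
        )
```

      Under these simplified assumptions, the estimated gain from biennial campaigns changes with the assumed mean duration, which is one reason the decay model and its priors seem worth exploring explicitly.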

    4. Author response:

      We would like to thank both reviewers for taking the time to review the manuscript in detail. Your comments have been extremely useful and constructive. A revised version of the manuscript will seek to address the weaknesses raised, clarifying the reasons for the assumptions made, the impact they have and how they influence the policy implications of the work. We will clarify the language to differentiate the work from the standard sub-national tailoring which is typically conducted to support National Malaria Programmes and emphasise why our mechanistic model can provide greater information than simple summary statistics.

    1. eLife Assessment

      The authors provide a useful integrated analytical approach to investigating MASLD focused on diverse multiomic integration methods. The strength of evidence for this new resource is solid, as analyses highlight the importance of previously-described pathophysiologic processes, as well as unveil several new mechanisms as key features of MASLD in obese patients.

    2. Reviewer #1 (Public review):

      Summary:

      Metabolic dysfunction-associated steatotic liver disease (MASLD) ranges from simple steatosis through steatohepatitis and fibrosis/cirrhosis to hepatocellular carcinoma. In the current study, the authors aimed to determine the early molecular signatures differentiating patients with MASLD-associated fibrosis from those patients with early MASLD but no symptoms. The authors recruited 109 obese individuals before bariatric surgery. They separated the cohorts as no MASLD (without histological abnormalities) and MASLD. The liver samples were then subjected to transcriptomic and metabolomic analysis. The serum samples were subjected to metabolomic analysis. The authors identified dysregulated lipid metabolism, including glyceride lipids, in the liver samples of MASLD patients compared to the no-MASLD ones. Circulating metabolomic changes in lipid profiles correlated only slightly with MASLD, possibly because the no-MASLD samples were also derived from obese patients. Several genes involved in lipid droplet formation were also found to be elevated in MASLD patients. In addition, elevated levels of amino acids, which are possibly related to collagen synthesis, were observed in MASLD patients. Several antioxidant metabolites were increased in MASLD patients. Furthermore, dysregulated genes involved in mitochondrial function and autophagy were identified in MASLD patients, likely linking oxidative stress to MASLD progression. The authors then determined the representative gene signatures in the development of fibrosis by comparing this cohort with two other published cohorts. Top enriched pathways in fibrotic patients included GTPase signaling and innate immune responses, suggesting the involvement of GTPases in MASLD progression to fibrosis. The authors then challenged a human patient-derived 3D spheroid system with a dual PPARa/d agonist and found that this treatment restored the expression levels of GTPase-related genes in MASLD 3D spheroids.
In conclusion, the authors suggested the involvement of upregulated GTPase-related genes during fibrosis initiation.

      Concerns from first round of review:

      (1) A recent study, via proteomic and transcriptomic analysis, revealed that four proteins (ADAMTSL2, AKR1B10, CFHR4 and TREM2) could be used to identify MASLD patients at risk of steatohepatitis (PMID: 37037945). It is not clear why the authors did not include this study in their comparison.

      (2) The authors recruited 109 patients but only performed transcriptomic and metabolomic analysis in 94 liver samples. Why did the authors exclude other samples?

      (3) The authors mentioned clinical data in Table 1 but did not present the table in this manuscript.

      (4) The generated metabolomic data could be a very useful resource to the MASLD community. However, it is very confusing how the data was generated in those supplemental tables. There is no clear labeling of human clinical information in those tables. Also, what do those values mean in columns 47-154? This reviewer assumed that they are the raw data of metabolomic analysis in plasma samples. However, without clear clinical information in these patients, it is impossible that any scientist can use the data to reproduce the authors' findings.

      (5) In Fig. 5B, the authors excluded the steatosis and fibrosis overlapped genes. Steatosis and fibrosis specific genes could simply reflect the outcomes rather than causes. In this case, the obtained results might not identify the gene signatures related to fibrosis initiation.

      (6) In Fig. 6D, the authors used 3D liver spheroids to validate their findings. However, there are no images showing 3D liver spheroid formation before and after PPARa/d agonist treatment. It is not clear whether the 3D liver spheroids were successfully established.

      (7) The authors suggested that targeting LX-2 cells with Rac1 and Cdc42 inhibitors could reduce collagen production. Did the authors observe upregulation of these two genes at the mRNA and protein levels in their cohort when comparing MASLD patients with and without fibrosis?

      (8) Did the authors observe that the expression levels of Rac1 and Cdc42 are correlated with fibrosis progression in MASLD patients?

      (9) Other studies have revealed several metabolite changes related to MASLD progression (PMID: 35434590, PMID: 22364559). However, the authors did not discuss the discrepancies between their findings with the previous studies.

      Significance:

      Overall, the current study might provide some new resources regarding transcriptomic and metabolomic data derived from obese patients with and without MASLD. The MASLD research community will be interested in the resource data.

      Comments on revised version:

      Thank you for the authors' responses to my concerns. I do not have any further comments.

    3. Reviewer #2 (Public review):

      In this paper, Kaldis and collaborators investigate the molecular heterogeneity of a cohort of 109 morbidly obese patients, focusing on liver transcriptomics and on metabolomics analysis of liver and serum. The main finding (i.e. upregulation of GTPase-coding genes) was validated in spheroids and a human HSC cell line. As these proteins are involved in critical cellular functions related to metabolism and cytoskeleton dynamics, these findings shed light on their involvement in human liver pathology, which has so far been poorly (or even not) documented. This is an interesting addition to the current knowledge about chronic liver pathology and warrants further in-depth investigations to address molecular mechanisms of action (cellular specificity, GTPase-driven pathways...).

      Strengths:

      Using a well-characterized patient cohort of moderate size, the study provides transcriptomic and metabolomic data of high quality with adequate statistical corrections, which are a very useful resource for the community. Mechanistic experiments usefully hint at novel druggable targets in the early steps of fibrosis, hence probably in hepatic stellate cell activation.

      Weaknesses:

      Cross-comparisons with other cohorts are informative but of limited interest due to patient classification issues, inherent to histological staging practices. The lack of correlation between transcriptomic and metabolomic data is deceptive but expected due to the systemic nature of metabolomic analysis and was also observed in recently published papers.

      Comments on revised version:

      I have no further comment about this amended version, aside from suggesting to add (if known) the time at which biopsies were collected. Time-of-day is an important yet often overlooked parameter of gene expression variation, and along the same line, the imposed fasting to bariatric surgery patients is also a matter of variation of gene expression and of metabolite abundance. It is hoped that future investigations will more precisely characterize the role of the newly identified targets in MASLD.

    4. Reviewer #3 (Public review):

      Summary:

      Metabolic dysfunction-associated steatotic liver disease (MASLD) describes a spectrum of progressive liver pathologies linked to lifestyle-associated metabolic alterations (such as increased body weight and elevated blood sugar levels), ranging from steatosis through steatohepatitis to fibrosis and finally end-stage complications, such as liver failure and hepatocellular carcinoma. Treatment options for MASLD include diet adjustments, weight loss, and the thyroid hormone receptor-β (THR-β) agonist resmetirom, but remain limited at this stage, motivating further studies to elucidate molecular disease mechanisms to identify novel therapeutic targets.

      In their present study, the authors aim to identify early molecular changes in MASLD linked to obesity. To this end, they study a cohort of 109 obese individuals with no or early-stage MASLD, combining measurements from two anatomic sites: 1. bulk RNA-sequencing and metabolomics of liver biopsies, and 2. metabolomics from patient blood. Their major finding is that GTPase-related genes are transcriptionally altered in livers of individuals with steatosis with fibrosis compared to steatosis without fibrosis.

      Comments from the first round of review:

      (1) Confounders (such as (pre-)diabetes)

      The patient table shows significant differences in non-MASLD vs. MASLD individuals, with the latter suffering more often from diabetes or hypertriglyceridemia. Rather than just stating corrections, subgroup analyses should be performed (accompanied with designated statistical power analyses) to infer the degree to which these conditions contribute to the observations. I.e., major findings stating MASLD-associated changes should hold true in the subgroup of MASLD patients without diabetes/of female sex and so forth (testing for each of the significant differences between groups).

      Post-rebuttal update: The authors have performed the requested sub-group analysis and find the gene signatures hold for the non-diabetic sub-cohort, but not the diabetic subgroup. They denote a likely interaction between fibrosis and diabetes, that was not corrected for in the original analysis.

      (2) External validation

      Additionally, to back up the major GTPase signature findings, it would be desirable to analyze an external dataset of (pre)diabetes patients (other biased groups) for alterations in these genes. It would be important to know if this signature also shows in non-MASLD diabetic patients vs. healthy patients or is a feature specific to MASLD. Also, could the matched metabolic data be used to validate metabolite alterations that would be expected under GTPase-associated protein dysregulation?

      Post-rebuttal update: The authors confirm that with the present data, insulin resistance cannot be fully ruled out as a confounder of the GTPase-related gene signature. However, they plan future mouse model experiments to study whether the GTPase-fibrosis signature differs in diabetic vs. non-diabetic conditions.

      (3) 3D liver spheroid MASH model, Fig. 6D/E

      This 3D experiment is technically not an external validation of GTPase-related genes being involved in MASLD, since patient-derived cells may only retain changes that have happened in vivo. To demonstrate that the GTPase expression signature is specifically invoked by fibrosis the LX-2 set up is more convincing, however, the up-regulation of the GTPase-related genes upon fibrosis induction with TGF-beta, in concordance with the patient data, needs to be shown first (qPCR or RNA-seq). Additionally, the description of the 3D model is too uncritical. The maintenance of functional PHHs is a major challenge (PMID: 38750036, PMID: 21953633, PMID: 40240606, PMID: 31023926). It cannot be ruled out that their findings are largely attributable to either 1) the (other present) mesenchymal cells (i.e., mesenchyme-derived cells, such as for example hepatic stellate cells, not to be confused with mesenchymal stem cells, MSCs), or 2) related to potential changes in PHHs in culture, and these limitations need to be stated.

      Post-rebuttal update: To address the concern of other cells than hepatocytes contributing to the observed effects in culture, the authors performed TGF-beta treatment in independent mono-cultures (Figure R4): LX-2 and hepatocytes, and the spheroid system. Surprisingly, important genes highlighted in Figure 6E for the spheroid system (RAB6A, ARL4A, RAB27B, DIRAS2) are all absent from this qPCR(?) validation experiment. The authors evaluate instead RAC1, RHOU, VAV1, DOCK2, RAB32. In spheroids, RHOU and RAB32 are down-regulated with TGF-B. In hepatocytes DOCK2 and RAC seemed up-regulated. They find no difference in these genes in LX-2 cells. Surprisingly, ACTA2 expression values are missing for LX-2 cells. Together, it is hard to judge which individual cell type recapitulates the changes observed in patients in this validation experiment, as the major genes called out in Figure 6E are not analyzed.

      Unfortunately, the 3D liver spheroid model used (as presented in PMID: 39605182) lacks important functional validation tests of maintained hepatocyte identity in culture (at the very least Albumin expression and secretion plus CYP3A4 assay). This functional data (acquired at the time point in culture when the RNA expression analysis in 6E was performed) is indispensable prior to stating that mature hepatocytes cause the observed effects.

      (4) Novelty / references

      Similar studies that also combined liver and blood lipidomics/metabolomics in obese individuals with and without MASLD (e.g. PMID 39731853, 39653777) should be cited. Additionally, it would benefit the quality of the discussion to state how findings in this study add new insights over previous studies, if their findings/insights differ, and if so, why.

      Post-rebuttal update: The authors have included the studies in their discussion.

    1. eLife Assessment

      Argunşah et al. investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains in shaping the responses to single vs multiple whiskers. Based on the observation of a higher density of SST+ interneurons in the septa, the authors investigate the hypothesis that Elfn1-dependent short-term plasticity shapes these responses. This important study is, however, supported by incomplete evidence; factors restricting the strength of evidence are the limited spatial resolution of the multi-unit activity, as well as the lack of a mechanistic explanation. This provocative and intellectually stimulating hypothesis provides a contribution to work on how different cell types shape cortical representation.

    2. Reviewer #1 (Public Review):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains in the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. The authors attribute this divergence to differences in short-term synaptic plasticity, particularly within somatostatin-expressing (SST⁺) interneurons. This interpretation is supported by 1) the increased density of SST+ neurons in L4 of the septa compared to barrel domain, 2) the stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation and 3) the reduced functional difference in single- versus multi-whisker response ratios across barrel and septal domains in Elfn1 KO mice, which lack a synaptic protein that confers characteristic short-term plasticity, notably in SST+ neurons. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain, suggesting that septal and barrel circuits may differentially route information about single vs multi-whisker stimulation downstream of S1.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that these two domains contribute distinctly to the processing of single- versus multi-whisker inputs and highlights the role of SST+ neurons and their short-term plasticity. Together, this study suggests that the barrel cortex multiplexes whisker-derived sensory information across its domains, enabling parallel processing within S1.

      Weaknesses:

      Although the divergence in responses to repeated single- versus multi-whisker stimulation between barrel and septal domains is consistent with a role for SST⁺ neuron short-term plasticity, the evidence presented does not conclusively demonstrate that this mechanism is the critical driver of the difference. The lack of targeted recordings and manipulations limits the strength of this conclusion: SST⁺ neuron activity is not measured in L4, nor is it assessed in a domain-specific manner. The Elfn1 knockout manipulation does not appear to selectively affect either stimulus condition, domain or interneuron subtype. Finally, all experiments were performed under anesthesia, which raises concerns about how well the reported dynamics generalize to awake cortical processing.

    3. Reviewer #2 (Public review):

      Summary:

      Argunsah and colleagues demonstrate that SST expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1) Careful physiological and imaging studies.

      (2) Novel result showing the role of SST+ neurons in shaping responses.

      (3) Good use of a knockout animal to further the main hypothesis.

      (4) Clear analytical techniques.

      Comments on revisions:

      The authors have effectively responded to my initial critiques - I have no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics (particularly involving SST⁺ interneurons) may mediate temporal integration of multi-whisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose a model in which Elfn1-dependent synaptic facilitation onto SST⁺ interneurons contributes to the distinct sensory responses to MW input in barrels and septa, enabling functional segregation between these domains.

      Strengths:

      The study presents a thought-provoking and useful conceptual model for understanding sensory processing in the somatosensory cortex. While barrel columns have been widely studied, septal regions remain relatively understudied in mice. If septa indeed act as selective integrators of distributed sensory input, this would suggest a novel computational role for cortical microcircuits beyond the classical view focused on barrels. Although still hypothetical, the proposed model in which SST⁺ interneurons contribute to domain-specific sensory responses between barrel and septal domains is intriguing and opens new avenues for investigating inhibitory circuit mechanisms.

      Weaknesses:

      The primary limitation of this study lies in the spatial and cellular specificity of the recording techniques. The physiological data rely predominantly on unsorted multi-unit activity (MUA) recorded with low-channel-count silicon probes. Because MUA aggregates signals from multiple neurons over a radius of approximately 50-100 µm (often wider than the typical septal width in mice), this approach makes it difficult to confidently isolate activity originating strictly from within septal domains. The manuscript would benefit from additional analyses to validate the spatial specificity of these recordings, such as systematically varying spike detection thresholds to test the robustness of domain attribution, as suggested by the reviewer. Furthermore, although the authors now appropriately frame their findings in the Elfn1 knockout mice as indirect evidence, it is worth emphasizing that the study lacks direct in vivo, cell-type-specific recordings and manipulations to more definitively test the proposed mechanism.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Reviews):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains of the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. This difference is attributed to the short-term plasticity properties of interneurons, particularly somatostatin-expressing (SST+) neurons. This claim is supported by the increased density of SST+ neurons found in L4 of the septa compared to barrels, along with a stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation. The role of the synaptic protein Elfn1 is then examined. Elfn1 KO mice exhibited little to no functional domain separation between barrel and septa, with no significant difference in single- versus multi-whisker response ratios across barrel and septal domains. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that barrel and septal domains contribute differently to processing single versus multi-whisker inputs, suggesting that the barrel cortex multiplexes sensory information coming from the whiskers in different domains.

      We thank the reviewer for the very neat summary of our findings that barrel cortex multiplexes converging information in separate domains.

      Weaknesses:

      While the observed divergence in responses to repeated SWS vs MWS between the barrel and septal domains is intriguing, the presented evidence falls short of demonstrating that short-term plasticity in SST+ neurons critically underpins this difference. The absence of a mechanistic explanation for this observation limits the work’s significance. The measurement of SST neurons’ response is not specific to a particular domain, and the Elfn1 manipulation does not seem to be specific to either stimulus type or a particular domain.

      We appreciate the reviewer’s perspective. Although further research is needed to understand the circuit mechanisms underlying the observed phenomenon, we believe our data suggest that altering the short-term dynamics of excitatory inputs onto SST neurons reduces the divergent spiking dynamics in barrels versus septa during repetitive single- and multi-whisker stimulation. Future work could examine how SST neurons, whose somata reside in barrels and septa, respond to different whisker stimuli and the circuits in which they are embedded. At this time, however, the authors believe there is no alternative way to test how the short-term dynamics of excitatory inputs onto SST neurons, as a whole, contribute to the temporal aspects of barrel versus septa spiking.

      The study's reach is further constrained by the fact that results were obtained in anesthetized animals, which may not generalize to awake states.

      We appreciate the reviewer’s concern regarding the generalizability of our findings from anesthetized animals to awake states. Anesthesia was employed to ensure precise individual whisker stimulation (and multi-whisker in the same animal), which is challenging in awake rodents due to active whisking. While anesthesia may alter higher-order processing, core mechanisms, such as short- and long-term plasticity in the barrel cortex, are preserved under anesthesia (Martin-Cortecero et al., 2014; Mégevand et al., 2009).

      The statistical analysis appears inappropriate, with the use of repeated independent tests, dramatically boosting the false positive error rate.

      Thank you for your feedback on our analysis using independent rank-based tests for each time point in wild-type (WT) animals. To address concerns regarding multiple comparisons and temporal dependencies (for Figure 1F and 4D for now, but we will add more in our revision), we performed a repeated measures ANOVA for WT animals (13 Barrel, 8 Septa, 20 time points), which revealed a significant main effect of Condition (F(1,19) = 16.33, p < 0.001) and a significant Condition-Time interaction (F(19,361) = 2.37, p = 0.001). Post-hoc tests confirmed significant differences between Barrel and Septa at multiple time points (e.g., p < 0.0025 at times 3, 4, 6, 7, 8, 10, 11, 12, 16, 19 after Bonferroni post-hoc correction), supporting a differential multi-whisker vs. single-whisker ratio response in WT animals. In contrast, a repeated measures ANOVA for knock-out (KO) animals (11 Barrel, 7 Septa, 20 time points) showed no significant main effect of Condition (F(1,14) = 0.17, p = 0.684) or Condition-Time interaction (F(19,266) = 0.73, p = 0.791), indicating that the Barrel-Septa difference observed in WT animals is absent in KO animals.
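
      As a rough illustration of the multiple-comparison issue discussed here, the sketch below computes the family-wise error rate for 20 uncorrected tests and the Bonferroni-adjusted per-test threshold (0.05 / 20 = 0.0025, matching the post-hoc cutoff quoted above). The function names and the example p-values are hypothetical and are not taken from the study; the error-rate formula additionally assumes independent tests, which repeated time points generally are not.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_timepoints(p_values, alpha=0.05):
    """Return 1-based time points whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i + 1 for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for 20 time points (illustrative only, not study data).
example_p = [0.5] * 20
example_p[2] = 0.001   # time point 3: below the corrected threshold
example_p[3] = 0.003   # time point 4: below 0.05 but above the corrected threshold

if __name__ == "__main__":
    print(familywise_error_rate(0.05, 20))
    print(bonferroni_threshold(0.05, 20))
    print(significant_timepoints(example_p))
```

      Under these assumptions, a time point with p = 0.003 would count as significant in an uncorrected per-time-point test but not after correction, which is the distinction the reviewer's comment turns on.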

      Furthermore, the manuscript suffers from imprecision; its conclusions are occasionally vague or overstated. The authors suggest a role for SST+ neurons in the observed divergence in SWS/MWS responses between barrel and septal domains. However, this remains speculative, and some findings appear inconsistent. For instance, the increased response of SST+ neurons to MWS versus SWS is not confined to a specific domain. Why, then, would preferential recruitment of SST+ neurons lead to divergent dynamics between barrel and septal regions? The higher density of SST+ neurons in septal versus barrel L4 is not a sufficient explanation, particularly since the SWS/MWS response divergence is also observed in layers 2/3, where no difference in SST+ neuron density is found.

      Moreover, SST+ neuron-mediated inhibition is not necessarily restricted to the layer in which the cell body resides. It remains unclear through which differential microcircuits (barrel vs septum) the enhanced recruitment of SST+ neurons could account for the divergent responses to repeated SWS versus MWS stimulation.

      We fully appreciate the reviewer’s comment. We currently do not provide any evidence on the contribution of SST neurons in the barrels versus septa in layer 4 to the response divergence of spiking observed in SWS versus MWS. We only show that these neurons differentially distribute in the two domains in this layer. It is certainly known that there is molecular and circuit-based diversity of SST-positive neurons in different layers of the cortex, so it is plausible that this includes cells located in the two domains of vS1, something which has not been examined so far. Our data on their distribution are one piece of information that SST neurons may have a differential role in inhibiting barrel stellate cells versus septa ones. Morphological reconstructions of SST neurons in L4 of the somatosensory barrel cortex have shown that their dendrites and axons project locally and may confine to individual domains, even though not specifically examined (Fig. 3 of Scala F et al., 2019). The same study also showed that L4 SST cells receive excitatory input from local stellate cells, and it is known that they are also directly excited by thalamocortical fibers (Beierlein et al., 2003; Tan et al., 2008), both of which facilitate.

      As shown in our supplementary figure, the divergence is also observed in L2/3 where, as the reviewer also points out, we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains -columns- in sensory cortices.

      Regardless of the mechanism, the Elfn1 knock-out mouse line almost exclusively affects the incoming excitability onto SST neurons (see also reply to comment below), hence what can be supported by our data is that changing the incoming short-term synaptic plasticity onto these neurons brings the spiking dynamics between barrels and septa closer together.

      The Elfn1 KO mouse model seems too unspecific to suggest the role of the short-term plasticity in SST+ neurons in the differential response to repeated SWS vs MWS stimulation across domains. Why would Elfn1-dependent short-term plasticity in SST+ neurons be specific to a pathway, or a stimulation type (SWS vs MWS)? Moreover, the authors report that Elfn1 knockout alters synapses onto VIP+ as well as SST+ neurons (Stachniak et al., 2021; previous version of this paper), so why attribute the phenotype solely to SST+ circuitry? In fact, the functional distinctions between barrel and septal domains appear largely abolished in the Elfn1 KO.

      Previous work by others and us has shown that globally removing Elfn1 selectively removes a synaptic process from the brain without altering brain anatomy or structure. This allows us to study how the temporal dynamics of inhibition shape activity, as opposed to inhibition from particular cell types. We will nevertheless update the text to discuss more global implications for SST interneuron dynamics and include a reference to VIP interneurons that contain Elfn1.

      When comparing SWS to MWS, we find that MWS replaces the neighboring excitation which would normally be preferentially removed by short-term plasticity in SST interneurons, thus providing a stable control comparison across animals and genotypes. On average, VIP interneurons failed to show modulation by MWS. We were unable to measure a substantial contribution of VIP cells to this process and also note that the Elfn1 expressing multipolar neurons comprise only ~5% of VIP neurons (Connor and Peters, 1984; Stachniak et al., 2021), a fraction that may be lost when averaging from 138 VIP cells. Moreover, the effect of Elfn1 loss on VIP neurons is quite different and marginal compared to that of SST cells, suggesting that the primary impact of Elfn1 knockout is mediated through SST+ interneuron circuitry. Therefore, even if we cannot rule out that these 5% of VIP neurons contribute to barrel domain segregation, we are of the opinion that their influence would be very limited if any.

      Reviewer #2 (Public Reviews):

      Summary:

      Argunsah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1)  Careful physiological and imaging studies.

      (2)  Novel result showing the role of SST+ neurons in shaping responses.

      (3)  Good use of a knockout animal to further the main hypothesis.

      (4)  Clear analytical techniques.

      We thank the reviewer for their appreciation of the study.

      Weaknesses:

      No major weaknesses were identified by this reviewer. Overall, I appreciated the paper but feel it overlooked a few issues and had some recommendations on how additional clarifications could strengthen the paper. These include:

      (1) Significant work from Jerry Chen on how S1 neurons that project to M1 versus S2 respond in a variety of behavioral tasks should be included (e.g. PMID: 26098757). Similarly, work from Barry Connor’s lab on intracortical versus thalamocortical inputs to SST neurons, as well as excitatory inputs onto these neurons (e.g. PMID: 12815025) should be included.

      We thank the reviewer for these valuable resources that we overlooked. We will include Chen et al. (2015), Cruikshank et al. (2007) and Gibson et al. (1999) to contextualize S1 projections and SST+ inputs, strengthening the study’s foundation, as well as Beierlein et al. (2003), which nicely shows both local and thalamocortical facilitation of excitatory inputs onto L4 SST neurons, in contrast to PV cells. The paper also shows the gradual recruitment of SST neurons by thalamocortical inputs to provide feed-forward inhibition onto stellate cells (regular spiking) of the barrel cortex L4 in rat.

      (2) Using Layer 2/3 as a proxy to what is happening in layer 4 (~line 234). Given that layer 2/3 cells integrate information from multiple barrels, as well as receiving direct VPm thalamocortical input, and given the time window that is being looked at can receive input from other cortical locations, it is not clear that layer 2/3 is a proxy for what is happening in layer 4.

      We agree with the reviewer that what we observe in L2/3 is not necessarily what is taking place in L4 SST-positive cells. The data on L2/3 was included to show that these cells, as a population, can show divergent responses when it comes to SWS vs MWS, which is not seen in L2/3 VIP neurons. Regardless of the mechanisms underlying it, our overall data support that SST-positive neurons can change their activation based on the type of whisker stimulus and when the excitatory input dynamics onto these neurons change due to the removal of Elfn1 the recruitment of barrels vs septa spiking changes at the temporal domain. Having said that, the data shown in Supplementary Figure 3 on the response properties of L2/3 neurons above the septa vs above the barrels (one would say in the respective columns) do show the same divergence as in L4. This suggests that a circuit motif may exist that is common to both layers, involving SST neurons that sit in L4, L5 or even L2/3. This implies that despite the differences in the distribution of SST neurons in septa vs barrels of L4 there is an unidentified input-output spatial connectivity motif that engages in both L2/3 and L4. Please also see our response to a similar point raised by reviewer 1.

      (3) Line 267, when discussing distinct temporal response, it is not well defined what this is referring to. Are the neurons no longer showing peaks to whisker stimulation, or are the responses lasting a longer time? It is unclear why PV+ interneurons which may not be impacted by the Elfn1 KO and receive strong thalamocortical inputs, are not constraining activity.

      We thank the reviewer for their comment and will clarify the statement.

      This convergence of response profiles was further clear in stimulus-aligned stacked images, where the emergent differences between barrels and septa under SWS were largely abolished in the KO (Figure 4B). A distinction between directly stimulated barrels and neighboring barrels persisted in the KO. In addition, the initial response continued to differ between barrel and septa and also septa and neighbor (Figure 4B). This initial stimulus selectivity potentially represents distinct feedforward thalamocortical activity, which includes PV+ interneuron recruitment that is not directly impacted by the Elfn1 KO (Sun et al., 2006; Tan et al., 2008). PV+ cells are strongly excited by thalamocortical inputs, but these exhibit short-term depression, as does their output, contrasting with the sustained facilitation observed in SST+ neurons. These findings suggest that in WT animals, activity spillover from principal barrels is normally constrained by the progressive engagement of SST+ interneurons in septal regions, driven by Elfn1-dependent facilitation at their excitatory synapses. In the absence of Elfn1, this local inhibitory mechanism is disrupted, leading to longer responses in barrels, delayed but stronger responses in septa, and persistently stronger responses in unstimulated neighbors, resulting in a loss of distinction between the responses of barrel and septa domains that normally diverge over time (see Author response image 1 below).

      Author response image 1.

      (A) Barrel responses are longer following whisker stimulation in KO. (B) Septal responses are slightly delayed but stronger in KO. (C) Unstimulated neighbors show longer persistent responses in KO.

       

      (4) Line 585 “the earliest CSD sink was identified as layer 4…” were post-hoc measurements made to determine where the different shank leads were based on the post-hoc histology?

      Post hoc histology was performed on plane-aligned brain sections, which allowed us to detect barrels and septa and so confirm the insertion domain of each recorded shank. Layer specificity of each electrode could therefore not be confirmed by histology, as we did not have coronal sections in which to measure electrode depth.

      (5) For the retrograde tracing studies, how were the M1 and S2 injections targeted (stereotaxically or physiologically)? How was it determined that the injections were in the whisker region (or not)?

      During the retrograde virus injection, the location of M1 and S2 injections was determined by stereotaxic coordinates (Yamashita et al., 2018). After acquiring the light-sheet images, we were able to post hoc examine the injection site in 3D and confirm that the injections were successful in targeting the regions intended. Although it would have been informative to do so, we did not functionally determine the whisker-related M1 and whisker-related S2 region in this experiment.

      (6) Were there any baseline differences in spontaneous activity in the septa versus barrel regions, and did this change in the KO animals?

      Thank you for this interesting question. Our previous study found that there was a reduction in baseline activity in L4 barrel cortex of KO animals at postnatal day (P)12, but no differences were found at P21 (Stachniak et al., 2023).

      Reviewer #3 (Public Reviews):

      Summary:

      This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics, particularly involving Elfn1-expressing SST⁺ interneurons, may mediate temporal integration of multiwhisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose that septa integrate MW input in an Elfn1-dependent manner, enabling functional segregation from barrel columns.

      Strengths:

      The core hypothesis is interesting and potentially impactful. While barrels have been extensively characterized, septa remain less understood, especially in mice, and this study's focus on septal integration of MW stimuli offers valuable insights into this underexplored area. If septa indeed act as selective integrators of distributed sensory input, this would add a novel computational role to cortical microcircuits beyond what is currently attributed to barrels alone. The narrative of this paper is intellectually stimulating.

      We thank the reviewer for finding the study intellectually stimulating.

      Weaknesses:

      The methods used in the current study lack the spatial and cellular resolution needed to conclusively support the central claims. The main physiological findings are based on unsorted multi-unit activity (MUA) recorded via low-channel-count silicon probes. MUA inherently pools signals from multiple neurons across different distances and cell types, making it difficult to assign activity to specific columns (barrel vs. septa) or neuron classes (e.g., SST⁺ vs. excitatory).

      The recording radius (~50-100 µm or more) and the narrow width of septa (~50-100 µm or less) make it likely that MUA from "septal" electrodes includes spikes from adjacent barrel neurons.

      The authors do not provide spike sorting, unit isolation, or anatomical validation that would strengthen spatial attribution. Calcium imaging is restricted to SST⁺ and VIP⁺ interneurons in superficial layers (L2/3), while the main MUA recordings are from layer 4, creating a mismatch in laminar relevance.

      We thank the reviewer for pointing out the possibility of contamination in septal electrodes. Although it is reported in the methods (line 583), we may not have sufficiently highlighted that we used an extremely high threshold (7.5 std) for spike detection in order to overcome the issue raised here, which restricts such spatial contamination. Since the spike amplitude decays rapidly with distance, at high thresholds, only nearby neurons contribute to our analysis, potentially one or two. We believe that this approach provides a very close approximation of single unit activity (SUA) in our reported data. We will include a sentence earlier in the manuscript to make this explicit and prevent further confusion.
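
      To make the thresholding argument concrete, the following is a minimal sketch of amplitude-threshold spike detection, not the authors' actual pipeline. It assumes the threshold is a multiple of the standard deviation of the whole trace and that deflections of either polarity count; the response does not specify how the noise level was estimated or which polarity was used, and the function name and the synthetic trace are hypothetical.

```python
import statistics


def detect_spikes(trace, n_std=7.5):
    """Return sample indices where |trace - mean| first exceeds n_std * SD.

    Assumption: the SD is computed over the entire trace; real pipelines
    often use a robust noise estimate instead.
    """
    mean = statistics.fmean(trace)
    threshold = n_std * statistics.pstdev(trace)
    spikes = []
    above = False
    for i, value in enumerate(trace):
        crossing = abs(value - mean) > threshold
        if crossing and not above:
            spikes.append(i)  # record only the onset of each crossing
        above = crossing
    return spikes


# Synthetic trace (illustrative only): unit-amplitude background activity,
# one small deflection standing in for a distant unit and one large
# deflection standing in for a nearby unit.
example_trace = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
example_trace[200] = -12.0  # smaller, "distant" spike
example_trace[500] = -50.0  # larger, "nearby" spike

if __name__ == "__main__":
    print(detect_spikes(example_trace, n_std=7.5))
    print(detect_spikes(example_trace, n_std=4.0))
```

      In this toy trace the stricter threshold keeps only the large deflection while a more permissive one admits both, which is the same kind of threshold-variation check that Reviewer #3 suggests for testing the robustness of domain attribution.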

      Regarding the point on calcium imaging being performed on L2/3 SST and VIP cells instead of L4, both reviewers 1 and 2 brought up the same issue and we responded as follows. As shown in our supplementary figure, the divergence is also observed in L2/3 where we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains -columns- in sensory cortices.

      Furthermore, while the role of Elfn1 in mediating short-term facilitation is supported by prior studies, no new evidence is presented in this paper to confirm that this synaptic mechanism is indeed disrupted in the knockout mice used here.

      We thank Reviewer #3 for noting the absence of new evidence confirming Elfn1’s disruption of short-term facilitation in our knockout mice. We acknowledge that our study relies on strong previously published data demonstrating that Elfn1 mediates short-term synaptic facilitation of excitatory inputs onto SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023). These studies consistently show that Elfn1 knockout abolishes facilitation in SST+ synapses, leading to altered temporal dynamics, which we hypothesize underlies the observed loss of barrel-septa response divergence in our Elfn1 KO mice (Figure 4). Nevertheless, to address the point raised, we will clarify in the revised manuscript (around lines 245-247 and 271-272) that our conclusions are based on these established findings, stating: “Building on prior evidence that Elfn1 knockout disrupts short-term facilitation in SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023), we attribute the abolished barrel-septa divergence in Elfn1 KO mice to altered SST+ synaptic dynamics, though direct synaptic measurements were not performed here.”

      Additionally, since Elfn1 is constitutively knocked out from development, the possibility of altered circuit formation-including changes in barrel structure and interneuron distribution, cannot be excluded and is not addressed.

      We thank Reviewer #3 for raising the valid concern that constitutive Elfn1 knockout could potentially alter circuit formation, including barrel structure and interneuron distribution. To address this, we will clarify in the revised manuscript (around line ~271 and in the Discussion) that in our previous studies that included both whole-cell patch-clamp in acute brain slices ranging from postnatal day 11 to 22 (P11 - P21) and in vivo recordings from barrel cortex at P12 and P21, we saw no gross abnormalities in barrel structure, with Layer 4 barrels maintaining their characteristic size and organization, consistent with wildtype (WT) mice (Stachniak et al., 2019, 2023). While we cannot fully exclude subtle developmental changes, prior studies indicate that Elfn1 primarily modulates synaptic function rather than cortical cytoarchitecture (Tomioka et al., 2014). Elfn1 KO mice show no gross morphological or connectivity differences and the pattern and abundance of Elfn1 expressing cells (assessed by LacZ knock in) appears normal (Dolan and Mitchell, 2013).

      We will add the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013).

      Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without the use of a time-dependent conditional knockout of the gene.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) My biggest concern is regarding statistics. Did the authors repeatedly apply independent tests (Mann-Whitney) without any correction for multiple comparisons (Figures 1 and 4)? In that case, the chances of a spurious "significant" result rise dramatically. 

      In response to the reviewer’s comment, we now present new statistical results using ANOVA and have incorporated these results into the manuscript between lines 172 and 192 for WT data and between lines 282 and 298 for Elfn1 KO data. This new statistical approach shows the same differences as we had previously reported, hence consolidating the statements made.
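
      The reviewer's concern about uncorrected repeated tests can be made concrete with a short calculation. This is a generic sketch, not the analysis used in the manuscript (which now relies on ANOVA): it assumes the tests are independent, and the choice of 20 tests (one per pulse in a 20-pulse train) and the function names are illustrative assumptions.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical scenario: one test per pulse in a 20-pulse train, alpha = 0.05.
uncorrected = familywise_error_rate(0.05, 20)
corrected = familywise_error_rate(bonferroni_alpha(0.05, 20), 20)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni-corrected family-wise error rate: {corrected:.3f}")
```

      Under these assumptions the uncorrected family-wise error rate is well above the nominal 5%, while the corrected rate stays below it.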

      (2) The findings only hint at a mechanism involving SST+ neurons for how SWS and MWS are processed differently in the barrel vs septal domains. As a direct test of SST+ neuron involvement in the divergence of barrel and septal responses, the authors might consider SST-specific manipulations - for example, inhibitory chemo- or optogenetics during SWS and MWS stimulation.

      We thank the reviewer for this comment and agree that a direct manipulation of SST+ neurons via inhibitory chemo- or optogenetics could provide further supporting evidence for the main claims in our study. We have opted not to perform these experiments for this manuscript, as we feel they can be part of a future study. At the same time, it is conceivable that such manipulations, depending on how they are performed, may lead to larger and non-specific effects on cortical activity, since SST neurons will likely be completely shut down. So even though we certainly appreciate and value the strengths of such approaches, our experiments have addressed a more nuanced hypothesis, namely that the synaptic dynamics onto SST+ neurons matter for response divergence of septa versus barrels, which could not have been easily and concretely addressed by manipulating SST+ cell firing activity.

      (3) In general, it is hard to comprehend what microcircuit could lead to the observed divergence in the MWS/SWS ratio in the barrel vs septal domain. The preferential recruitment of SST+ neurons during MWS is not specific to a particular domain, and the higher density of SST+ neurons specifically in L4 septa cannot per se explain the diverging MWS/SWS ratio in L4 septal neurons, since similar ratio divergence is observed across domains in L2/3 neurons without increased SST+ neuron density in L2/3. This view would also assume that SST+ inhibition remains contained to its own layer and domain. Is this the case? Is it that different microcircuits between barrels and septa differently shape the response to repeated MWS? This is partially discussed in the paper; can the authors develop on that? What would the proposed mechanism be? Can the short-term plasticity of the thalamic inputs (VPM vs POm) be part of the picture?

      We thank the reviewer for raising this important point. We propose that the divergence in MWS/SWS ratios across barrel and septal domains arises from dynamic microcircuit interactions rather than static anatomical features such as SST+ density, which we describe and which can provide a hint. In L2/3, where SST+ density is uniform, divergence persists, suggesting that trans-laminar and trans-domain interactions are key. Barrel domains, primarily receiving VPM inputs, exhibit short-term depression onto excitatory cells and engage PV+ and SST+ neurons to stabilize the MWS/SWS ratio, with Elfn1-dependent facilitation of SST+ neurons gradually increasing inhibition during repetitive SWS. Septal domains, in contrast, are targeted by facilitating POm inputs, combined with higher L4 SST+ density and Elfn1-mediated facilitation, producing progressive inhibitory buildup that amplifies the MWS/SWS ratio. SST+ projections in septa may extend trans-laminarly and laterally, influencing L2/3 and neighboring barrels, thereby explaining L2/3 divergence despite uniform SST+ density in L2/3. In this regard, direct laminar-dependent manipulations will be required to confirm whether L2/3 divergence is inherited from L4 dynamics. In Elfn1 KO mice, the loss of facilitation in SST+ neurons likely flattens these dynamics, disrupting functional segregation. Future experiments using VPM/POm-specific optogenetic activation and SST+ silencing will be critical to directly test this model.

      We expanded the discussion accordingly.

      (4) Can the decoder generalize between SWS and MWS? In this condition, if the decoder accuracy is higher for barrels than septa, it would support the idea that septa are processing the two stimuli differently. 

      Our results show that septal decoding accuracy is generally higher than barrel accuracy when generalizing from multi-whisker stimulation (MWS) to single-whisker stimulation (SWS), indicating distinct information processing in septa compared to barrels.

      In wild-type (WT) mice, septal accuracy exceeds barrel accuracy across all time windows (1-50ms, 51-95ms, 1-95ms), with the largest difference in the 51-95ms window (0.9944 vs. 0.9214 at pulse 20, 10Hz stimulation). This septal advantage grows with successive pulses, reflecting robust, separable neural responses, likely driven by the posterior medial nucleus (POm)’s strong MWS integration contrasting with minimal SWS activation. Barrel responses, driven by consistent ventral posteromedial nucleus (VPM) input for both stimuli, are less distinguishable, leading to lower accuracy.

      In Elfn1 knockout (KO) mice, which disrupt excitatory drive to somatostatin-positive (SST+) interneurons, barrel accuracy is higher initially in the 1-50ms window (0.8045 vs. 0.7500 at pulse 1), suggesting reduced early septal distinctiveness. However, septal accuracy surpasses barrels in later pulses and time windows (e.g., 0.9714 vs. 0.9227 in 51-95ms at pulse 20), indicating restored septal processing. This supports the role of SST+ interneurons in shaping distinct MWS responses in septa, particularly in late-phase responses (51-95ms), where inhibitory modulation is prominent, as confirmed by calcium imaging showing stronger SST+ activation during MWS.

      These findings demonstrate that septa process SWS and MWS differently, with higher decoding accuracy reflecting structured, POm- and SST+-driven response patterns. In Elfn1 KO mice, early deficits in septal processing highlight the importance of SST+ interneurons, with later recovery suggesting compensatory mechanisms. 

      We have added Supplementary Figure 4 and included this interpretation between lines 338-353.

      We thank the reviewer for suggesting this analysis.

      (5) It is not clear to me how the authors achieve SWS. How is it that the pipette tip "placed in contact with the principal whisker" does not detach from the principal whisker or stimulate other whiskers? Please clarify the methods. 

      Targeting the specific principal whisker is performed under the stereoscope.  

      Specifically, we have added this statement in line 628:

      “We trimmed the whiskers where necessary, to avoid them touching each other and to avoid stimulating other whiskers. By putting the pipette tip very close (almost touching) to the principal whisker, the movement of the tip (limited to 1mm) would reliably move the targeted whisker. The specificity of the stimulation of the selected principal whisker was observed under the stereoscope.”

      (6) The method for calculating decoder accuracy is not clearly described. How can accuracy exceed 1? The authors should clarify this metric and provide measures of variability (e.g., confidence intervals or standard deviations across runs) to assess the significance of their comparisons. Additionally, using a consistent scale across all plots would improve interpretability.

      We thank the reviewer for raising this point. We have now changed the way accuracies are calculated and adopted a common scale among different plots (see updated Figure 5). We have also changed the methods section accordingly.

      (7) Figure 1: The sample size is not specified. It looks like the numbers match the description in the methods, but the sample size should be clearly stated here. 

      These are the numbers the reviewer is inquiring about. 

      WT: a 280 × 95 × 20 matrix for the stimulated barrel (14 barrels, 95 ms, 20 pulses), a 180 × 95 × 20 matrix for the septa (9 septa, 95 ms, 20 pulses), and a 360 × 95 × 20 matrix for the neighboring barrel (18 neighboring barrels, 95 ms, 20 pulses). N=4 mice.

      KO: 11-barrel columns, 7 septal columns, 11 unstimulated neighbors from N=4 mice.

      Panels D-F are missing axes and axis labels (firing rate, p-value). Panel D is mislabeled (left, middle, and right). I can't seem to find the yellow line. 

      Thank you for this observation. We made changes in the figures to make them easier to navigate based on the collective feedback from the reviewers.

      Why change the way of comparing the differences in the responses to repeated stimulation between SWS and MWS?

      To assess temporal accumulation of information, we compared responses to repeated single-whisker stimulation (SWS) and multi-whisker stimulation (MWS) using an accumulative decoding approach rather than simple per-pulse firing rates. This method captures domain-specific integration dynamics over successive pulses.

      The use of the term "principal whisker" is confusing, as it could refer to the whisker that corresponds to the recorded barrel. 

      When we use the term principal whisker, the intention is indeed to refer to the whisker corresponding to the recorded barrel during single whisker stimulation. The term principal whisker is removed from Figure legend 1 and legend S1C where it may have led to  ambiguity.    

      Why the statement "after the start of active whisking"? Mice are under anesthesia here; it does not appear to be relevant for the figure. 

      “After the start of active whisking” refers to the state of the barrel cortex circuitry at the time of recordings. The particular reference we use comes from the habit of assessing sensory processing also from a developmental point of view. The reviewer is correct that it has nothing to do with the status of the experiment. Nevertheless, since the reviewer found that it may create confusion, we have now taken it out.

      (8) Figure 3: The y-axis label is missing for panel C. 

      This is now fixed. (dF/F).

      (9) Figure 4: Axis labels are missing.

      Added.

      Minor: 

      (10) Line 36: "progressive increase in septal spiking activity upon multi-whisker stimulation". There is no increase in septal spiking activity upon MWS; the ratio MWS/SWS increases.

      We have changed the sentence as follows: Genetic removal of Elfn1, which regulates the incoming excitatory synaptic dynamics onto SST+ interneurons, leads to the loss of the progressive increase in septal spiking ratio (MWS/SWS) upon stimulation.

      (11) Line 105: domain-specific, rather than column-specific, for consistency.

      We have changed it.

      (12) Lines 173-174: "a divergence between barrel and septa domain activity also occurred in Layer 4 from the 2nd pulse onward (Figure 1E)". The authors only show a restricted number of comparisons. Why not show the p-values as for SWS?

      The statistics are now presented in the current Figure 1E.

      (13) Lines 151-153: "Correspondingly, when a single whisker is stimulated repeatedly, the response to the first pulse is principally bottom-up thalamic-driven responses, while the later pulses in the train are expected to also gradually engage cortico-thalamo-cortical and cortico-cortical loops." Can the authors please provide a reference?

      We have now added the following references : (Kyriazi and Simons, 1993; Middleton et al., 2010; Russo et al., 2025).

      (14) Lines 184-186: "Our electrophysiological experiments show a significant divergence of responses over time upon both SWS and MWS in L4 between barrels (principal and neighboring) and adjacent septa, with minimal initial difference". The only difference between the neighboring barrel and septa is the responses to the initial pulse. Can the author clarify? 

      We have now changed the sentence as follows: Our electrophysiological experiments show a significant divergence of responses between domains upon both SWS and MWS in L4. (Line 198 now)

      (15) Line 214: "suggest these interneurons may play a role in diverging responses between barrels and septa upon SWS". Why SWS specifically?

      We have changed the sentence as follows: These results confirmed that SST+ and VIP+ interneurons have higher densities in septa compared to barrels in L4 and suggest these interneurons may play a role in diverging responses between barrels and septa. (Line 231 now).

      (16) Line 235: "This result suggests that differential activation of SST+ interneurons is more likely to be involved in the domain-specific temporal ratio differences between barrels and septa". Why? The results here are not domain-specific.

      We have now revised this statement to: This result suggested that temporal ratio differences specific to barrels and septa might involve differential activation of SST+ interneurons rather than VIP+ interneurons.

      (17) Lines 241-243: "SST+ interneurons in the cortex are known to show distinct short-term synaptic plasticity, particularly strong facilitation of excitatory inputs, which enables them to regulate the temporal dynamics of cortical circuits." Please provide a reference.

      We have now added the following references: (Grier et al., 2023; Liguz-Lecznar et al., 2016).

      (18) Lines 245-247: "A key regulator of this plasticity is the synaptic protein Elfn1, which mediates short-term synaptic facilitation of excitation on SST+ interneurons (Stachniak et al., 2021, 2019; Tomioka et al., 2014)". Is Stachniak et al., 2021 not about the role of Elfn1 in excitatory-to-VIP+ neuron synapses?

      The reviewer correctly spotted this discrepancy. This reference has now been removed from this statement.

      (19) Lines 271-272: "Building on our findings that Elfn1-dependent facilitation in SST+ interneurons is critical for maintaining barrel-septa response divergence". The authors did not show that.

      We have now changed the statement to: Building on our findings that Elfn1 is critical for maintaining barrel-septa response divergence.

      (20) Line 280: second firing peak, not "peal".

      Thank you, it is now fixed.

      (21) Lines 304-305: "These results highlight the critical role of Elfn1 in facilitating the temporal integration of sensory inputs through its effects on SST+ interneurons". This claim is also overstated.

      We have now changed the statement to: These results highlight the contribution of Elfn1 to the temporal integration of sensory inputs. (Line 362)

      (22) Line 329: Any reason why not cite Chen et al., Nature 2013?

      We have now added this reference, as also pointed out by reviewer 1.

      (23) Line 341-342: "wS1" and "wS2" instead of S1 and S2 for consistency.

      Thanks, we have now updated the terms.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 3D - the SW conditions are labeled but not the MW conditions (two right graphs) - they should be labeled similarly (SSTMW, VIPMW). 

      The two right graphs in Figure 3D represent paired SW vs MW comparisons of the evoked responses for SST and VIP populations, respectively.

      (2) Figure 6D and E: I think it would be better if the depth measurements were on the y-axis, which is more typical of these types of plots.

      We thank the reviewer for this comment. Although we appreciate this may be the case, we feel that the current presentation may be easier for the reader to navigate, and we have hence kept it. 

      (3) Having an operational definition of septa versus barrel would be useful. As the authors point out, this is a tough distinction in a mouse, and often you read papers that use Barrel Wall versus Barrel Hollow/Center - operationally defining how these areas were distinguished would be helpful. 

      We thank the reviewer for this comment and understand the point made.

      We have now updated the methods section in line 611: 

      DiI marks contained within the vGlut2 staining were defined as barrel recordings, while DiI marks outside vGlut2 staining were septal recordings.

      Reviewer #3 (Recommendations for the authors): 

      To support the manuscript's major claims, the authors should consider the following:

      (1) Validate the septal identity of the neurons studied, either anatomically or functionally at the single-cell level (e.g., via Ca²⁺ imaging with confirmed barrel/septa mapping). 

      We thank the reviewer for this suggestion, but we feel that these extensive experiments are beyond the scope of this study. 

      (2) Provide both anatomical and physiological evidence to assess the possibility of altered cortical development in Elfn1 KO mice, including potential changes in barrel structure or SST⁺ cell distribution. 

      To address the reviewer’s point, we have now added the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013). Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without conditional knockouts.”

      (3) Examine the sensory responses of SST⁺ and VIP⁺ interneurons in deeper cortical layers, particularly layer 4, which is central to the study's main conclusions.

      We thank the reviewer for this suggestion and appreciate the value it would bring to the study. We nevertheless feel that these extensive experiments are beyond the scope of this study and hence opted out from performing them. 

      Minor Comments:

      (1)  The authors used a CLARITY-based passive clearing protocol, which is known to sometimes induce tissue swelling or distortion. This may affect anatomical precision, especially when assigning neurons to narrow domains such as septa versus barrels. Please clarify whether tissue expansion was measured, corrected, or otherwise accounted for during analysis.

      Yes, tissue expansion was accounted for during the analysis of laminar specification. We excluded brains with severe distortion.

      (2) While the anatomical data are plotted as a function of "depth from the top of layer 4," the manuscript does not specify the precise depth ranges used to define individual cortical layers in the cleared tissue. Given the importance of laminar specificity in projection and cell type analyses, the criteria and boundaries used to delineate each layer should be explicitly stated.

      Thank you for pointing this out. We now include the criteria for delineating each layer in the manuscript. “Given that the depth of Layer 4 (L4) can be reliably measured due to its well-defined barrel boundaries, and that the relative widths of other layers have been previously characterized (El-Boustani et al., 2018), we estimated laminar boundaries proportionally. Specifically, Layer 2/3 was set to approximately 1.3–1.5 times the width of L4, Layer 5a to ~0.5 times, and Layer 5b to a similar width as L4. Assuming uniform tissue expansion across the cortical column, we extrapolated the remaining laminar thicknesses proportionally.”
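
      The proportional rule quoted above can be written as a small calculation. This is a minimal sketch of the stated ratios only: the function name, the default Layer 2/3 factor of 1.4 (the midpoint of the quoted 1.3–1.5 range), and the example L4 thickness of 200 µm are hypothetical and not taken from the manuscript.

```python
def estimate_layer_thicknesses(l4_thickness_um, l23_factor=1.4):
    """Estimate layer thicknesses (in micrometres) from a measured L4 thickness.

    Ratios follow the text: L2/3 ~ 1.3-1.5 x L4, L5a ~ 0.5 x L4, L5b ~ 1 x L4.
    Uniform tissue expansion across the column is assumed, so the ratios
    are applied directly to the measured (possibly expanded) L4 thickness.
    """
    if not 1.3 <= l23_factor <= 1.5:
        raise ValueError("l23_factor should lie in the quoted 1.3-1.5 range")
    if l4_thickness_um <= 0:
        raise ValueError("L4 thickness must be positive")
    return {
        "L2/3": l23_factor * l4_thickness_um,
        "L4": l4_thickness_um,
        "L5a": 0.5 * l4_thickness_um,
        "L5b": 1.0 * l4_thickness_um,
    }


# Hypothetical measurement: L4 spans 200 um in the cleared tissue.
layers = estimate_layer_thicknesses(200.0, l23_factor=1.5)
for name, thickness in layers.items():
    print(f"{name}: {thickness:.0f} um")
```

      Because only ratios to L4 are used, a uniform expansion factor cancels out, which is why the rule depends on the uniform-expansion assumption stated in the quoted text.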

      (3)  In several key comparisons (e.g., SST⁺ vs. VIP⁺ interneurons, or S2-projecting vs. M1-projecting neurons), it is unclear whether the same barrel columns were analyzed across conditions. Given the anatomical and functional heterogeneity across wS1 columns, failing to control for this may introduce significant confounds. We recommend analyzing matched columns across groups or, if not feasible, clearly acknowledging this limitation in the manuscript.

      We thank the reviewer for raising this important point. For the comparison of SST⁺ versus VIP⁺ interneurons, it would in principle have been possible to analyze the same barrel columns across groups. However, because some of the cleared brains did not reach the optimal level of clarity, our choice of columns was limited, and we were not always able to obtain sufficiently clear data from the same columns in both groups. Similarly, for the analysis of S2- versus M1-projecting neurons, variability in the position and spread of retrograde virus injections made it difficult to ensure measurements from identical barrel columns. We have now added a statement in the Discussion to acknowledge this limitation.

      (4) Figure 1C: Clarify what each point in the t-SNE plot represents-e.g., a single trial, a recording channel, or an averaged response. Also, describe the input features used for dimensionality reduction, including time windows and preprocessing steps.

      In response to the reviewer’s comment, we have now added the following in the methods: In summary, each point in the t-SNE plots represents an averaged response across 20 trials for a specific domain (barrel, septa, or neighbor) and genotype (WT or KO), with approximately 14 points per domain derived from the 280 trials in each dataset. The input features are preprocessed by averaging blocks of 20 trials into 1900-dimensional vectors (95ms × 20), which are then reduced to 2D using t-SNE with the specified parameters. This approach effectively highlights the segregation and clustering patterns of neural responses across cortical domains in both WT and KO conditions.
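
      The preprocessing step described above (averaging blocks of 20 trials and flattening each block into a 1900-dimensional vector) can be sketched in plain Python. This is an illustration under assumptions: the function name, the nested-list layout (trials × time bins × pulses), and the time-major flattening order are hypothetical, and the subsequent t-SNE step is not shown because its parameters are not given here.

```python
def block_average_features(trials, block_size=20):
    """Average consecutive blocks of trials and flatten each block.

    trials: list of trials; each trial is a list of time bins, and each
            time bin is a list with one value per pulse (e.g. 95 x 20).
    Returns one flat feature vector per block (e.g. 95 * 20 = 1900 values).
    """
    if len(trials) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    n_time = len(trials[0])
    n_pulses = len(trials[0][0])

    features = []
    for start in range(0, len(trials), block_size):
        block = trials[start:start + block_size]
        # Mean over the trials in the block, flattened in time-major order.
        vector = [
            sum(trial[t][p] for trial in block) / block_size
            for t in range(n_time)
            for p in range(n_pulses)
        ]
        features.append(vector)
    return features


# Toy data shaped like the WT barrel dataset: 280 trials x 95 ms x 20 pulses.
toy_trials = [[[float(i // 20)] * 20 for _ in range(95)] for i in range(280)]
toy_features = block_average_features(toy_trials)
print(len(toy_features), len(toy_features[0]))
```

      With 280 trials and blocks of 20, this yields 14 feature vectors, matching the roughly 14 points per domain mentioned in the response; each vector would then be passed to a t-SNE implementation for the 2D embedding.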

      (5) Figures 1D, E (left panels): The y-axes lack unit labeling and scale bars. Please indicate whether values are in spikes/sec, spikes/bin, or normalized units.

      We have now clarified this. 

      (6) Figures 1D, E (right panels): The color bars lack units. Specify whether the values represent raw firing rates, z-scores, or other normalized measures. Replace the vague term "Matrix representation" with a clearer label such as "Pulse-aligned firing heatmap."

      Thank you, we have now done it.

      (7) Figure 1E (bottom panel): There appears to be no legend referring to these panels. Please define labels such as "B" and "S." 

      Thank you, we have now done it.

      (8) Figure 1E legend: If it duplicates the legend from Figure 1D, this should be made explicit or integrated accordingly. 

      We have changed the structure of this figure.

      (9) Figure 1F: Define "AUC" and explain how it was computed (e.g., area under the firing rate curve over 0-50 ms). Indicate whether the plotted values represent percentages and, if so, label the y-axis accordingly. If normalization was applied, describe the procedure. Include sample sizes (n) and specify what each data point represents (e.g., animal, recording site). 

      The following paragraph has been added in the methods section:

      The Area Under the Curve (AUC) was computed as the integral of the smoothed firing rate (spikes per millisecond) over a 50ms window following each whisker stimulation pulse, using trapezoidal integration. Firing rate data for layer 4 barrel and septal regions in wild-type (WT) and knockout (KO) mice were smoothed with a 3-point moving average and averaged across blocks of 20 trials. Plotted values represent the percentage ratio of multi-whisker (MW) to single whisker (SW) AUC with error bars showing the standard error of the mean. Each data point reflects the mean AUC ratio for a stimulation pulse across approximately 11 blocks (220 trials total). The y-axis indicates percentages.
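
      The AUC procedure described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the handling of the first and last samples in the 3-point moving average, the 1 ms bin width, and the choice to smooth before cutting the 50 ms window are assumptions made for the example.

```python
def moving_average_3(values):
    """3-point centred moving average; edge samples use the neighbours available."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


def trapezoid_auc(rate, dt_ms=1.0):
    """Trapezoidal integral of a firing-rate trace sampled every dt_ms."""
    return sum((rate[i] + rate[i + 1]) / 2.0 * dt_ms for i in range(len(rate) - 1))


def mw_sw_ratio_percent(mw_rate, sw_rate, window_ms=50):
    """Percentage ratio of multi-whisker to single-whisker AUC for one pulse.

    mw_rate, sw_rate: trial-averaged firing rates (spikes/ms) in 1 ms bins,
    aligned to the stimulation pulse.
    """
    mw_auc = trapezoid_auc(moving_average_3(mw_rate)[:window_ms])
    sw_auc = trapezoid_auc(moving_average_3(sw_rate)[:window_ms])
    return 100.0 * mw_auc / sw_auc


# Toy example: constant rates over a 95 ms pulse window.
ratio = mw_sw_ratio_percent([0.3] * 95, [0.2] * 95)
print(f"MW/SW AUC ratio: {ratio:.1f}%")
```

      Values above 100% in this convention indicate a larger integrated response to multi-whisker than to single-whisker stimulation for that pulse.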

      (10) Figure 3C: Add units to the vertical axis.

      We have added them.

      (11) Figure 3D: Specify what each line represents (e.g., average of n cells, individual responses?). 

      Each line represents an average response of a neuron.  

      (12) Figure 4C legend: Same with what?". No legend refers to the bottom panels - please revise to clarify. 

      Thank you. We have now changed the figure structure and legends and fixed the missing information issue.

      (13) Supplementary Figure 1B: Indicate the physical length of the scale bar in micrometers. 

      This has been fixed. The scale bar is 250 µm.

      (14) Indicate the catalog number or product name of the 8×8 silicon probe used for recordings.

      We have added this information. It is the A8x8-Edge-5mm-100-200-177-A64.

      References

      (1) Beierlein, M., Gibson, J. R. & Connors, B. W. (2003). Two dynamically distinct inhibitory networks in layer 4 of the neocortex. J. Neurophysiol. 90, 2987–3000.

      (2) Burkhalter, A., D’Souza, R. D. & Ji, W. (2023). Integration of feedforward and feedback information streams in the modular architecture of mouse visual cortex. Annu. Rev. Neurosci. 46, 259–280.

      (3) Chen, J. L., Margolis, D. J., Stankov, A., Sumanovski, L. T., Schneider, B. L. & Helmchen, F. (2015). Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108.

      (4) Connor, J. R. & Peters, A. (1984). Vasoactive intestinal polypeptide-immunoreactive neurons in rat visual cortex. Neuroscience 12, 1027–1044.

      (5) Cruikshank, S. J., Lewis, T. J. & Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat. Neurosci. 10, 462–468.

      (6) Dolan, J. & Mitchell, K. J. (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491.

      (7) Gibson, J. R., Beierlein, M. & Connors, B. W. (1999). Two networks of electrically coupled inhibitory neurons in neocortex. Nature 402, 75–79.

      (8) Ji, W., Gămănuţ, R., Bista, P., D’Souza, R. D., Wang, Q. & Burkhalter, A. (2015). Modularity in the organization of mouse primary visual cortex. Neuron 87, 632–643.

      (9) Martin-Cortecero, J. & Nuñez, A. (2014). Tactile response adaptation to whisker stimulation in the lemniscal somatosensory pathway of rats. Brain Res. 1591, 27–37.

      (10) Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M. & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. J. Neurosci. 29, 5326–5335.

      (11) Meier, A. M., Wang, Q., Ji, W., Ganachaud, J. & Burkhalter, A. (2021). Modular network between postrhinal visual cortex, amygdala, and entorhinal cortex. J. Neurosci. 41, 4809–4825.

      (12) Meier, A. M., D’Souza, R. D., Ji, W., Han, E. B. & Burkhalter, A. (2025). Interdigitating modules for visual processing during locomotion and rest in mouse V1. bioRxiv 2025.02.21.639505.

      (13) Scala, F., Kobak, D., Shan, S., Bernaerts, Y., Laturnus, S., Cadwell, C. R., Hartmanis, L., Froudarakis, E., Castro, J. R., Tan, Z. H., et al. (2019). Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174.

      (14) Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J. & Ghosh, A. (2019). Elfn1-induced constitutive activation of mGluR7 determines frequency-dependent recruitment of somatostatin interneurons. J. Neurosci. 39, 4461–4475.

      (15) Stachniak, T. J., Kastli, R., Hanley, O., Argunsah, A. Ö., van der Valk, E. G. T., Kanatouris, G. & Karayannis, T. (2021). Postmitotic Prox1 expression controls the final specification of cortical VIP interneuron subtypes. J. Neurosci. 41, 8150–8166.

      (16) Stachniak, T. J., Argunsah, A. Ö., Yang, J. W., Cai, L. & Karayannis, T. (2023). Presynaptic kainate receptors onto somatostatin interneurons are recruited by activity throughout development and contribute to cortical sensory adaptation. J. Neurosci. 43, 7101–7118.

      (17) Sun, Q.-Q., Huguenard, J. R. & Prince, D. A. (2006). Barrel cortex microcircuits: Thalamocortical feedforward inhibition in spiny stellate cells is mediated by a small number of fast-spiking interneurons. J. Neurosci. 26, 1219–1230.

      (18) Sylwestrak, E. L. & Ghosh, A. (2012). Elfn1 regulates target-specific release probability at CA1-interneuron synapses. Science 338, 536–540.

      (19) Tan, Z., Hu, H., Huang, Z. J. & Agmon, A. (2008). Robust but delayed thalamocortical activation of dendritic-targeting inhibitory interneurons. Proc. Natl. Acad. Sci. USA 105, 2187–2192.

      (20) Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., Suzuki, T., Odagawa, M., Odaka, Y. S., Iwayama, Y., et al. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nat. Commun. 5, 4501.

      (21) Yamashita, T., Vavladeli, A., Pala, A., Galan, K., Crochet, S., Petersen, S. S. & Petersen, C. C. (2018). Diverse long-range axonal projections of excitatory layer 2/3 neurons in mouse barrel cortex. Front. Neuroanat. 12, 33.

    1. eLife Assessment

      This important manuscript provides insights into the competition between Splicing Factor 1 (SF1) and Quaking (QKI) for binding at the ACUAA branch point sequence in a model intron, regulating exon inclusion. The study employs convincing, rigorous transcriptomic, proteomic, and reporter assays, with both mammalian cell culture and yeast models.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1 and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors KD either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by apparently quick transition of SF1 bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression.

      Strengths:

      (1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.

      (2) The experiments and the rationale behind them are clearly and logically presented.

      (3) The experiments are carried out to a high standard and well-designed controls are included.

      (4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of QK1 competition was creative and informative.

      Weaknesses:

      Overall the weaknesses are relatively minor and involve cases where conclusions could potentially have been strengthened with additional experimentation. For example, pull-down of the U2 snRNP could be strengthened by detection of the snRNA whereas the proteins may themselves interact with these factors in the absence of the snRNA. In addition the discussion is a bit speculative given the data, but compelling nonetheless.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript the authors were trying to establish whether competition between the RNA binding proteins SF1 and QKI controlled splicing outcomes. These two proteins have similar binding sites and protein sequences, but SF1 lacks a dimerization motif and seems to bind a single version of the binding sequence. Importantly, these binding sequences correspond to branchpoint consensus sequences, with SF1 binding leading to productive splicing, but QKI binding leading instead to association with paraspeckle proteins. They show that in human cells SF1 generally activates exons and QKI represses, and a large group of the jointly regulated exons (43% of joint targets) are reciprocally controlled by SF1 and QKI. They focus on one of these exons RAI14 that shows this reciprocal pattern of regulation, and has 2 repeats of the binding site that make it a candidate for joint regulation, and confirm regulation within a minigene context. The authors used assembly of proteins within nuclear extracts to explain the effect of QKI versus SF1 binding. Finally the authors show that expression of QKI is lethal in yeast, and causes splicing defects.

      How this fits in the field. This study is interesting and provides a conceptual advance by providing a general rule how SF1 and QKI interact with relation to binding sites, and the relative molecular fates followed, so is very useful. Most of the analysis seems to focus on one example, but the choice of this example was carefully explained in the text. The molecular analysis and global work significantly adds to the picture from the previously published paper about NUMB joint regulation by QKI and SF (Zong et al, cited in text as reference 50, that looked at SF1 and QKI binding in relation to a duplicated binding site/branchpoint sequence in NUMB).

      Strengths:

      The data presented are strong and clear. The ideas discussed in this paper are of wide interest, and present a simple model where two binding sites generates a potentially repressive QKI response, whereas exons that have a single upstream sequence are just regulated by SF1. The assembly of splicing complexes on RNAs derived from RAI14 in nuclear extracts, followed by mass spec gave interesting mechanistic insight into what was occurring as a result of QKI versus SF1 binding.

      Weaknesses:

      The authors have addressed the previous weaknesses of the study, resulting in a much stronger manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important manuscript provides insights into the competition between Splicing Factor 1 (SF1) and Quaking (QKI) for binding at the ACUAA branch point sequence in a model intron, regulating exon inclusion. The study employs rigorous transcriptomic, proteomic, and reporter assays, with both mammalian cell culture and yeast models. Nevertheless, while the data are convincing, broadening the analysis to additional exons and narrowing the manuscript's title to better align with the experimental scope would strengthen the work.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to show that SF1 and QKI compete for the intron branch point sequence ACUAA and provide evidence that QKI represses inclusion when bound to it.

      Major strengths of this manuscript include:

      (1) Identification of the ACUAA-like motif in exons regulated by QKI and SF1.

      (2) The use of the splicing reporter and mutant analysis to show that upstream and downstream ACUAAC elements in intron 10 of RAI are required for repressing splicing.

      (3) The use of proteomics to identify proteins in C2C12 nuclear extract that bind to the wild-type and mutant sequences.

      (4) The yeast studies showing lethality when ectopic Qki5 expression was induced, due to increased mis-splicing of transcripts that contain the ACUAA element.

      The authors conclusively show that the ACUAA sequence is bound by QKI and provide strong evidence that this leads to differences in exon inclusion and exclusion. In animal cells, and especially in humans, branchpoint sequences are degenerate but seem to be recognized by specific splicing factors. Although a subset of splicing factors shows tissue-specific expression patterns, most don't, suggesting that yet-to-be-identified mechanisms regulate splicing. This work suggests that an alternate mechanism could be related to the binding affinity of specific RNA binding factors for branchpoint sequences coupled with the level of these different splicing factors in a given cell.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1, and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors KD either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by the apparently quick transition of SF1-bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall, these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data are very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression. Criticisms are minor and the most important of them stem from overemphasis on parts of the manuscript on the evolutionary angle when evolution itself wasn't analyzed per se.

      We thank the reviewer for the positive comments and very clear and fair critical points.

      Strengths:

      (1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.

      (2) The experiments and the rationale behind them are exceptionally clearly and logically presented. This was wonderful!

      Thank you so much. We felt the overall flow of the paper and data make for a nice “story” that conveys a relatively easy-to-understand explanation for a complex subject.

      (3) The experiments are carried out to a high standard and well-designed controls are included.

      (4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of the QK1 competition was very exciting and creative.

      We agree this is a very exciting result and finding! Thanks.

      Weaknesses:

      Overall the weaknesses are relatively minor and involve cases where clarification is necessary, some additional analysis could bolster the arguments, and suggestions for focusing the manuscript on its strengths.

      (1) The title (Ancient...evolutionary outcomes), abstract, and some parts of the discussion focus heavily on the evolutionary implications of this work. However, evolutionary analysis was not performed in these studies (e.g., when did QK1 and SF1 proteins arise and/or diverge? How does this line up with branch site motifs and evolution of U2? Any insight from recent work from Scott Roy et al?). I think this aspect either needs to be bolstered with experimental work/data or this should be tamped down in the manuscript. I suggest highlighting the idea expressed in the sentence "A nuanced implication of this model is that loss-of-function...". To me, this is better supported by the data and potentially by some analysis of mutations associated with human disease.

      We have revised the title and dampened the evolutionary aspects of the previous version of the manuscript.

      (2) One paper that I didn't see cited was that by Tanackovic and Kramer (Mol Biol Cell 2005). This paper is relevant because they KD SF1 and found it nonessential for splicing in vivo. Do their results have implications for those here? How do the results of the KD compare? Could QK1 competition have influenced their findings (or does their work influence the "nuanced implication" model referenced above?)?

      This is an interesting point, and thank you for the suggestion. We have now included a brief description of this study in the Introduction of the revised manuscript and do note that the authors measured intron retention of a beta globin reporter and SF3A1, SF3A2, and SF3A3 during SF1 knockdown, but did not detect elevated unspliced RNA in these targets.

      (3) Can the authors please provide a citation for the statement "degeneracy is observed to a higher degree in organisms with more alternative splicing"? Does recent evolutionary analysis support this?

      We have removed the statement, as it did not add much to the content and I am not sure I can state the concept I was attempting to convey in a simple manner with few citations.

      (4) For the data in Figure 3, I was left wondering if NMD was confounding this analysis. Can the authors respond to this and address this concern directly?

      We have not measured whether the reporters used in Figure 3 produce protein(s). Presumably, though, all spliced reporter RNA would be degraded equally (the included/skipped isoforms’ “reading frames” are not altered from one another). This would not be the case for unspliced nuclear reporter RNA, however. Given this difference, and that our analysis cannot resolve the subcellular localization of the different reporter species, we have removed the measurement of and subsequent results describing unspliced reporter RNA from Figure 3.

      (5) To me, the idea that an engaged U2 snRNP was pulled down in Figure 4F would be stronger if the snRNA was detected. Was that able to be observed by northern or primer extension? Would SF1 be enriched if the U2 snRNA was degraded by RNaseH in the NE?

      We did not measure any co-associating RNAs in this experimental approach, but agree that this approach would strengthen the evidence for it.

      (6) I'm wondering how additive the effects of QK1 and SF1 are... In Figure 2, if QK1 and SF1 are both knocked down, is the splicing of exon 11 restored to "wt" levels?

      This is an interesting question that we were unfortunately unable to address experimentally here.

      (7) The first discussion section has two paragraphs that begin "How does competition between SF1..." and "Relatively little is known about how...". I found the discussion and speculation about localization, paraspeckles, and lncRNAs interesting but a bit detracting from the strengths of the manuscript. I would suggest shortening these two paragraphs into a single one.

      We have revised the Discussion.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors were trying to establish whether competition between the RNA-binding proteins SF1 and QKI controlled splicing outcomes. These two proteins have similar binding sites and protein sequences, but SF1 lacks a dimerization motif and seems to bind a single version of the binding sequence. Importantly, these binding sequences correspond to branchpoint consensus sequences, with SF1 binding leading to productive splicing, but QKI binding leading instead to association with paraspeckle proteins. They show that in human cells SF1 generally activates exons and QKI represses, and a large group of the jointly regulated exons (43% of joint targets) are reciprocally controlled by SF1 and QKI. They focus on one of these exons RAI14 that shows this reciprocal pattern of regulation, and has 2 repeats of the binding site that make it a candidate for joint regulation, and confirm regulation within a minigene context. The authors used the assembly of proteins within nuclear extracts to explain the effect of QKI versus SF1 binding. Finally, the authors show that the expression of QKI is lethal in yeast, and causes splicing defects.

      How this fits in the field. This study is interesting and provides a conceptual advance by providing a general rule on how SF1 and QKI interact in relation to binding sites, and the relative molecular fates followed, so is very useful. Most of the analysis seems to focus on one example, although the molecular analysis and global work significantly add to the picture from the previously published paper about NUMB joint regulation by QKI and SF (Zong et al, cited in text as reference 50, that looked at SF1 and QKI binding in relation to a duplicated binding site/branchpoint sequence in NUMB).

      Thank you for the encouraging remarks.

      Strengths:

      The data presented are strong and clear. The ideas discussed in this paper are of wide interest, and present a simple model where two binding sites generate a potentially repressive QKI response, whereas exons that have a single upstream sequence are just regulated by SF1. The assembly of splicing complexes on RNAs derived from RAI14 in nuclear extracts, followed by mass spec gave interesting mechanistic insight into what was occurring as a result of QKI versus SF1 binding.

      Weaknesses:

      I did not think the title best summarises the take-home message and could be perhaps a bit more modest. Although the authors investigated splicing patterns in yeast and human cells, yeast do not have QKI so there is no ancient competition in that case, and the study did not really investigate physiological or evolutionary outcomes in splicing, although it provides interesting speculation on them. Also as I understood it, the important issue was less conserved branchpoints in higher eukaryotes enabling alternative splicing, rather than competition for the conserved branchpoint sequence. So despite the data being strong and properly analysed and discussed in the paper, could the authors think whether they fit best with the take-home message provided in the title? Just as a suggestion (I am sure the authors can do a better job), maybe "molecular competition between variant branchpoint sequences predict physiological and evolutionary outcomes in splicing"?

      Thank you for this point (Reviewer 2 had a similar comment) and the suggestion. We have revised the title.

      Although the authors do provide some global data, most of the detailed analysis is of RAI14. It would have been useful to examine members of the other quadrants in Figure 1C as well for potential binding sites to give a reason why these are not co-regulated in the same way as RAI14. How many of the RAI14 quadrants had single/double sites (the motif analysis seemed to pull out just one), and could one of the non-reciprocally regulated exons be moved into a different quadrant by addition or subtraction of a binding site or changing the branchpoint (using a minigene approach for example).

      This is an interesting point that we have considered. Our intent with the focus on RAI14 was to use a naturally occurring intron bps with evidence of strong QKI binding that did not require a high degree of sequence manipulation or engineering.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Most of my recommendations are really centered on the figures. In their current state, they detract from the data shown and could be improved: I recommend the authors use a uniform font. For example, Figure 1E and F have at least three different fonts of varying sizes, making it very messy. In Figure 1C, the authors could bold the RAI14 ex11 or simply indicate in the legend that the blue is this exon, thus removing the text from this very busy graph. In Figure 4F, I would recommend having all the labels the same size and putting genes of interest like Sf3a1 in bold. This could also be done in Figure 4E.

      Thank you for the suggestion and we have edited these (FYI the font in Fig’s 1E and 1F were from the rMAPS default output, but I agree, it gives a sloppy appearance).

      (2) In Figures 4D and 4G, is there QKI binding to the downstream deletion mutant after 30 minutes? Also, in Figure 4G, are these all from the same blot? The band sizes seem to be very different between lanes. If these were not on the same blot, the original gels should be submitted.

      A small amount of Qki appears to be binding after 30 min. All lanes/blots are from the same gels/membranes; see new Supplemental Figure 4 for the original (uncropped) images of the blots.

      (3) The authors should indicate the source and concentration of the antibodies used for their WB. They should also indicate the primers used for RT-PCRs.

      We have revised the methods to include the antibody information and have uploaded a supplemental table 8 with all oligonucleotide sequences used (which I (Sam Fagg) neglected to do initially, so that’s my bad).

      Reviewer #2 (Recommendations for the authors):

      (1) This may come down to the author's preference but branch point and branch site are frequently two words, not a single compound word (branch point vs. branchpoint). In addition, the authors may want to use branchsite with the abbreviation BS more frequently since they often don't describe the specific point of branching, and bp and bps could be confused for the more frequent abbreviations for base pair(s).

      Good suggestion; we have edited the text accordingly.

      (2) In general the addition of page numbers and line numbers to the manuscript would greatly aid reviewers!

      Point taken…

      (3) Introduction; "...under normal growth conditions they are efficiently spliced". I would say MOST introns in yeast are efficiently spliced. This is definitely not universal.

      Text edited to indicate that most are efficiently spliced.

      (4) Introduction; " recognition of the bps by SF1 (mammals) (20)". The choice of reference 20 is an odd one here. I think the Robin Reed and Michael Rosbash paper was the first to show SF1 was the human homolog of BBP.

      Got it, thanks (added #14 here and kept #20 also since it shows the structure of SF1 in complex with a UACUAAC bps.)

      (5) Results; "QK1 and SF1 co-regulate.."; it may be useful for the reader if you could explain in more detail why exon inclusion and intron retention are expected outcomes for QK1 knockdown and vice versa for SF1. The exon inclusion here is more obvious than the intron retention phenotype. (In other words, if more exons are included shouldn't it follow that more introns are removed?)

      We explain the expected results for exon inclusion in the Introduction and this paragraph of the Results. Although we have observed more intron retention under QKI loss-of-function approaches before, I am uncertain where the reviewer sees that we indicate any expected result for intron retention from either QKI or SF1 knockdown. I believe the statement you refer to might be on line 162 and starts with: “Consistent with potentially opposing functions in splicing…” ?

      Also, I agree that if SF1 is a “splicing activator,” one might expect more IR in its absence (but this is not the case; there is, in fact, less), but nonetheless, the opposite outcome is observed with QKI knockdown (more IR). It is unclear why this is the case, and we did not investigate it.

      (6) Results; "QK1 and SF1 co-regulate.."; "Thus the most highly represented set.." To me, the most highly represented set is those which are not both QK1-repressed and SF1-activated. Does this indicate that other factors are involved at most sites than simple competition between these two?

      We have revised the sentence in question to include the text “by quadrant” in order to convey our meaning more precisely.

      (7) Throughout the manuscript, 5 apostrophes and 3 apostrophes are used instead of 5 prime symbols and 3 prime symbols.

      Thank you for pointing that out. We have fixed each instance of this.

      (8) Sometimes SF1 is written as Sf1. (also Tatsf1)

      This was a mouse/human gene/protein nomenclature error that we have fixed; thank you for pointing this out.

      (9) You may want to make sure that figures are labeled consistently with the manuscript text. In Figure 1B, it is RI rather than IR. In Figure 4 it is myoblast NE rather than C2C12 nuclear extract.

      We have fixed these, checked for other examples, and where relevant, edited those too.

      (10) I think Figure 1A could be improved by also including a depiction of the domain arrangements of SF1 and QK1.

      Done.

      (11) I was a bit confused with all the lines in Figure 1E and 1F. What is the difference between the log (pVal) and upregulated plots? Can these figures be simplified or explained more thoroughly?

      Based on this comment and one from Reviewer 1, we have slightly revised the wording (and font) on the output, which hopefully clarifies. These are motif enrichment plots generated by rMAPS (Refs 61 and 62) analysis of rMATS (Ref 60) data for exons more included (depicted by the red lines) or more skipped (depicted by the blue lines) compared to control versus a “background” set of exons that are detectable but unchanged. The -log10(P-value) (dotted line) indicates the significance of exons more included in shRNA treatment vs control shRNA (previously read “upregulated”) compared to background exons that are detectable but unchanged; the solid lines indicate the motif score; these are described in the references indicated.

      (12) Figure 1B, it is a bit hard to conclude that there is more AltEx or "RI/IR" in one sample vs. the other from these plots since the points overlay one another. Can you include numbers here?

      Added (and deleted Suppl Fig S1, which was simply a chart showing the numbers).

      (13) How was PSI calculated in Figure 2A?

      VAST-tools (we state this in the legend in the revised version).

      You may want to include rel protein (or the lower limit of detection) for Figure 2B to be consistent with 2C. Why is KD of SF1 so poor and variable between 2C and 2D?

      We have not investigated this, but these blots show an optimized result that we were able to obtain for the knockdown in each cell type. It may be that HEK293 cells (Fig 2B) have a stronger requirement for SF1 than C2C12 cells…? I would argue that it is not necessarily “poor” in Fig 2C, as we observe ~70% depletion of the protein.

      Why are two bands present in the gel?

      Two to three isoforms of SF1 are present in most cell types.

      A good (or bad, really) example of an SF1 western blot (and knockdown of ~35% in K562 or ~45% in HepG2) can also be seen on the ENCODE project website, for reference:

      https://www.encodeproject.org/documents/6001a414-b096-4073-94ff-3af165617eb5/@@download/attachment/SF1_BGKLV28-49.pdf

      By comparison, I think ours are much more cosmetically pleasing, and our knockdown (especially in C2C12) is much more efficient.

      (14) Figure 3, The asterisk refers to a cryptic product. Can the uaAcuuuCAG be used as a branch point? Presumably the natural 3' SS is now too close so this would result in activation of a downstream 3'SS?

      We did not pursue determining the identity of this minor and likely artefactual product, but we (and others) have observed a similar phenomenon when using splicing reporter-based mutational approaches.

      (15) For the methods. The "RNA extraction, RT -PCR,..." subheading needs to be on its own line. Please add (w/v) or (v/v) to percentages where appropriate. Please convert ug to the symbol for "micro".

      Thank you, we have made these changes.

      (16) In Figure 4B, the text here and legend are microscopic. Even with reading glasses, I couldn't make anything out!

      We have increased the font sizes for the text and scale bar…when referring to “legend” does the reviewer mean the scale bar?

      (17) As a potential discussion item, it is worth noting that SF1 could also repress splicing if it could either not engage with U2AF or be properly displaced by U2 snRNP so the snRNA could pair. I was wondering if QK1 could similarly be activating if it could engage with U2AF. I'm unsure if this could be tested by domain swaps (and is beyond the scope of this paper). It just may be worth speculating about.

      Good point and suggestion…we are looking into this.
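      As a reference for point (13) above on how PSI is calculated, the generic definition of percent spliced in can be sketched as follows. This is only a minimal illustration of the textbook ratio, assuming simple inclusion and exclusion read counts; the function name and arguments are hypothetical, and VAST-tools applies its own read counting and corrections, which are not reproduced here.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Generic PSI: share of reads supporting exon inclusion, in percent.

    Hypothetical helper for illustration only; it is not the VAST-tools
    implementation and applies no coverage or quality filtering.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: PSI is undefined rather than zero.
        return None
    return 100.0 * inclusion_reads / total
```

      Returning an undefined value for events without reads, rather than zero, avoids reporting an uncovered exon as fully skipped.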

      Reviewer #3 (Recommendations for the authors):

      (1) Is the reference in the text to Figure 5F correct for actin splicing (this is just before the discussion)?

      I see references several lines up from this, but I do not see a reference just before the discussion…?

      (2) I was not sure why the minigene experiments showed such high levels of intron retention that seemed to be impacted also by deletion of the branchpoint sequences, and suggest that the two branchpoints are not equal in strength.

      Neither were we, but Reviewer 2 has suggested that degradation of the spliced products could be rapid (NMD substrates) which could complicate the interpretation of what appears to be higher levels of intron retention. Given the possibility that this could be a non-physiological artefact, we have removed the measurement of unspliced reporter and now only show the spliced products (equally subject to degradation) and report their percent inclusion.

    1. eLife Assessment

      The study presents convincing quantitative evidence, supported by appropriate negative controls, for the presence of low-abundance glycine receptors (GlyRs) within inhibitory synapses in telencephalic regions of the mouse brain. Using sensitive single-molecule localization microscopy of endogenously tagged GlyRs, the authors reveal previously undetected populations of these receptors. Although the functional significance of these low-abundance GlyRs remains to be established, the findings offer valuable insights and methodologies that will be of interest to neuroscientists studying inhibitory synapse biology.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      Specific comments on the original version:

      (1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.

      (2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      (3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.

      (4) Potential for pseudo replication. It's not clear whether they're performing stats tests across biological replica, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.

      (5) Does mEoS affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?

      (6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful.

      (7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections?

      (8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.

      (9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one.

      (10) For microscopy experiment methods, state power densities, not % or "nominal power".

      (11) In general, not much data presented. Any SI file with extra images etc.?

      (12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state: "What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.

      (13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.

      Significance:

      The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall, these advances are more incremental than groundbreaking.

      Comments on revised version:

      The authors have addressed the majority of the significant issues raised in the review and revised the manuscript appropriately. One issue that can be further addressed relates to the issue of pseudo-replication. The authors state in their response that "All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case).". This suggests that they're not doing their stats on biological replicates, and instead are pseudo replicating. It's not clear how they have ensured reproducibility, when the stats seem to have been done on pooled data across repeats.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high detection sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses.

      Strengths:

      The manuscript provides new insights into the presence of low-copy-number GlyRs by visualizing them via SMLM. This is the first report that visualizes GlyRs optically in the brain using the knock-in model of mEos4b-tagged GlyRβ and quantifies their copy number, comparing the distribution and amount of GlyRs in hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript.

      Comments on revised version:

      My concerns have been successfully addressed by the authors during the revision process.

    4. Reviewer #3 (Public review):

      In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ subunits to detect endogenous glycine receptors in mouse brain using single-molecule localization microscopy (SMLM). At synapses in the hippocampus GlyRβ molecules are detected at very low copy numbers. Assuming that each detected GlyRβ molecule is incorporated in a pentameric glycine receptor, it was estimated that while the majority of hippocampal inhibitory synapses do not contain glycine receptors, a small population of inhibitory synapses contain a few (up to 10) glycine receptors. Using dual-color SMLM approaches it is furthermore confirmed that the detected GlyRβ molecules are embedded in the postsynaptic domain marked by gephyrin. In contrast to the hippocampus, at inhibitory synapses in the striatum GlyRβ molecules were detected at considerably higher copy numbers. Interestingly, the observed number of GlyRβ detections was significantly higher in the ventral striatum compared to the dorsal striatum. These findings are corroborated by electrophysiological recordings showing that postsynaptic glycinergic currents can be readily detected in the ventral striatum but are almost absent in the dorsal striatum. Using lentiviral overexpression of recombinant GlyRalpha1, alpha2, and beta subunits in cultured hippocampal neurons, it is shown that GlyR alpha1 subunits are readily detectable at synapses, but overexpressed GlyRalpha2 and beta subunits did not strongly enrich at synapses. This could indicate that GlyRa1 expression is limiting the synaptic expression of GlyRβ-containing glycine receptors in hippocampal neurons.

      Comments on revised version:

      This revised manuscript is significantly improved. New experimental data and quantitative analyses are presented that strengthen the conclusions. Overall, the results presented in this manuscript are based on carefully performed SMLM experiments and are well-presented and described. The knock-in mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling. The combination with SMLM is also very strong, as it provides high sensitivity and spatial resolution. These results confirm previous studies and will be of interest to a specialised audience interested in glycine receptors, inhibitory synapse biology and super-resolution microscopy.

    5. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors of eLife and the reviewers for their thorough evaluation of our study. As regards the final comments of reviewer 1, please note that all experimental replicates were first analyzed separately and were then pooled, since the observed changes were comparable between experiments. This means that statistical analyses were done on pooled biological replicates.


      The following is the authors’ response to the original reviews.

      General Statements

      We thank the reviewers for their thorough and constructive evaluation of our work. We have revised the manuscript carefully and addressed all the criticisms raised, in particular the issues mentioned by several of the reviewers (see point-by-point response below). We have also added a number of explanations in the text for the sake of clarity, while trying to keep the manuscript as concise as possible.

      In our view, the novelty of our research is two-fold. From a neurobiological point of view, we provide conclusive evidence for the existence of glycine receptors (GlyRs) at inhibitory synapses in various brain regions including the hippocampus, dentate gyrus and sub-regions of the striatum. This solves several open questions and has fundamental implications for our understanding of the organisation and function of inhibitory synapses in the telencephalon. Secondly, our study makes use of the unique sensitivity of single molecule localisation microscopy (SMLM) to identify low protein copy numbers. This is a new way to think about SMLM as it goes beyond a mere structural characterisation and towards a quantitative assessment of synaptic protein assemblies.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity): 

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      The following are specific comments: 

      (1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels. 

      Following the suggestion of reviewer 1, we re-analysed CA3 images of Glrb<sup>eos/eos</sup> hippocampal slices by applying a pixel-shift type of control, in which the Sylite channel (in far red) was horizontally flipped relative to the mEos4b-GlyRβ channel (in green, see Methods). As expected, the number of mEos4b-GlyRβ detections per gephyrin cluster was markedly reduced compared to the original analysis (revised Fig. 1B), confirming that the synaptic mEos4b detections exceed chance levels (see page 5). 
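
      The logic of this flipped-channel control can be sketched as follows. This is only an illustration on synthetic coordinates, not the authors' analysis pipeline: the function names, the 150 nm assignment radius, the field-of-view width and all data points are assumptions made for the example.

```python
import random

def count_detections_per_cluster(detections, clusters, radius):
    """Count detections lying within `radius` of each cluster centre."""
    r2 = radius * radius
    counts = []
    for cx, cy in clusters:
        n = sum(1 for x, y in detections if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
        counts.append(n)
    return counts

def flip_horizontally(points, width):
    """Mirror points about the vertical mid-line of a field of view of `width`."""
    return [(width - x, y) for x, y in points]

def mean(values):
    return sum(values) / len(values) if values else 0.0

# Synthetic field of view (hypothetical values, in nm)
rng = random.Random(0)
width = 10000.0
clusters = [(rng.uniform(0, width), rng.uniform(0, width)) for _ in range(200)]

# One "synaptic" detection near every third cluster, plus uniform background
detections = []
for cx, cy in clusters[::3]:
    detections.append((cx + rng.gauss(0, 30), cy + rng.gauss(0, 30)))
detections += [(rng.uniform(0, width), rng.uniform(0, width)) for _ in range(100)]

# Observed detections per cluster versus the flipped-channel (chance) level
observed = mean(count_detections_per_cluster(detections, clusters, radius=150.0))
flipped = mean(
    count_detections_per_cluster(
        detections, flip_horizontally(clusters, width), radius=150.0
    )
)
print(f"observed: {observed:.2f} per cluster, flipped control: {flipped:.2f}")
```

      If the detections were unrelated to the clusters, the observed and flipped values would be similar; a clear excess in the unflipped data indicates that the overlap exceeds chance.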

      (2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      The pointillist images in Fig. 3A are essentially binary (red-black). Therefore, the density of detections at synapses cannot be easily judged by eye. For clarity, the original images in Fig. 3A have been replaced with two other examples that better reflect the different detection numbers in the dorsal and ventral striatum. 

      (3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations. 

      This is an important point that was also raised by the other reviewers. We have performed additional experiments to increase the data volume for analysis. For quantification, we used two approaches. First, we counted the percentage of infected cells in which synaptic localisation of the recombinant receptor subunit was observed (Fig. 5C). We found that mEos4b-GlyRa1 consistently localises at synapses, indicating that all cells express endogenous GlyRb. When neurons were infected with mEos4b-GlyRb, fewer cells had synaptic clusters, meaning that indeed, GlyR alpha subunits are the limiting factor for synaptic targeting. In cultures infected with mEos4b-GlyRa2, only very few neurons displayed synaptic localisation (as judged by epifluorescence imaging). We think this shows that GlyRa2 is less capable of forming heteromeric complexes than GlyRa1, in line with our previous interpretation (see pp. 9-10, 13). 

      Secondly, we quantified the total intensity of each subunit at gephyrin-positive domains, both in infected neurons and in non-infected control cultures (Fig. 5D). We observed that mEos4b-GlyRa1 intensity at gephyrin puncta was higher than that of the other subunits, again pointing to efficient synaptic targeting of GlyRa1. Gephyrin cluster intensities (Sylite labelling) were not significantly different in GlyRb and GlyRa2 expressing neurons compared to the uninfected control, indicating that the lentiviral expression of recombinant subunits does not fundamentally alter the size of mixed inhibitory synapses in hippocampal neurons. Interestingly, gephyrin levels were slightly higher in hippocampal neurons expressing mEos4b-GlyRa1. In our view, this comes from an enhanced expression and synaptic targeting of mEos4b-GlyRa1 heteromers with endogenous GlyRb, pointing to a structural role of GlyRa1/b in hippocampal synapses (pp. 10, 13).

      The new data and analyses have been described and illustrated in the relevant sections of the manuscript.

      (4) Potential for pseudo replication. It's not clear whether they're performing stats tests across biological replica, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify. 

      All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case). We have systematically given these numbers in the revised manuscript (n, N, and other experimental parameters such as the number of animals used, coverslips, images or cells). Data are generally given as mean +/- SEM or as mean +/- SD as indicated.
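
      The distinction between n (pooled data points) and N (independent experiments) that underlies this point can be illustrated with a short sketch. The numbers below are invented for the example and are not taken from the manuscript; the function names are likewise hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def sem(values):
    """Standard error of the mean of a list of values."""
    return stdev(values) / sqrt(len(values))

def summarise(replicates):
    """Compare statistics on pooled data points (n) with per-experiment means (N)."""
    pooled = [v for rep in replicates for v in rep]
    rep_means = [mean(rep) for rep in replicates]
    return {
        "n": len(pooled),
        "N": len(replicates),
        "pooled_mean": mean(pooled),
        "pooled_sem": sem(pooled),
        "replicate_mean": mean(rep_means),
        "replicate_sem": sem(rep_means),
    }

# Hypothetical detections per synapse from three independent experiments
replicates = [
    [2, 3, 1, 4, 2, 3, 2, 5],
    [4, 5, 3, 6, 4, 5, 4, 7],
    [3, 2, 4, 3, 5, 3, 4, 2],
]
summary = summarise(replicates)
for key, value in summary.items():
    print(key, round(value, 3))
```

      In this toy data set the pooled SEM is smaller than the SEM across experiment means, because pooling treats every synapse as independent. Reporting both n and N, as done in the revised manuscript, lets readers judge which level the statistics refer to.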

      (5) Does mEoS affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?

      The Glrb<sup>eos/eos</sup> knock-in mouse line has been characterised previously and does not display any ultrastructural or functional deficits at inhibitory synapses (Maynard et al. 2021 eLife). GlyRβ expression and glycine-evoked responses were not significantly different to those of the wildtype. The synaptic localisation of mEos4b-GlyRb in KI animals demonstrates correct assembly of heteromeric GlyRs and synaptic targeting. Accordingly, the animals do not display any obvious phenotype. We have clarified this in the manuscript (p. 4). In the case of cultured neurons, long-term expression of fluorescent receptor subunits with lentivirus has proven ideal to achieve efficient synaptic targeting. The low and continuous supply of recombinant receptors ensures assembly with endogenous subunits to form heteropentameric receptor complexes (e.g. [Patrizio et al. 2017 Sci Rep]). In the present study, lentivirus infection did not induce any obvious differences in the number or size of inhibitory synapses compared to control neurons, as judged by Sylite labelling of synaptic gephyrin puncta (new Fig. 5D).

      (6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful. 

      We agree that absolute quantification with SMLM is challenging, since the number of detections depends on fluorophore maturation, photophysics, imaging conditions, and analysis thresholds (discussed in Patrizio & Specht 2016, Neurophotonics). For this reason, only very few datasets provide reliable copy numbers, even for well-studied proteins such as PSD-95. One notable exception is the study by Maynard et al. (eLife 2021) that quantified endogenous GlyRβ-containing receptors in spinal cord synapses using SMLM combined with correlative electron microscopy. The strength of this work was the use of a KI mouse strain, which ensures that mEos4b-GlyRβ expression follows intrinsic regional and temporal profiles. The authors reported a stereotypic density of ~2,000 GlyRs/µm² at synapses, corresponding to ~120 receptors per synapse in the dorsal horn and ~240 in the ventral horn, taking into account various parameters including receptor stoichiometry and the functionality of the fluorophore. These values are very close to our own calculations of GlyR numbers at spinal cord synapses that were obtained slightly differently in terms of sample preparation, microscope setup, imaging conditions, and data analysis, lending support to our experimental approach. Nevertheless, the obtained GlyR copy numbers at hippocampal synapses clearly have to be taken as estimates rather than precise figures, because the number of detections from a single mEos4b fluorophore can vary substantially, meaning that the fluorophores are not represented equally in pointillist images. This can affect the copy number calculation for a specific synapse, in particular when the numbers are low (e.g. in hippocampus); however, it should not alter the average number of detections (Fig. 1B) or the (median) molecule numbers of the entire population of synapses (Fig. 1C). We have discussed the limitations of our approach (p. 11).

      (7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections? 

      As discussed above (point 6), the detection of fluorophores with SMLM is influenced by many parameters, not least the noise produced by emitting molecules other than the fluorophore used for labelling. Our study is exceptional in that it attempts to identify extremely low molecule numbers (down to 1). To verify that the detections obtained with PALM correspond to mEos4b, we conducted robust control experiments (including pixel-shift as suggested by the reviewer, see point 1, revised Fig. 1B). The rationale for the nanobody-based dSTORM experiments was twofold: (1) to have an independent readout of the presence of low-copy GlyRs at inhibitory synapses and (2) to analyse the nanoscale organisation of GlyRs relative to the synaptic gephyrin scaffold using dual-colour dSTORM with spectral demixing (see p. 6). The organic fluorophores used in dSTORM (AF647, CF680) ensure high photon counts, essential for reliable co-localisation and distance analysis. PALM and dSTORM cannot be combined in dual-colour mode, as they require different buffers and imaging conditions. 

      The specificity of the anti-Eos nanobody was demonstrated by immunohistochemistry in spinal cord cultures expressing mEos4b-GlyRb and wildtype control tissue (Fig. S3). In response to the reviewer's remarks, we also performed a negative control experiment in Glrb<sup>eos/eos</sup> slices (dSTORM), in which the nanobody was omitted (new Fig. S4F,G). Under these conditions, spectral demixing produced a single peak corresponding to CF680 (gephyrin) without any AF647 contribution (Fig. S4F). The background level of "false" AF647 detections at synapses was significantly lower than in the slices labelled with the nanobody. We conclude that the fluorescence signal observed in our dual-colour dSTORM experiments arises from the specific detection of mEos4b-GlyRb by the nanobody, rather than from background, cross-reactivity or wrong attribution of colour during spectral demixing. We have added these data and explanations in the results (p. 7) and in the figure legend of Fig. S4F,G.

      (8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision. 

      This is an interesting question in the context of our experiments with low-copy GlyRs, since the spatial resolution of SMLM is limited also by the density of molecules, i.e. the sampling of the structure in question (Nyquist-Shannon criterion). Accordingly, the priority of the PALM experiments was to improve the sensitivity of SMLM for the identification of mEos4b-GlyRb subunits, rather than to maximize the spatial resolution. The mean localisation precision in PALM was 33 +/- 12 nm, as calculated from the fitting parameters of each detection (Zeiss, ZEN software), which ultimately result from their signal-to-noise ratio. This is a relatively low precision for SMLM, which can be explained by the low brightness of mEos4b compared to organic fluorophores together with the elevated fluorescence background in tissue slices.

      In the case of dSTORM, the aim was to study the relative distribution of GlyRs within the synaptic scaffold, for which a higher localisation precision was required (p. 6). Therefore, detections with a precision ≥ 25 nm were filtered out during analysis with NEO software (Abbelight). The retained detections had a mean localisation precision of 12 +/- 5 nm for CF680 (Sylite) and 11 +/- 4 nm for AF647 (nanobody). These values are given in the revised manuscript (pp. 18, 22).
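
      As a rough illustration of this filtering step, the sketch below discards detections with a precision of 25 nm or worse and reports the mean and SD of the retained values. It does not reproduce the NEO software; the function names and the list of precision values are hypothetical.

```python
from statistics import mean, stdev

def filter_by_precision(precisions_nm, cutoff_nm=25.0):
    """Keep detections whose localisation precision is below the cutoff (in nm)."""
    return [p for p in precisions_nm if p < cutoff_nm]

def mean_and_sd(precisions_nm):
    """Return (mean, SD) of the localisation precision in nm."""
    return mean(precisions_nm), stdev(precisions_nm)

# Hypothetical per-detection localisation precisions in nm (not the authors' data)
precisions = [8.0, 11.0, 14.0, 9.0, 30.0, 25.0, 12.0, 41.0, 10.0, 16.0]

retained = filter_by_precision(precisions)
mean_nm, sd_nm = mean_and_sd(retained)
print(f"{len(retained)} of {len(precisions)} detections retained, "
      f"precision {mean_nm:.1f} +/- {sd_nm:.1f} nm")
```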

      (9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one. 

      Multiple detections of the same fluorophore are intrinsic to dSTORM imaging and have not been eliminated from the analysis. Small clusters of detections likely represent individual molecules (e.g. single receptors in the extrasynaptic regions, Fig. 2A). DBSCAN is a robust clustering method that is quite insensitive to minor changes in the choice of parameters. For dSTORM of synaptic gephyrin clusters (CF680), a relatively low length (80 nm radius) together with a high number of detections (≥ 50 neighbours) were chosen to reconstruct the postsynaptic domain with high spatial resolution (see point 8). In the case of the GlyR (nanobody-AF647), the clustering was done mostly for practical reasons, as it provided the coordinates of the centre of mass of the detections. The low stringency of this clustering (200 nm radius, ≥ 5 neighbours) effectively filters single detections that can result from background noise or incorrect demixing. An additional reference explaining the use of DBSCAN including the choice of parameters is given on p. 22 (see also R2 point 4).
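
      To make the role of the two DBSCAN parameters concrete, here is a minimal pure-Python sketch of the algorithm applied with the low-stringency settings quoted above (200 nm radius, at least 5 neighbours), followed by the centre of mass of each cluster. It is not the implementation used in the study: the coordinates are invented, and counting a point as its own neighbour is an assumption of this sketch.

```python
from math import hypot

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within `eps` of point i (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels

def cluster_centres(points, labels):
    """Centre of mass of every cluster, ignoring noise."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == NOISE:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

# Hypothetical detections in nm: two small groups and one isolated detection
points = [
    (0, 0), (10, 5), (-8, 12), (15, -10), (-12, -6), (4, 18),
    (1000, 1000), (1010, 995), (992, 1012), (1015, 990), (988, 994), (1004, 1018),
    (5000, 5000),
]
labels = dbscan(points, eps=200.0, min_pts=5)
centres = cluster_centres(points, labels)
print(labels)
print(centres)
```

      In this toy example the isolated detection is labelled as noise and dropped, which mirrors the filtering of single detections described in the response, while each group yields one centre-of-mass coordinate.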

      (10) For microscopy experiment methods, state power densities, not % or "nominal power". 

      Done. We now report the irradiance (laser power density) instead of nominal power (pp. 18, 21). 

      (11) In general, not much data presented. Any SI file with extra images etc.? 

      The original submission included four supplementary figures with additional data and representative images that should have been available to the reviewer (Figs. S1-S4). The SI file has been updated during revision (new Fig. S4E-G). 

      (12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state:"What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers. 

      We thank the reviewer for pointing this out. We are dealing with several processes: protein expression that determines subunit availability and the assembly of pentameric GlyR complexes, surface expression, membrane diffusion and accumulation of GlyRb-containing receptor complexes at inhibitory synapses. We have edited the manuscript, particularly the discussion, and tried to be as clear as possible in our wording.

      We chose not to add a schematic illustration for the time being, because any graphical representation is necessarily a simplification. Instead, we preferred to summarise the main numbers in tabular form (Table 1). We are of course open to any other suggestions.

      (13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization. 

      The dSTORM images in Fig. 2 are pointillist representations that show individual detections rather than molecules. Small clusters of detections are likely to originate from a single AF647 fluorophore (in the case of nanobody labelling) and therefore represent single GlyRb subunits. Since GlyR copy numbers are so low at hippocampal synapses (≤ 5), the notion of nanodomain is not directly applicable. Our analysis therefore focused on the integration of GlyRs within the postsynaptic scaffold, rather than attempting to define nanodomain structures (see also response to point 8 of R1). A clarification has been added in the revised manuscript (p. 6).

      Reviewer #1 (Significance): 

      The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically-tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall these advances are more incremental than groundbreaking. 

      We thank the reviewer for acknowledging both the technical and biological advances of our study. While we recognize that our work builds upon established models, we consider that it also addresses important unresolved questions, namely that GlyRs are present and specifically anchored at inhibitory synapses in telencephalic regions, such as the hippocampus and striatum. From a methodological point of view, our study demonstrates that SMLM can be applied not only for structural analysis of highly abundant proteins, but also to reliably detect proteins present at very low copy numbers. This ability to identify and quantify sparse molecule populations adds a new dimension to SMLM applications, which we believe increases the overall impact of our study beyond the field of synaptic neuroscience.

      Reviewer #2 (Evidence, reproducibility and clarity): 

      In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses. 

      However I have some minor points that the authors may address for clarification: 

      (1) In Figure 1 the authors apply PALM imaging of mEos4b-GlyRß (knockin) and here the corresponding Sylite label seems to be recorded in widefield, it is not clearly stated in the figure legend if it is widefield or super-resolved. In Fig 1 A - is the scale bar 5 µm? Some Sylite spots appear to be sized around 1 µm, especially the brighter spots, but maybe this is due to the lower resolution of widefield imaging? Regarding the statistical comparison: what method was chosen to test for normality distribution, I think this point is missing in the methods section. 

      This is correct; the apparent size of the Sylite spots does not reflect the real size of the synaptic gephyrin domain due to the limited resolution of widefield imaging, including the detection of out-of-focus light. We have clarified in the legend of Fig. 1A that Sylite labelling was imaged with classic epifluorescence microscopy. The scale bar in Fig. 1A corresponds to 5 µm. Since the data were not normally distributed, nonparametric tests (Kruskal-Wallis one-way ANOVA with Dunn’s multiple comparison test or Mann-Whitney U-test for pairwise comparisons) were used (p. 23). 
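
      As a hedged illustration of the pairwise nonparametric comparison mentioned above, the following sketch computes the Mann-Whitney U statistic by direct pair counting. The function name and the example values are hypothetical and are not taken from the manuscript; a p-value would in practice come from a statistics package rather than from this sketch.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic of sample_a relative to sample_b.

    U counts the pairs (a, b) in which a exceeds b; ties count as one half.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up detection counts for two groups (illustrative only).
group_1 = [1, 2, 3]
group_2 = [4, 5, 6]
print(mann_whitney_u(group_1, group_2), mann_whitney_u(group_2, group_1))
```

      The two U values always sum to the product of the sample sizes, which provides a quick sanity check on the counting.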

      Moreover I would appreciate a clarification and/or citation that the knockin model results in no structural and physiological changes at inhibitory synapses, I believe this model has been applied in previous studies and corresponding clarification can be provided. 

      The Glrbeos/eos mouse model has been described previously and does not exhibit any structural or physiological phenotypes (Maynard et al. 2021 eLife). The issue was also raised by reviewer R1 (point 5) and has been clarified in the revised manuscript (p. 4).

      (2) In the next set of experiments the authors switch to demixing dSTORM experiments - an explanation why this is performed is missing in the text - I guess better resolution to perform more detailed distance measurements? For these experiments: which region of the hippocampus did the authors select, I cannot find this information in legend or main text. 

      Yes, the dSTORM experiments enable dual-colour structural analysis at high spatial resolution (see response to R1 point 7). An explanation has been added (p. 6).

      (3) Regarding parameters of demixing experiments: the number of frames (10,000) seems quite low and the exposure time higher than expected for Alexa 647. Can the authors explain the reason for choosing these particular parameters (low expression profile of the target - so better separation?, less fluorophores on label and shorter collection time?) or is there a reference that can be cited? The laser power is given in the methods in percentage of maximal output power, but for better comparison and reproducibility I recommend to provide the values of a power meter (kW/cm²) as lasers may change their maximum output power during their lifetime. 

      Acquisition parameters (laser power, exposure time) for dSTORM were chosen to obtain a good localisation precision (~12 nm; see R1 point 8). The number of frames is adequate to obtain well sampled gephyrin scaffolds in the CF680 channel. In the case of the GlyR (nanobody-AF647), the concept of spatial resolution does not really apply due to the low number of targets (see R1, point 13). Power density (irradiance) values have now been given (pp. 18, 21).

      (4) For analysis of subsynaptic distribution: how did the authors decide to choose the parameters in the NEO software for DBSCAN clustering - was a series of parameters tested to find optimal conditions and did the analysis start with an initial test if data is indeed clustered (Ripley's K) or is there a reference in literature that can be provided? 

      DBSCAN parameters were optimised manually, by testing different values. Identification of dense and well-delimited gephyrin clusters (CF680) was achieved with a small radius and a high number of detections (80 nm, ≥ 50 neighbours), whereas filtering of low-density background in the AF647 channel (GlyRs) required less stringent parameters (200 nm, ≥ 5) due to the low number of target molecules. Similar parameters were used in a previous publication (Khayenko et al. 2022, Angewandte Chemie). The reference has been provided on p. 22 (see also R1 point 9).
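
      To make the role of the two DBSCAN parameters concrete, here is a minimal pure-Python sketch of the algorithm. It is not the implementation used in the NEO software; the function name, the coordinates, and the convention that a point counts as its own neighbour are assumptions made for illustration. Only the radius and neighbour thresholds (200 nm, ≥ 5) come from the response above.

```python
from math import hypot

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label 2D points with cluster ids (0, 1, ...) or NOISE (-1).

    eps is the search radius (nm here); min_pts is the minimum number of
    points within eps (the point itself included, by assumption) for a
    point to count as a core point.
    """
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical detections (nm): five close together, one isolated.
detections = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (1000, 1000)]
print(dbscan(detections, eps=200, min_pts=5))
```

      With the less stringent GlyR-channel settings, the five nearby detections form one cluster and the isolated detection is discarded as background. Under the stricter gephyrin-channel settings (80 nm, ≥ 50) the same sparse input would be labelled entirely as noise, which is why different thresholds are needed for the two channels.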

      (5) A conclusion/discussion of the results presented in Figure 5 is missing in the text/discussion. 

      This part of the manuscript has been completely overhauled. It includes new experimental data, quantification of the data (new Fig.5), as well as the discussion and interpretation of our findings (see also R1, point 3). In agreement with our earlier interpretation, the data confirm that low availability of GlyRa1 subunits limits the expression and synaptic targeting of GlyRa1/b heteropentamers. The observation that GlyRa1 overexpression with lentivirus increases the size of the postsynaptic gephyrin domain further points to a structural role, whereby GlyRs can enhance the stability (and size) of inhibitory synapses in hippocampal neurons, even at low copy numbers (pp. 13-14). 

      (6) In line 552 "suspension" is misleading, better use "solution" 

      Done.

      Reviewer #2 (Significance): 

      Significance: The manuscript provides new insights to presence of low-copy numbers by visualizing them via SMLM. This is the first report that visualizes GlyR optically in the brain applying the knock-in model of mEOS4b tagged GlyRß and quantifies their copy number comparing distribution and amount of GlyRs from hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript. 

      Field of expertise: Super-Resolution Imaging and corresponding analysis 

      Reviewer #4 (Evidence, reproducibility and clarity): 

      In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ to detect endogenous glycine receptors using single-molecule localization microscopy. The main conclusion from this study is that in the hippocampus GlyRβ molecules are barely detected, while inhibitory synapses in the ventral striatum seem to express functionally relevant GlyR numbers. 

      I have a few points that I hope help to improve the strength of this study. 

      - In the hippocampus, this study finds that the numbers of detections are very low. The authors perform adequate controls to indicate that these localizations are above noise level. Nevertheless, it remains questionable that these reflect proper GlyRs. The suggestion that in hippocampal synapses the low numbers of GlyRβ molecules "are important in assembly or maintenance of inhibitory synaptic structures in the brain" is in itself interesting, but is not at all supported. It is also difficult to envision how such low numbers could support the structure of a synapse. A functional experiment showing that knockdown of GlyRs affects inhibitory synapse structure in hippocampal neurons would be a minimal test of this. 

      It is not clear what the reviewer means by “it remains questionable that these reflect proper GlyRs”. The PALM experiments include a series of stringent controls (see R1, point 1) demonstrating the existence of low-copy GlyRs at inhibitory synapses in the hippocampus (Fig. 1) and in the striatum (Fig. 3), and are backed up by dSTORM experiments (Fig. 2). We have no reason to doubt that these receptors are fully functional (as demonstrated for the ventral striatum, Fig. 4). However, due to their low number, a role in inhibitory synaptic transmission is clearly limited, at least in the hippocampus and dorsal striatum. 

      We therefore propose a structural role, where the GlyRs could be required to stabilise the postsynaptic gephyrin domain in hippocampal neurons. This is based on the idea that the GlyR-gephyrin affinity is much higher than that of the GABAAR-gephyrin interaction (reviewed in Kasaragod & Schindelin 2018 Front Mol Neurosci). Accordingly, there is a close relationship between GlyRs and gephyrin numbers, sub-synaptic distribution, and dynamics in spinal cord synapses that are mostly glycinergic (Specht et al. 2013 Neuron; Maynard et al. 2021 eLife; Chapdelaine et al. 2021 Biophys J). It is reasonable to assume that low-copy GlyRs could play a similar structural role at hippocampal synapses. A knockdown experiment targeting these few receptors is technically very challenging and beyond the scope of this study. However, in response to the reviewer's question we have conducted new experiments in cultured hippocampal neurons (new Fig. 5). They demonstrate that overexpression of GlyRa1/b heteropentamers increases the size of the postsynaptic domain in these neurons, supporting our interpretation of a structural role of low-copy GlyRs (p. 14).

      - The endogenous tagging strategy is a very strong aspect of this study and provides confidence in the labeling of GlyRβ molecules. One caveat however, is that this labeling strategy does not discriminate whether GlyRβ molecules are on the cell membrane or in internal compartments. Can the authors provide an estimate of the ratio of surface to internal GlyRβ molecules? 

      Gephyrin is known to form a two-dimensional scaffold below the synaptic membrane to which inhibitory GlyRs and GABAARs attach (reviewed in Alvarez 2017 Brain Res). The majority of the synaptic receptors are therefore thought to be located in the synaptic membrane, which is supported by the close relationship between the sub-synaptic distribution of GlyRs and gephyrin in spinal cord neurons (e.g. Maynard et al. 2021 eLife). To demonstrate the surface expression of GlyRs at hippocampal synapses we labelled cultured hippocampal neurons expressing mEos4b-GlyRa1 with anti-Eos nanobody in non-permeabilised neurons (see Author response image 1). The close correspondence between the nanobody (AF647) and the mEos4b signal confirms that the majority of the GlyRs are indeed located in the synaptic membrane.

      Author response image 1.

      Left: Lentivirus expression of mEos4b-GlyRa1 in fixed and non-permeabilised hippocampal neurons (mEos4b signal). Right: Surface labelling of the recombinant subunit with anti-Eos nanobody (AF647). 

      - “We also estimated the absolute number of GlyRs per synapse in the hippocampus. The number of mEos4b detections was converted into copy numbers by dividing the detections at synapses by the average number of detections of individual mEos4b-GlyRβ containing receptor complexes”. In essence this is a correct method to estimate copy numbers, and the authors discuss some of the pitfalls associated with this approach (i.e., maturation of fluorophore and detection limit). Nevertheless, the authors did not subtract the number of background localizations determined in the two negative control groups. This is critical, particularly at these low-number estimations. 

      We fully agree that background subtraction can be useful with low detection numbers. In the revised manuscript, copy numbers are now reported as background-corrected values. Specifically, the mean number of detections measured in wildtype slices was used to calculate an equivalent receptor number, which was then subtracted from the copy number estimates across hippocampus, spinal cord and striatum. This procedure is described in the methods (p. 20) and results (p. 5, 8), and mentioned in the figure legends of Fig. 1C, 3C. The background corrected values are given in the text and Table 1.
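
      The background correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name and all numbers are hypothetical, and the arithmetic simply follows the description (detections divided by the mean detections per receptor complex, minus the equivalent receptor number measured in wildtype controls).

```python
def background_corrected_copies(synaptic_detections,
                                background_detections,
                                detections_per_receptor):
    """Convert mEos4b detections into a background-corrected receptor count.

    synaptic_detections: mean detections per synapse in knock-in tissue.
    background_detections: mean detections per synapse in wildtype tissue.
    detections_per_receptor: mean detections from one receptor complex.
    """
    if detections_per_receptor <= 0:
        raise ValueError("detections_per_receptor must be positive")
    raw_copies = synaptic_detections / detections_per_receptor
    background_copies = background_detections / detections_per_receptor
    # No clamping at zero: a negative result signals a measurement that is
    # indistinguishable from background.
    return raw_copies - background_copies


# Illustrative values only, not data from the study.
print(background_corrected_copies(30.0, 6.0, 12.0))
```

      Because both terms share the same divisor, subtracting the background in detections before the conversion gives the same result; expressing it in receptor equivalents mirrors the wording of the procedure.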

      - Furthermore, the authors state that "The advantage of this estimation is that it is independent of the stoichiometry of heteropentameric GlyRs". However, if the stoichiometry is unknown, the number of counted GlyRβ subunits cannot simply be reported as the number of GlyRs. This should be discussed in more detail, and more carefully reported throughout the manuscript. 

      The reviewer is right to point this out. There is still some debate about the stoichiometry of heteropentameric GlyRs. Configurations with 2a:3b, 3a:2b and 4a:1b subunits have been advanced (e.g. Grudzinska et al. 2005 Neuron; Durisic et al. 2012 J Neurosci; Patrizio et al. 2017 Sci Rep; Zhu & Gouaux 2021 Nature). We have therefore chosen a quantification that is independent of the underlying stoichiometry. Since our quantification is based on very sparse clusters of mEos4b detections that likely originate from a single receptor complex (irrespective of its stoichiometry), the reported values actually reflect the number of GlyRs (and not GlyRb subunits). We have clarified this in the results (p. 5) and throughout the manuscript (Table 1). 

      - The dual-color imaging provides insights into the subsynaptic distribution of GlyRβ molecules in hippocampal synapses. Why are similar studies not performed on synapses in the ventral striatum where functionally relevant numbers of GlyRβ molecules are found? Here insights into the subsynaptic receptor distribution would be of much more interest as it can be tied to the function. 

      This is an interesting suggestion. However, the primary aim of our study was to identify the existence of GlyRs in hippocampal regions. At low copy numbers, the concept of sub-synaptic domains (SSDs, e.g. Yang et al. 2021 EMBO Rep) becomes irrelevant (see R1 point 13). It should be pointed out that the dSTORM pointillist images (Fig. 2A) represent individual GlyR detections rather than clusters of molecules. In the striatum, our specific purpose was to solve an open question about the presence of GlyRs in different subregions (putamen, nucleus accumbens).

      - It is unclear how the experiments in Figure 5 add to this study. These results are valid, but do not seem to directly test the hypothesis that "the expression of α subunits may be limiting factor controlling the number of synaptic GlyRs". These experiments simply test if overexpressed α subunits can be detected. If the α subunits are limiting, measuring the effect of α subunit overexpression on GlyRβ surface expression would be a more direct test. 

      Both R1 and R2 have also commented on the data in Fig. 5 and their interpretation. We have substantially revised this section as described before (see R1 point 3) including additional experiments and quantification of the data (new Fig. 5). The findings lend support to our earlier hypothesis that GlyR alpha subunits (in particular GlyRa1) are the limiting factor for the expression of heteropentameric GlyRa/b in hippocampal neurons (pp. 13-14). Since the GlyRa1 subunit itself does not bind to gephyrin (Patrizio et al. 2017 Sci Rep), the synaptic localisation of the recombinant mEos4b-GlyRa1 subunits is proof that they have formed heteropentamers with endogenous GlyRb subunits and driven their membrane trafficking, which the GlyRb subunits are incapable of doing on their own.

      Reviewer #4 (Significance): 

      These results are based on carefully performed single-molecule localization experiments, and are well-presented and described. The knockin mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling, the combination with single-molecule localization microscopy is very strong as it provides high sensitivity and spatial resolution. 

      The conceptual innovation however seems relatively modest, these results confirm previous studies but do not seem to add novel insights. This study is entirely descriptive and does not bring new mechanistic insights. 

      This study could be of interest to a specialized audience interested in glycine receptor biology, inhibitory synapse biology and super-resolution microscopy. 

      My expertise is in super-resolution microscopy, synaptic transmission and plasticity 

      As we have stated before, the novelty of our study lies in the use of SMLM for the identification of very small numbers of molecules, which requires careful control experiments. This is something that has not been done before and that can be of interest to a wider readership, as it opens up SMLM for ultrasensitive detection of rare molecular events. Using this approach, we solve two open scientific questions: (1) the demonstration that low-copy GlyRs are present at inhibitory synapses in the hippocampus, (2) the sub-region specific expression and functional role of GlyRs in the ventral versus dorsal striatum.

      The following review was provided later under the name “Reviewer #4”. To avoid confusion with the last reviewer from above we will refer to this review as R4-2.

      Reviewer #4-2 (Evidence, reproducibility and clarity):  

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function is poorly understood. 

      Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exist in the hippocampal formation, although whether or not they have a functional relevance remains unknown.

      They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.

      Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB.

      The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.

      Major comments:

      - Are the key conclusions convincing?

      The authors are generally precise in the formulation of their conclusions.

      (1) They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.

      (2) The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.

      (3) Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.

      (4) The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics.

      A thorough analysis and quantification of the data in Fig.5 has been carried out as requested by all the other reviewers (e.g. R1, point 3). The new data and results have been described in the revised manuscript (pp. 9-10, 13-14).

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit’s localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.

      This question has also been raised before (R1, point 5). According to an earlier study the mEos4b-GlyRb knock-in does not cause any obvious phenotypes, with the possible exception of minor loss of glycine potency (Maynard et al. 2021 eLife). The fact that the synaptic levels in the spinal cord in heterozygous animals are precisely half of those of homozygous animals argues against differences in receptor expression, heteropentameric assembly, forward trafficking to the plasma membrane and integration into the synaptic membrane as confirmed using quantitative super-resolution CLEM (Maynard et al. 2021 eLife). Accordingly, we did not observe any behavioural deficits in these animals, making it a powerful experimental model. We have added this information in the revised manuscript (p. 4). 

      In addition, without any quantification or statistical analysis, the authors’ claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).

      As mentioned before, we have substantially revised this part (R1, point 3). The quantification and analysis in the new Fig. 5 support our earlier interpretation.

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using the Dual-colour SMLM. This is a powerful approach that allows for a claim to be made on the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These punctated small clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these nonsynaptic clusters and presented the data in the manuscript and also compared it against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2 or maybe a somatic marker and assessing the co-localization with the non-synaptic clusters would also add even more credibility to them being extrasynaptic. 

      The existence of extrasynaptic GlyRs is well attested in spinal cord neurons (e.g. Specht et al. 2013 Neuron; this study see Fig. S2). The fact that these appear as small clusters of detections in SMLM recordings results from the fact that a single fluorophore can be detected several times in consecutive image frames and because of blinking. Therefore, small clusters of detections likely represent single GlyRs (that can be counted), and not assemblies of several receptor complexes. Due to their diffusion in the neuronal membrane, they are seen as diffuse signals throughout the somatodendritic compartment in epifluorescence images (e.g. Fig. 5A). SMLM recordings of the same cells resolve this diffuse signal into discrete nanoclusters representing individual receptors (Fig. 5B). It is not clear what information co-localisation experiments with specific markers could provide, especially in hippocampal neurons, in which the copy numbers (and density) of GlyRs are next to zero.

      In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.

      Quantification of the data has been carried out (new Fig. 5C,D). The results have been described before (R1, point 3) and support our earlier interpretation of the data (pp. 13-14).

      Lastly, even though it may be outside of the scope of such a study analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute’s ISH of the beta subunit the strongest signal comes from the stratum oriens in the CA1 for example, suggesting that interneurons residing there would more likely have a higher expression of the glycine receptors. This could also be assessed by looking more carefully at the single cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good. 

      We have added the requested information from the ISH database of the Allen Institute in the discussion as suggested by the reviewer (p. 12). However, in combination with the transcriptomic data (Fig. S1) our findings strongly suggest that the expression of synaptic GlyRs depends on the availability of alpha subunits rather than on the presence of the GlyRb transcript. This is obvious when one compares the mRNA levels in the hippocampus with those in the basal ganglia (striatum) and medulla. While the transcript concentrations of GlyRb are elevated in all three regions and essentially the same, our data show that the GlyRb copy numbers at synapses differ over more than 2 orders of magnitude (Fig. 1B, Table 1). 

      - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Since the labeling and some imaging has been performed already, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.

      - Are the data and the methods presented in such a way that they can be reproduced?

      Yes, for the most part.

      - Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      - Specific experimental issues that are easily addressable.

      N/A

      - Are prior studies referenced appropriately?

      Yes

      - Are the text and figures clear and accurate?

      Yes, although quantification in figure 5 is currently not present.

      A quantification has been added (see R1, point 3).

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results. 

      We agree in principle with this suggestion. However, the revised Fig. S4 is more complex and we think that it would distract from the data shown in Fig. 2. Given that Fig. S4 is mostly methodological and not essential to understand the text, we have kept it in the supplement for the time being. We leave the final decision on this point to the editor.

      Reviewer #4-2 (Significance): 

      [This review was supplied later]

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Using a novel and high resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition it also lays a roadmap for future studies addressing similar questions. 

      - Place the work in the context of the existing literature (provide references, where appropriate).

      This is done well by the authors in the curation of the literature. As stated above, the authors have filled a gap in the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function. 

      - State what audience might be interested in and influenced by the reported findings.

      Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims. 

      We thank the reviewer for the positive assessment of the technical and biological implications of our work, as well as the interest of our findings to a wide readership of neuroscientists and cell biologists. 

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.

    1. eLife Assessment

      This important study characterizes with rigorous methodology anatomical and functional aspects of the peripheral innervation of the Drosophila male reproductive tract. The convincing analysis reveals two distinct types of glutamatergic neurons that co-release either serotonin or octopamine. While serotonergic neurons are required for male fertility, octopaminergic neurons are dispensable. The work provides invaluable insight into neurochemical control of insemination, peripheral motor control and neuromodulation in the male reproductive tract.

    2. Reviewer #1 (Public review):

      Summary:

      This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.

      Strengths:

      The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing.

      Weaknesses:

      The functional analysis of the characterized neurons is not as comprehensive as the anatomical description and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically.

      Comments on revisions:

      The manuscript has improved after fixing many small issues/errors. The new sections in the discussion likewise add to the quality of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.

      Strengths:

      Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.

      Weaknesses:

      One potential limitation of the study is the absence of information regarding the number of individuals examined for the various characterizations, which may weaken the strength of the conclusions. Another limitation may be the lack of quantitative analyses in the colocalization and morphological differentiation experiments. Nevertheless, the authors have indicated that such quantifications will be provided in a forthcoming publication; therefore, this should be considered only a partial limitation, as it is expected to be addressed in the near future.

      Wider context:

      This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated.

      Strengths:

      This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons.

      Weaknesses:

      The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.

      Strengths: 

      The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing. 

      Weaknesses: 

      The functional analysis of the characterized neurons is not as comprehensive as the anatomical description, and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically. 

      Reviewer #2 (Public review): 

      Summary: 

      Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.

      Strengths: 

      Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.

      Weaknesses: 

      The main weakness of the manuscript is the lack of detail in the presentation of the results. Specifically, all microscopy image figures are missing information about the number of samples (N), and in the case of colocalization experiments, quantitative analyses are not provided. Additionally, in the first behavioral section, it would be beneficial to complement the data table with figures similar to those presented later in the manuscript for consistency and clarity. 

      Wider context: 

      This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs. 

      Reviewer #3 (Public review): 

      Summary: 

      This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated. 

      Strengths: 

      This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons. 

      Weaknesses: 

      (1) Often, it is mentioned that the expression is higher or lower or regional without quantification or an indication of the number of samples analysed. 

      (2) The experiment aimed at tracking sperm in the male reproductive system is difficult to interpret when it is not assessed whether ejaculation has occurred. 

      (3) The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) While the peripheral innervations are very carefully described, it is not clear to which SGNs and OGNs (i.e., cell bodies in the central nervous system) these innervations belong. Are SV, AG, and ED innervated by branches of one neuron or by separate neurons? Multi-color flip-out experiments could provide an answer to this. 

      We agree this is important and are planning these experiments for a follow-up study.

      (2) In contrast, for the analysis of the VT19028 split line (Figure 9), only VNC and cell body images are shown. How do the arborisations of these split combinations look in the periphery? Are the same reproductive organs innervated as shown in Figure 2?

      Figure 9S3 was inadvertently omitted from the initial submission. That figure is now included and shows that the VT019028 split broadly innervates the SV, AG, and ED.

      (3) In the discussion, I think it would be helpful to offer some potential explanations for the role of octopaminergic and glutamatergic signaling. If not required for basic fertility, they probably have some other role.

      Thank you. We have included speculation in the Discussion section "Potential for adaptation to environment".

      (4) Line 543: Figure 8S4 E, (not 8E). 

      Correction made.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 213-217 

      Comment:

      The use of "significantly less expression" may be misleading, as no quantification or statistical analysis is provided to support this comparison. 

      Suggestion:

      Consider using a more neutral term, such as "markedly less" or "noticeably less," unless quantitative data and statistical analysis are included to substantiate the claim.

      Good recommendation. This suggestion has been incorporated.

      (2) Line 264-267 

      Comment:

      The observation regarding the distinct morphology of SGNs and OGNs is interesting and could strengthen the argument regarding functional differences. 

      Suggestion: 

      Consider including a quantification of morphological complexity (e.g., branching) to support the claim. A method such as Sholl analysis (Sholl, 1953), as adapted in Fernández et al., 2008, could be applied. 

      This is a good suggestion, and we will consider it as part of a follow-up study.

      (3) Line 269-271 

      Comment:

      The anatomical context of the observation is not explicitly stated. 

      Suggestion:

      Add "in the ED" for clarity: "With the TRH-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, A and E) and 6XV5-vMAT (Figure 5S1, B and F) were both present with a highly overlapping distribution (Figure 5S1, I)." 

      Suggestion has been incorporated.

      (4) Line 275-276 

      Comment:

      The claim about the reduced ability to distinguish SGNs and OGNs in the ED would benefit from quantitative support. 

      Suggestion:

      Include a morphological comparison or quantification between SGNs and OGNs in the ED and SV to reinforce this point.

      Certain information on morphological comparison can be inferred from the images themselves, and we will include quantitation in a follow-up study.

      (5) Line 277-279 

      Comment:

      As with line 269, the anatomical site could be specified more clearly. 

      Suggestion: 

      Rephrase as: "With the Tdc2-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, M and Q) and 6XV5-vMAT (Figure 5S1, N and R) were both observed in a highly overlapping distribution (Figure 5S1, U)." 

      Suggestion has been incorporated.

      (6) Line 348-350 

      Comment:

      The phrase "significantly higher density" implies a statistical comparison that is not shown. 

      Suggestion:

      If no quantification is provided, replace with a qualitative term such as "visibly higher" or "notably more dense." Alternatively, add a quantitative analysis with statistical testing to justify the use of "significantly." 

      Suggestion has been incorporated.

      (7) Lines 415-458 (Section comment) 

      Comment:

      There appears to be differential localization of neurotransmitter receptor expression (glutamate in muscle vs. 5-HT in epithelium or neurons), which could have functional implications. 

      Suggestion:

      Expand this section to briefly discuss the differential localization patterns of these receptors and potential implications for signal transduction in male reproductive tissues. 

      (8) Lines 638-682 (Section comment) 

      Comment:

      The table summarizing fertility phenotypes would be more informative with additional detail on experimental outcomes. 

      Suggestion:

      Add a column showing the number of fertile males over the total tested (e.g., "n fertile / n total"). Also, clarify whether the fertility assays are identical to those reported in Figure 10S2, and whether similar analyses were conducted for females. Consider including a figure summarizing fertility results for all genotypes listed in the table, similar to Figure 10S2. 

      The fertility tests reported in Table 1 were separate from those reported in Figure 10S2.  For these tests, the results were clear-cut with 100% of males and females reported as infertile exhibiting the infertile phenotype.  For the males and females reported as fertile, it was also clear-cut with nearly 100% showing fertility at a high level.  In subsequent figures we attempted to assess degrees of fertility.

      (9) Line 724-727 

      Comment:

      There seems to be a mistake in the identification of the driver lines used to silence OA neurons. Also, figure references might be incorrect. 

      Suggestion:

      The OA neuron driver line should be corrected to "Tdc2-GAL4-DBD ∩ AbdB-AD" instead of TRH-GAL4. Additionally, the figure references should be verified; specifically, the letter "B" (in "Figure 10B, D" and "10B, E") appears to be unnecessary or misplaced.

      Thanks for catching this; the corrections have been made.

      (10) Line 872-877 

      Comment:

      The discussion on the co-release of fast-acting glutamate and slower aminergic neurotransmitters is interesting and well-articulated. However, it remains somewhat disconnected from the behavioral findings. 

      Suggestion:

      Consider linking this proposed mechanism to the results observed in the mating duration assays. For instance, the sequential action of neurotransmitters described here could potentially underlie the prolonged mating observed when specific neuromodulators are active, helping to functionally integrate molecular and behavioral data. 

      (11) Line 926-928 

      Comment:

      The interpretation of 5-HT7 receptor expression in the sphincter is compelling, suggesting a role in regulating its function. However, this anatomical observation could be further contextualized with the functional data. 

      Suggestion:

      It may strengthen the interpretation to explicitly connect this finding with the fertility assays, where SGNs - presumably acting via serotonergic signaling - are shown to be necessary for male fertility. This would support a functional role for 5-HT7 in reproductive success via sphincter regulation.

      This has been added. 

      (12) Figure 1 

      Comment:

      The figure legend is generally clear, but could benefit from more consistency and precision in the color-coded labeling. Additionally, the naming of some structures could be more explicit. 

      Suggestion: 

      Revise the figure and the legend as follows:

      Figure 1. The Drosophila male reproductive system. A) Schematic diagram showing paired testes (colour), SVs (green), AGs (purple), Sph (red), ED (gray), and EB (colour). B) Actual male reproductive system. Te - testes, SV - seminal vesicle, AG - accessory gland, Sph - singular sphincter, ED - ejaculatory duct, EB - ejaculatory bulb. Scale bar: 200 µm.

      This suggestion has been incorporated.

      (13) Figure 3S2 

      Comment:

      There appears to be a typographical error in the description of the genotypes, which may lead to confusion. 

      Suggestion:

      Correct the legend to reflect the appropriate genotypes:

      Figure 3S2. Expression of vGlut-LexA and Tdc2-GAL4 in the Drosophila male reproductive system. A, D, G, J, M, P) vGlut-LexA, LexAop-6XmCherry; B, E, H, K, N, Q) Tdc2-GAL4, UAS-6XGFP; C, F, I, L, O, R) Overlay. Scale bars: O - 50 µm; R - 10 µm.

      The corrections have been made.

      (14) Figure 3S3

      Comment:

      The genotypes for panels D and E appear to be incomplete; the DBD component of the split-GAL4 drivers is missing. 

      Suggestion:

      Update the figure legend to: 

      Figure 3S3. Fruitless and Doublesex expression in the Drosophila male reproductive system. A) fru-GAL4, UAS-6XGFP; B) vGlut-LexA, LexAop-6XmCherry; C) Overlay; D) Tdc2-AD ∩ dsx-GAL4-DBD; E) TRH-AD ∩ dsx-GAL4-DBD. Scale bar: 200 µm.

      The corrections have been made.

      (15) Figure 4S4 

      Comment: 

      There is a repeated segment in the figure legend, which makes it unclear and redundant. 

      Suggestion:

      Edit the legend to remove the duplicated lines: 

      Figure 4S4. Expression of vGlut, TβH-GFP, and 5-HT at the junction of the SV and AGs with the ED of the Drosophila male reproductive system. A) vGlut-40XV5; B) TβH-GFP; C) 5-HT; D) vGlut-40XV5, TβH-GFP overlay; E) vGlut-40XV5, 5-HT overlay; F) TβH-GFP, 5-HT overlay. Scale bar: 50 µm.

      The correction has been made.

      (16) Figure 6S5 

      Comment:

      Within this figure, the orientation and/or scale of the tissue varies noticeably between individual panels, making it difficult to directly compare the different experimental conditions. 

      Suggestion:

      For improved clarity and interpretability, consider standardizing the orientation and size of the tissue shown across all panels within the figure. Consistent presentation will facilitate direct comparisons between treatments or genotypes. 

      There is often variation in the size of the male reproductive organs. They were all acquired at the same magnification. The only point of this figure is that there is no vGAT or vAChT at these NMJs, and the result is unambiguously negative.

      (17) Figure 10 

      Comment:

      Panel A appears redundant, as it shows the same information as the other panels but without indicating statistical significance. 

      Suggestion:

      Consider removing panel A and keeping only the remaining four graphs, which include relevant statistical comparisons and clearly show significant differences.

      We realize there is some redundancy of panel A with the other panels, but we feel there is value in having all the genotypes in a single panel for comparison.

      Reviewer #3 (Recommendations for the authors): 

      Here are some suggestions to improve the manuscript: 

      (1) Prot B GFP experiment: the authors should explain better the time chosen to look at the sperm content of the male reproductive system. At 10 minutes, it is expected that the male has already ejaculated, and therefore, a failure to ejaculate would result in more sperm in the reproductive system, not less. Since we are not certain when the male ejaculates, it would be important to do the analysis at different time points.

      In the Prot-GFP experiments, the 10-minute time point was chosen because we nearly always observe sperm in the ejaculatory duct of control males. In the experimental males, we never observed sperm in the ejaculatory duct at this time point. Also, no Prot-GFP sperm were observed in the reproductive tract of females mated to experimental males even when mating was allowed to go to completion, while abundant sperm were found in females mated to Prot-GFP controls. Figure 10S1 has been updated to include images of these female reproductive systems. The absence of Prot-GFP sperm in the reproductive tract of females mated to experimental males indicates that sperm transfer in these males isn't occurring earlier during the copulation process than in control males and that we didn't miss it by examining only the ejaculatory duct.

      (2) Discuss what may be the role of the octopamine/glutamate neurons and glutamate transmission in serotonin/glutamate neurons in the male reproductive system, given that they are not required for fertility (at least under the context in which it was tested). It is quite a striking result that deserves some attention. 

      We agree it is a surprising result and have included speculation on the role of glutamate and octopamine in male reproduction in the Discussion section "Potential for adaptation to environment".

      (3) Very important: 

      (a) Figure 3 is present in the Word document but not the PDF. 

      (b) Figure 9S3 is not present 

      (c) In Figure 5 X), the legend does not correspond to the panel.

      All of these corrections have been made. 

      (4) Other suggestions:

      (a) A summary schematic (or several) of the findings would make it an easier read.

      (b) Explain why the ejaculatory bulb was left out of the analysis.

      (c) Explain in the main text some of the tools, such as BONT-C and the conditional vGlut mutation.

    1. eLife Assessment

      The authors employ an unbiased, affinity-guided reagent to label the P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling. The evidence is convincing and the study will be valuable to those studying the dynamics of receptor distribution and clustering.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors developed a chemical labeling reagent for P2X7 receptors, called X7-uP. This labeling reagent selectively labels endogenous P2X7 receptors with biotin based on ligand-directed NASA chemistry. After labeling the endogenous P2X7 receptor with biotin, the receptor can be fluorescently labeled with streptavidin-AlexaFluor647. The authors carefully examined the binding properties and labeling selectivity of X7-uP to P2X7, characterized the labeling site of P2X7 receptors, and demonstrated fluorescence imaging of P2X7 receptors. The data obtained by SDS-PAGE, Western blot, and fluorescence microscopy clearly show that X7-uP labels the P2X7 receptor. Finally, the authors fluorescently labeled the endogenous P2X7 in BV2 cells, which are a murine microglia model, and used dSTORM to reveal a nanoscale P2X7 redistribution mechanism under inflammatory conditions at high resolution.

      Strengths:

      X7-uP selectively labels endogenous P2X7 receptors with biotin. Streptavidin-AlexaFluor647 binds to the biotin labeled to the P2X7 receptor, allowing visualization of endogenous P2X7 receptors.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.

      Strengths:

      I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal.

      I think the authors have done a very nice job addressing my original concerns. Here are those original concerns and my new comments related to how the authors address them.

      (1) While the authors state that chemical modification of AZ10606120 to produce the X7-uP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120.

      The authors now show the relative inhibition of X7-uP compared to AZ10606120 at different concentrations. This provides a nice comparison to give the reader an idea of how effectively X7-uP inhibits P2X7 receptor. This is great.

      (2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.

      I agree with the authors that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling is outside the scope of this paper but would be important for studies involving dynamic or post-labeling functional analyses such as live trafficking.

      (3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes.

      The authors have now changed the mScarlet color to orange, which solves my concern.

      (4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.

      The authors address this point very well and include nice data to show that P2X6 does not insert into the plasma membrane as a homotrimer.

      (5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.

      The authors have modified Figure 1 to make it easier to understand the NASA chemistry reaction.

      I congratulate the authors on an outstanding paper!

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      In this paper, the authors developed a chemical labeling reagent for P2X7 receptors, called X7-uP. This labeling reagent selectively labels endogenous P2X7 receptors with biotin based on ligand-directed NASA chemistry (Ref. 41). After labeling the endogenous P2X7 receptor with biotin, the receptor can be fluorescently labeled with streptavidin-AlexaFluor647. The authors carefully examined the binding properties and labeling selectivity of X7-uP to P2X7, characterized the labeling site of P2X7 receptors, and demonstrated fluorescence imaging of P2X7 receptors. The data obtained by SDS-PAGE, Western blot, and fluorescence microscopy clearly show that X7-uP labels the P2X7 receptor. Finally, the authors fluorescently labeled the endogenous P2X7 in BV2 cells, which are a murine microglia model, and used dSTORM to reveal a nanoscale P2X7 redistribution mechanism under inflammatory conditions at high resolution. 

      Strengths: 

      X7-uP selectively labels endogenous P2X7 receptors with biotin. Streptavidin-AlexaFluor647 binds to the biotin labeled to the P2X7 receptor, allowing visualization of endogenous P2X7 receptors. 

      We thank the reviewer for their positive comment.

      Weaknesses: 

      Weaknesses & Comments 

      (1) The P2X7 receptor exists in a trimeric form. If it is not a monomer under the conditions of the pull-down assay in Figure 2C, the quantitative values may not be accurate. 

      We thank the reviewer for this comment. As shown in Figure 2C, the band observed on the denaturing SDS-PAGE corresponds to the monomeric form of the P2X7 receptor. While we cannot exclude the presence of non-monomeric species under native conditions, no such higher-order forms are visible in the gel. This observation supports the conclusion that the quantitative values presented are based on the monomeric form and are therefore reliable.

      (2) In Figure 3, GFP fluorescence was observed in the cell. Are all types of P2X receptors really expressed on the cell surface?

      We thank the reviewer for this excellent comment, which was also raised by reviewer 2. To address this concern, we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X receptors reach the plasma membrane. As expected, all P2X subtypes except P2X6 were detected at the cell surface in HEK293T cells, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.

      (3) The reviewer was not convinced of the advantages of the approach taken in this paper, because the endogenous receptor labeling in this study could also be done using conventional antibody-based labeling methods. 

      We thank the reviewer for raising this important point and would like to highlight several advantages of our approach compared to conventional antibody-based labeling.

      First, commercially available P2X7 antibodies often suffer from poor specificity and are generally not suitable for reliably detecting endogenous P2X7 receptors, as documented in previous studies (e.g., PMID: 16564580 and PMID: 15254086). While recent advances have been made using nanobodies with improved specificity for P2X7 (e.g., PMID: 30074479 and PMID: 38953020), our strategy is distinct and complementary to nanobody-based approaches.

      Second, antibodies rely on non-covalent interactions with the receptor, which can result in dissociation over time. In contrast, our X7-uP probe covalently biotinylates lysine residues on the P2X7 receptor through stable amide bond formation. This covalent labeling ensures that the biotin moiety remains permanently attached, an advantage not afforded by reversible binding strategies.

      Third, by selectively biotinylating P2X7 receptors, our method provides a versatile platform for the chemical attachment of a wide range of probes or functional moieties. Although we did not demonstrate this application in the current study, we believe this modularity represents an additional advantage of our approach.

      We have now revised the discussion to highlight these key advantages, allowing the reader to form their own opinion. We hope this addresses the reviewer’s concerns and clarifies the benefits of our approach.

      (4) Although P2X7 was successfully labeled in this paper, it is not new as a chemistry. There is a need for more attractive functional evaluation such as live trafficking analysis of endogenous P2X7. 

      We agree with the reviewer that the underlying chemistry is not novel per se. However, to our knowledge, it has not previously been applied to the P2X7 receptor, and thus constitutes a novel application with specific relevance for studying native P2X7 biology.

      We also appreciate the reviewer’s suggestion regarding live trafficking analysis of endogenous P2X7. While this is indeed a valuable and interesting direction, we believe it lies beyond the scope of the present study, as it would first require demonstrating that the labeling itself does not affect P2X7 function (see below). This important step would necessitate additional experiments, which we consider more appropriate for a follow-up investigation.

      (5) The reviewer has concerns that the use of the large-size streptavidin to label the P2X7 receptor may perturb the dynamics of the receptor.

      We thank the reviewer for raising this important point. Although we did not directly measure receptor dynamics, it is indeed possible that tetrameric streptavidin (tStrept-A 647) could promote P2X7 clustering by cross-linking nearby receptors due to its tetravalency (see also point 7 raised by the reviewer). To address this concern, we performed additional dSTORM experiments using a monomeric form of streptavidin-Alexa 647 (mSA) (see PMID: 26979420). Owing to its reduced size and lack of tetravalency, mSA has been shown to minimize artificial crosslinking of synaptic receptors (PMID: 26979420). A drawback of using mSA, however, is that the monomeric form carries only two fluorophores (estimated degree of labeling, DOL ≈ 2, PMID: 26979420), whereas the tetrameric form, according to the manufacturer’s certificate of analysis (Invitrogen S21374), has an average DOL of three fluorophores per monomer, resulting in a total of ~12 fluorophores per streptavidin.

      We tested three conditions with mSA incubation: (i) control BV2 cells (without X7-uP), (ii) untreated X7-uP-labeled BV2 cells, and (iii) X7-uP-labeled BV2 cells treated with LPS and ATP (using the same concentrations and incubation times described in the manuscript). As shown in Author response image 1, only LPS+ATP treatment induced a clear increase in the mean cluster density compared to quiescent (untreated) BV2 cells. This effect closely matches the results obtained with tStrept-A 647, supporting the conclusion that tetrameric streptavidin does not artificially promote P2X7 clustering. It is also possible that the cellular environment of BV2 microglia differs from the confined architecture of synapses, which may further explain why cross-linking effects are less pronounced in our system.

      As expected, the overall fluorescence signal with mSA was about tenfold lower than with tStrept-A 647, consistent with the expected fluorophore stoichiometry. This lower signal may explain why the values for the untreated condition appeared slightly higher than for the control, although the difference was not statistically significant (P = 0.1455).
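      The stoichiometry argument can be made explicit with a minimal sketch. It assumes the DOL values quoted above (about 2 fluorophores per mSA, about 3 per subunit of tetrameric streptavidin) and assumes that signal scales linearly with fluorophore number. The function and variable names are illustrative and do not come from the manuscript.

```python
# Illustrative sketch only: names are hypothetical; DOL values are those quoted in the response.
def fluorophores_per_probe(dol_per_subunit: float, subunits: int) -> float:
    """Total number of fluorophores carried by one streptavidin probe."""
    return dol_per_subunit * subunits


tetramer = fluorophores_per_probe(dol_per_subunit=3, subunits=4)  # tStrept-A 647
monomer = fluorophores_per_probe(dol_per_subunit=2, subunits=1)   # mSA

# Expected signal ratio if brightness scales linearly with fluorophore count.
expected_fold = tetramer / monomer
print(tetramer, monomer, expected_fold)
```

      Under these assumptions the expected ratio is about sixfold, the same order of magnitude as the roughly tenfold reduction reported. Binding valency and labeling efficiency are not modeled here.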

      We hope these additional experiments adequately address the reviewer’s concerns.

      Author response image 1.

      BV2 labeling with monomeric streptavidin–Alexa 647 (mSA). (A) Bright-field and dSTORM images of BV2 cells labeled with mSA in the presence (untreated and LPS+ATP) or absence (control) of 1 µM X7-uP. Treatment: LPS (1 µg/mL for 24 hours) and ATP (1 mM for 30 minutes). Scale bars, 10 µm. Insets: Magnified dSTORM images. Scale bars, 1 µm. (B) Quantification of the number of localizations (n = 2 independent experiments). Bars represent mean ± s.e.m. One-way ANOVA with Tukey’s multiple comparisons (P values are indicated above the graph).

      (6) It is better to directly label Alexa647 to the P2X7 receptor to avoid functional perturbation of P2X7. 

      Direct labeling of the P2X7 receptor with Alexa647 would require the design and synthesis of a novel probe, which is currently not available. Implementing such a strategy would involve substantial new experimental work that lies beyond the scope of the present study.

      (7) In all imaging experiments, the addition of streptavidin, which acts as a cross-linking agent, may induce P2X7 receptor clustering. This concern would be dispelled if the receptors were labeled with a fluorescent dye instead of biotin and observed. 

      We refer the reviewer to our response in point 5, where we addressed this concern by comparing tetrameric and monomeric streptavidin conjugates. As noted above (see also point 6), directly labeling the receptor with a fluorescent dye would require the development of a new probe, which is outside the scope of the present study.

      (8) There are several mentions of microglia in this paper, even though they are not used. This can lead to misunderstanding for the reader. The author conducted functional analysis of the P2X7 receptor in BV-2 cells, which are a model cell line but not microglia themselves. The text should be reviewed again and corrected to remove the misleading parts that could lead to misunderstanding. e.g. P8. lines 361-364

      First, it combines N-cyanomethyl NASA chemistry with the high-affinity AZ10606120 ligand, enabling rapid labeling in microglia (within 10 min)

      P8. lines 372-373 

      Our results not only confirm P2X7 expression in microglia, as previously reported (6, 26-33), but also reveal its nanoscale localization at the cell surface using dSTORM. 

      We agree with the reviewer’s comment. We have now modified the text, including the title.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.

      Strengths: 

      I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal.

      We thank the reviewer for their positive comment.

      Weaknesses: 

      Overall, the manuscript is novel and interesting. However, I do have some suggestions for improvement. 

      (1) While the authors state that chemical modification of AZ10606120 to produce the X7-uP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120.

      We thank the reviewer for this insightful comment. Unfortunately, due to the limited availability of X7-uP, we were not able to establish a complete concentration–response curve to determine its IC<sub>50</sub>, which would require testing at concentrations >1 µM. Nevertheless, to estimate the effect of the modification, we assessed current inhibition at 300 µM X7-uP and compared it with the reported IC<sub>50</sub> of AZ10606120 (10 nM). Under these conditions, both compounds produced a similar level of inhibition, indicating that while the chemical modification reduces potency relative to AZ10606120, X7-uP still functions as an effective probe for P2X7. We have now included these data in Figure 2 and revised the text accordingly.
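      The single-concentration comparison described above can be read through a standard Hill-type inhibition model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes a Hill coefficient of 1 and complete block at saturating antagonist, and the function names are hypothetical.

```python
# Illustrative sketch only: a textbook Hill inhibition model with hypothetical names.
def fractional_inhibition(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Fraction of current blocked at antagonist concentration `conc` (same units as `ic50`)."""
    return conc**hill / (conc**hill + ic50**hill)


def ic50_from_single_point(conc: float, inhibition: float, hill: float = 1.0) -> float:
    """Invert the Hill equation: the IC50 implied by one (concentration, inhibition) pair."""
    if not 0.0 < inhibition < 1.0:
        raise ValueError("inhibition must be strictly between 0 and 1")
    return conc * ((1.0 - inhibition) / inhibition) ** (1.0 / hill)


# By definition, an antagonist applied at its IC50 blocks half of the current
# (here 10 nM, the reported IC50 of AZ10606120).
half = fractional_inhibition(conc=10.0, ic50=10.0)

# If a probe produces the same block at a test concentration c, its apparent IC50 is c.
# The value 300.0 stands for the tested probe concentration, in the units used in the response.
apparent = ic50_from_single_point(conc=300.0, inhibition=half)
print(half, apparent)
```

      A single point constrains the apparent IC<sub>50</sub> only if the Hill coefficient is assumed; a full concentration–response curve would be needed to fit both parameters.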

      (2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.

      We thank the reviewer for raising this important point. At present, we have not determined whether modification of lysine residues with biotin, or subsequent labeling with Alexa647, affects the ATP sensitivity or functional properties of P2X7. However, we believe this does not impact the conclusions of the current study, as all functional assays were conducted prior to X7-uP labeling. The labeling is used here as a terminal "snapshot" to visualize the endogenous receptor without interfering with the functional characterization.

      We fully agree that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling—such as by determining the EC<sub>50</sub> for ATP—would be essential for studies involving dynamic or post-labeling functional analyses, such as live trafficking. However, as noted earlier in our response to Reviewer 1 (point 4), these experiments lie beyond the scope of the current study.

      (3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes. 

      As suggested, we changed the mScarlet color to orange for all relevant figures.

      (4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.

      We thank the reviewer for raising this important point. Indeed, it is well established that P2X6 does not form functional channels, which supports the conclusion that it does not form homotrimeric complexes. Although previous studies have shown that P2X6–GFP expression is generally lower, more diffuse, and not efficiently targeted to the cell surface compared with other P2X subtypes (see PMID: 12077178), the similar fluorescence distribution and density observed in our Figure 3 do not imply that P2X6 forms homotrimers.

      We did not directly assess the presence of endogenous P2X6 in our HEK293T cells; however, according to the Human Protein Atlas, there is no detectable P2X6 RNA expression in HEK293 cells (nTPM = 0), indicating that endogenous P2X6 is not expressed in this cell line. To further investigate surface expression (see also point 2 of reviewer 1), we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X6 reaches the plasma membrane. As expected, P2X6 was not detected at the cell surface in HEK293T cells, whereas GFP-tagged P2X1 to P2X5 were readily detected. These results further support the conclusion that P2X6 does not insert into the plasma membrane as a homotrimer, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.

      (5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.

      We thank the reviewer for this insightful suggestion. To address this, we have modified Figure 1A and updated the legend.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes the development of a covalent labeling probe (X7-uP) that selectively targets and tags native P2X7 receptors at the plasma membrane of BV2 microglial cells. Using super-resolution imaging (dSTORM), the authors demonstrate that P2X7 receptors form nanoscale clusters upon microglial activation by lipopolysaccharide (LPS) and ATP, correlating with synergistic IL-1β release. These findings advance understanding of P2X7 reorganization during inflammation and provide a generalizable labeling strategy for monitoring endogenous P2X7 in immune cells. 

      Strengths: 

      (1) The authors designed X7-uP by coupling a high-affinity, P2X7-specific antagonist (AZ10606120) with N-cyanomethyl NASA chemistry to achieve site-directed biotinylation. This approach offers high specificity, minimal off-target reactivity, and a straightforward pull-down/imaging readout. 

      (2) The results connect P2X7's nanoscale clustering directly with IL-1β secretion in microglia, reinforcing the role of P2X7 in inflammation. By localizing endogenous P2X7 at single-molecule resolution, the authors reveal how LPS priming and ATP stimulation synergistically reorganize the receptor. 

      (3) The authors systematically validate their method in recombinant systems (HEK293 cells) and in BV2 cells, showing selective inhibition, mutational confirmation of the binding site, and Western blot pulldown experiments.

      We thank the reviewer for their positive comment.

      Weaknesses: 

      (1) While the data strongly indicate that P2X7 clustering contributes to IL-1β release, the manuscript would benefit from additional experiments (if feasible) or discussion on how receptor clustering interfaces with downstream inflammasome assembly. Clarification of whether the P2X7 clusters physically colocalize with known inflammasome proteins would solidify the mechanism. 

      We thank the reviewer for this valuable suggestion. Determining the physical colocalization of P2X7 clusters with known inflammasome components would provide important insight into the molecular partners involved in inflammasome activation. However, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.

      Nevertheless, in response to the reviewer’s suggestion, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation. We also revised the text to tone down the hypothesis of physical colocalization.

      (2) The authors might expand on the scope of X7-uP in other native cells that endogenously express P2X7 (e.g., macrophages, dendritic cells). Although they mention the possibility, demonstrating the probe's applicability in at least one other primary immune cell type would strengthen its general utility. 

      We thank the reviewer for this valuable suggestion. Again, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.

      (3) The authors do include appropriate negative controls, yet providing additional details (e.g., average single-molecule on-time or blinking characteristics) in supplementary materials could help readers assess cluster calculations. 

      As suggested, we have included additional data showing single-molecule blinking events in untreated and LPS+ATP-treated BV2 cells, along with the corresponding movies. The data are now presented in Figure 5—supplement figure 3A and B and Figure 5—Videos 1 and 2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      (1) On line 96, the authors refer to the "ballast" domain of P2X7 receptor but do not cite the original article from which this nomenclature originated (McCarthy et al., 2019, Cell). This article should be cited to give appropriate credit. 

      Done.

      (2) On line 602, the authors state that they use models from PDB 1MK5 and 6U9W to generate the cartoons in Figure 6. The manuscripts from which these PDB files were generated need to be appropriately cited. 

      Done.

      (3) On line 319, the authors say "300 mM BzATP" but I think they mean 300 uM.

      Done. Thank you for catching the typo.

      Reviewer #3 (Recommendations for the authors): 

      Overall, excellent data quality. The paper would benefit from a discussion of the physiological implications of clustering. It would also be helpful to elaborate on the potential mechanisms for clustering: diffusion and/or insertion. Finally, the authors should comment on work by the MacKinnon (PMID: 39739811) and Santana (PMID: 31371391) labs on two distinct models for clustering of proteins. 

      As suggested by the reviewer, we have revised the discussion to incorporate their comments. First, we have added the following text:

      “Upon BV2 activation, we observed significant nanoscale reorganization of P2X7. Both LPS and ATP (or BzATP) trigger P2X7 upregulation and clustering, increasing the overall number of surface receptors and the number of receptors per cluster, from one to three (Figure 6). By labeling BV2 cells with X7-uP shortly after IL-1β release, we were able to correlate the nanoscale distribution of P2X7 with the functional state of BV2 cells, consistent with the two-signal, synergistic model for IL-1β secretion observed in microglia and other cell types (Ferrari et al, 1996; Perregaux et al, 2000; Ferrari et al, 2006; Di Virgilio et al, 2017; He et al, 2017; Swanson et al, 2019). In this model, LPS priming leads to intracellular accumulation of pro-IL-1β, while ATP stimulation activates P2X7, triggering NLRP3 inflammasome activation and the subsequent release of mature IL-1β.

      What is the mechanism underlying P2X7 upregulation that leads to an overall increase in surface receptors: does it result from the lateral diffusion of previously masked receptors already present at the plasma membrane, or from the insertion of newly synthesized receptors from intracellular pools in response to LPS and ATP? Although our current data do not distinguish between these possibilities, a recent study suggests that the α1 subunit of the Na<sup>+</sup>/K<sup>+</sup>-ATPase (NKAα1) forms a complex with P2X7 in microglia, including BV2 cells, and that LPS+ATP induces NKAα1 internalization (Huang et al, 2024). This internalization appears to release P2X7 from NKAα1, allowing P2X7 to exist in its free form. We speculate that the internalization of NKAα1 induced by both LPS and ATP exposes previously masked P2X7 sites, including the allosteric AZ10606120 sites, thus making them accessible for X7-uP labeling.”

      Second, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation:

      “What mechanisms underlie P2X7 clustering in response to inflammatory signals? Several models have been proposed to explain membrane protein clustering, including recruitment to structural scaffolds (Feng & Zhang, 2009), partitioning into membrane domains enriched in specific chemical components such as lipid rafts (Simons & Ikonen, 1997), and self-assembly mechanisms (Sieber et al, 2007). These self-assembly mechanisms include an irreversible stochastic model (Sato et al, 2019) and a more recent reversible self-oligomerization model which gives rise to higher-order transient structures (HOTS) (Zhang et al, 2025). Supported by cryogenic optical localization microscopy with very high resolution (~5 nm), the HOTS model has been observed in various membrane proteins, including ion channels and receptors (Zhang et al, 2025). Furthermore, HOTS are suggested to be dynamically modulated and to play a functional role in cell signaling, potentially influencing both physiological and pathological processes (Zhang & MacKinnon, 2025). While this hypothesis is compelling, our current dSTORM data lack sufficient spatial resolution to confirm whether P2X7 trimers form HOTS via self-oligomerization. Further biophysical and ultra-high-resolution imaging studies are required to test this model in the context of P2X7 clustering.”

    1. eLife Assessment

      This fundamental manuscript provides compelling evidence that BK and CaV1.3 channels can co-localize as ensembles early in the biosynthetic pathway, including within the ER and Golgi. The findings, supported by a range of imaging and proximity assays, offer insights into channel organization in both heterologous and endogenous systems. The data substantiate the central claims, while highlighting intriguing mechanistic questions for future studies: the determinants of mRNA co-localization, the temporal dynamics of ensemble trafficking, and the physiological implications of pre-assembly for channel function at the plasma membrane.

    2. Reviewer #1 (Public review):

      Summary:

      The co-localization of large conductance calcium- and voltage-activated potassium (BK) channels with voltage-gated calcium channels (CaV) at the plasma membrane is important for the functional role of these channels in controlling cell excitability and physiology in a variety of systems. An important question in the field is where and how BK and CaV channels assemble as 'ensembles' to allow this coordinated regulation: is this through preassembly early in the biosynthetic pathway, during trafficking to the cell surface, or once channels are integrated into the plasma membrane? These questions also have broader implications for assembly of other ion channel complexes. Using an imaging-based approach, this paper addresses the spatial distribution of BK-CaV ensembles using both overexpression strategies in tsa201 and INS-1 cells and analysis of endogenous channels in INS-1 cells using proximity ligation and super-resolution approaches. In addition, the authors analyse the spatial distribution of mRNAs encoding BK and CaV1.3. The key conclusion of the paper, that BK and CaV1.3 are co-localised as ensembles intracellularly in the ER and Golgi, is well supported by the evidence. The experiments and analysis are carefully performed and the findings are very well presented.

    3. Reviewer #3 (Public review):

      Summary:

      The authors present a clearly written and beautifully presented piece of work demonstrating clear evidence to support the idea that BK channels and CaV1.3 channels can co-assemble prior to their insertion in the plasma membrane.

      Strengths:

      The experimental records shown back up their hypotheses and the authors are to be congratulated for the large number of control experiments shown in the ms.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Pournejati et al investigates how BK (big potassium) channels and CaV1.3 (a subtype of voltage-gated calcium channels) become functionally coupled by exploring whether their ensembles form early, during synthesis and intracellular trafficking, rather than only after insertion into the plasma membrane. To this end, the authors use the PLA technique to assess the formation of ion channel associations in the different compartments (ER, Golgi or PM), single-molecule RNA in situ hybridization (RNAscope), and super-resolution microscopy.

      Strengths:

      The manuscript is well written and addresses an interesting question, combining a range of imaging techniques. The findings are generally well-presented and offer important insights into the spatial organization of ion channel complexes, both in heterologous and endogenous systems.

      Weaknesses:

      The authors have improved their manuscript after revisions, and some previous concerns have been addressed.

      Still, the main concern about this work is that the current experiments do not quantitatively or mechanistically link the ensembles observed intracellularly (in the endoplasmic reticulum (ER) or Golgi) to those found at the plasma membrane (PM). As a result, it is difficult to fully integrate the findings into a coherent model of trafficking. Specifically, the manuscript does not address what proportion of ensembles detected at the PM originated in the ER. Without data on the turnover or half-life of these ensembles at the PM, it remains unclear how many persist through trafficking versus forming de novo at the membrane. The authors report the percentage of PLA-positive ensembles localized to various compartments, but this only reflects the distribution of pre-formed ensembles. What remains unknown is the proportion of total BK and Ca<sub>V</sub>1.3 channels (not just those in ensembles) that are engaged in these complexes within each compartment. Without this, it is difficult to determine whether ensembles form in the ER and are then trafficked to the PM, or if independent ensemble formation also occurs at the membrane. To support the model of intracellular assembly followed by coordinated trafficking, it would be important to quantify the fraction of the total channel population that exists as ensembles in each compartment. A comparable ensemble-to-total ratio across ER and PM would strengthen the argument for directed trafficking of pre-assembled channel complexes.

      We appreciate the reviewer’s thoughtful comment and agree that quantitatively linking intracellular hetero-clusters to those at the plasma membrane is an important and unresolved question. Our current study does not determine what proportion of ensembles at the plasma membrane originated during trafficking. It also does not quantify the fraction of total BK and Ca<sub>V</sub>1.3 channels engaged in these complexes within each compartment. Addressing this requires simultaneous measurement of multiple parameters—total BK channels, total Ca<sub>V</sub>1.3 channels, hetero-cluster formation (via PLA), and compartment identity—in the same cell. This is technically challenging. The antibodies used for channel detection are also required for the proximity ligation assay, which makes these measurements incompatible within a single experiment.

      To overcome these limitations, we are developing new genetically encoded tools to enable real-time tracking of BK and Ca<sub>V</sub>1.3 dynamics in live cells. These approaches will enable us to monitor channel trafficking and the formation of hetero-clusters, as detected by colocalization. Such experiments will provide insight into the origin and turnover of hetero-clusters. While these experiments are beyond the scope of the current study, the findings in our current manuscript provide the first direct evidence that BK and CaV channels can form hetero-clusters intracellularly prior to reaching the plasma membrane. This mechanistic insight reveals a previously unrecognized step in channel organization and lays the foundation for future work aimed at quantifying ensemble-to-total ratios and determining whether coordinated trafficking of pre-assembled complexes occurs.

      This limitation is acknowledged in the discussion section, page 23. It reads: “Our findings highlight the intracellular assembly of BK-Ca<sub>V</sub>1.3 hetero-clusters, though limitations in resolution and organelle-specific analysis prevent precise quantification of the proportion of intracellular complexes that ultimately persist on the cell surface.”

      Reviewer #2 (Public review):

      Summary:

      The co-localization of large conductance calcium- and voltage-activated potassium (BK) channels with voltage-gated calcium channels (CaV) at the plasma membrane is important for the functional role of these channels in controlling cell excitability and physiology in a variety of systems.

      An important question in the field is where and how BK and CaV channels assemble as 'ensembles' to allow this coordinated regulation: is this through preassembly early in the biosynthetic pathway, during trafficking to the cell surface, or once channels are integrated into the plasma membrane? These questions also have broader implications for assembly of other ion channel complexes.

      Using an imaging-based approach, this paper addresses the spatial distribution of BK-CaV ensembles using both overexpression strategies in tsa201 and INS-1 cells and analysis of endogenous channels in INS-1 cells using proximity ligation and super-resolution approaches. In addition, the authors analyse the spatial distribution of mRNAs encoding BK and Cav1.3.

      The key conclusion of the paper, that BK and Ca<sub>V</sub>1.3 are co-localised as ensembles intracellularly in the ER and Golgi, is well supported by the evidence. However, whether they are preferentially co-translated at the ER requires further work. Moreover, whether intracellular pre-assembly of BK-Ca<sub>V</sub>1.3 complexes is the major mechanism for functional complexes at the plasma membrane in these models requires more definitive evidence, including both refinement of analysis of current data and potentially additional experiments.

      The reviewer raises the question of whether BK and Ca<sub>V</sub>1.3 channels are preferentially co-translated. In fact, we would like to propose that co-translation has not yet been clearly defined for this type of interaction between ion channels. In our current work, we 1) observed colocalization between BK and Ca<sub>V</sub>1.3 mRNAs and 2) determined that 70% of BK mRNA in active translation also colocalizes with Ca<sub>V</sub>1.3 mRNA. We think these results favor the idea of translational complexes that can underlie the process of co-translation. However, and in total agreement with the Reviewer, the conclusion that the mRNAs for the two ion channels are co-translated would require further experimentation. For instance, mRNA co-regulation is one aspect that could help to define co-translation.

      To avoid overinterpretation, we have revised the manuscript to remove references to “co-translation” in the Results section and included the word “potential” when referring to co-translation in the Discussion section. We also clarified the limitations of our evidence in the Discussion that can be found on page 25: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”

      Strengths & Weaknesses

      (1) Using proximity ligation assays of overexpressed BK and CaV1.3 in tsa201 and INS-1 cells, the authors provide strong evidence that BK and CaV can exist as ensembles (i.e. channels within 40 nm) at both the plasma membrane and intracellular membranes, including ER and Golgi. They also provide evidence for endogenous ensemble assembly at the Golgi in INS-1 cells, and it would have been useful to determine if endogenous complexes are also observed in the ER of INS-1 cells. There are some useful controls, but the specificity of ensemble formation would be better determined using other transmembrane proteins rather than peripheral proteins (e.g. Golgi 58K).

      We thank the reviewer for their thoughtful feedback and for recognizing the strength of our proximity ligation assay data supporting BK–Ca<sub>V</sub>1.3 hetero-cluster formation at both the plasma membrane and intracellular compartments. As for specificity controls, we appreciate the suggestion to use transmembrane markers. To strengthen our conclusion, we have performed an additional experiment comparing the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 and BK channels with the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 channels and ryanodine receptors in INS-1 cells. As shown in the figure below, the number of interactions between Ca<sub>V</sub>1.3 and BK channels is significantly higher than that between Ca<sub>V</sub>1.3 and RyR<sub>2</sub>. Of note, RyR<sub>2</sub> is an ER-resident protein. These results provide additional evidence of endogenous complex formation in INS-1 cells. We have added this figure as a supplement.

      (2) Ensemble assembly was also analysed using super-resolution (dSTORM) imaging in INS-1 cells. In these cells only 7.5% of BK and CaV particles (endogenous?) co-localise, which was only marginally above chance based on scrambled images. More detailed quantification and validation of potential 'ensembles' needs to be made, for example by exploring nearest neighbour characteristics (but see point 4 below) to define the proportion of ensembles versus clusters of BK or Cav1.3 channels alone. For example, it is mentioned that a distribution of distances between BK and Cav is seen, but the data are not shown.

      We thank the reviewer for this comment. To address the request for more detailed quantification and validation of ensembles, we performed additional analyses:

      Proportion of ensembles vs isolated clusters: We quantified clusters within 200 nm and found that 37 ± 3% of BK clusters are near one or more CaV1.3 clusters, whereas 15 ± 2% of CaV1.3 clusters are near BK clusters (Figure 8–Supplementary 1A).

      Distance distribution: As shown in Figure 8–Supplementary 1B, the nearest-neighbor distance distribution for BK-to-CaV1.3 in INS-1 cells (magenta) is shifted toward shorter distances compared to randomized controls (gray), supporting preferential localization of BK–CaV1.3 hetero-clusters.

      Together, these analyses confirm that BK–CaV1.3 ensembles occur more frequently than expected by chance and exhibit an asymmetric organization favoring BK proximity to CaV1.3 in INS-1 cells. We have included these data and figures in the revised manuscript, along with a description in the Results section.
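      A minimal sketch of this kind of nearest-neighbor comparison is given here for illustration only. It is not the authors' pipeline: the coordinates are synthetic, the field size and the helper names (`nearest_distances`, `fraction_within`, `random_points`) are assumptions, and only the 200 nm cutoff is taken from the response. It computes the fraction of BK centroids with a CaV1.3 centroid within the cutoff, the same fraction in the reverse direction, and the fraction obtained against a uniformly randomized control.

```python
import math
import random

def nearest_distances(sources, targets):
    """For each source point, return the distance to its closest target point."""
    return [min(math.dist(s, t) for t in targets) for s in sources]

def fraction_within(distances, threshold_nm):
    """Fraction of distances at or below the threshold."""
    return sum(d <= threshold_nm for d in distances) / len(distances)

def random_points(n, field_nm, rng):
    """Uniformly scatter n points over a square field (the randomized control)."""
    return [(rng.uniform(0, field_nm), rng.uniform(0, field_nm)) for _ in range(n)]

rng = random.Random(0)
FIELD_NM = 10_000.0    # hypothetical 10 um x 10 um field of view
THRESHOLD_NM = 200.0   # proximity cutoff quoted in the response

# Synthetic centroids (not real data): half of the BK clusters are placed
# close to a CaV1.3 cluster, the other half are scattered at random.
cav = random_points(200, FIELD_NM, rng)
bk = [(x + rng.gauss(0, 50), y + rng.gauss(0, 50)) for x, y in cav[:50]]
bk += random_points(50, FIELD_NM, rng)

# Observed fractions in both directions; they need not be equal.
observed = fraction_within(nearest_distances(bk, cav), THRESHOLD_NM)
reverse = fraction_within(nearest_distances(cav, bk), THRESHOLD_NM)

# Control: same BK centroids against uniformly re-scattered CaV1.3 centroids.
shuffled_cav = random_points(len(cav), FIELD_NM, rng)
control = fraction_within(nearest_distances(bk, shuffled_cav), THRESHOLD_NM)

print(f"BK near CaV1.3: {observed:.2f} (randomized control {control:.2f})")
print(f"CaV1.3 near BK: {reverse:.2f}")
```

      Measuring both directions matters because the two populations can differ in number, so the fraction of BK clusters near CaV1.3 and the fraction of CaV1.3 clusters near BK are separate quantities.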

      (3) The evidence that the intracellular ensemble formation is in large part driven by co-translation, based on co-localisation of mRNAs using RNAscope, requires additional critical controls and analysis. The authors now include data of co-localised BK protein that is suggestive but does not show co-translation. Secondly, while they have improved the description of some controls, mRNA co-localisation needs to be measured in both directions (e.g. BK to SCN9A as well as SCN9A to BK), especially if the mRNAs are expressed at very different levels. The relative expression levels need to be clearly defined in the paper. The authors also use a randomized image of BK mRNA to show specificity of co-localisation with Cav1.3 mRNA; however, the mRNA distribution would not be expected to be random across the cell but constrained by ER morphology if co-translated, so would using ER labelling as a mask be useful?

      We thank the reviewer for these constructive suggestions. We measured mRNA colocalization in both directions as recommended. As shown in the figure below, colocalization between KCNMA1 and SCN9A transcripts was comparable in both directions, with no statistically significant difference, supporting the specificity of the observed associations. We decided not to add this to the original figure to keep the figure simple. 

      We agree that co-localization of BK protein with BK mRNA is not conclusive evidence of co-translation, and we do not intend to mislead readers in our conclusion. Consequently, we were careful to avoid the use of co-translation in the Results section and added the word “potential” when referring to co-translation in the Discussion section. We added a sentence in the discussion to caution our interpretation: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”

      Author response image 1.

      (4) The authors attempt to define if plasma membrane assemblies of BK and CaV occur soon after synthesis. However, because the expression of BK and CaV occur at different times after transient transfection of plasmids more definitive experiments are required. For example, using inducible constructs to allow precise and synchronised timing of transcription. This would also provide critical evidence that co-assembly occurs very early in synthesis pathways - ie detecting complexes at ER before any complexes 

      We appreciate the reviewer’s insightful suggestion regarding the use of inducible constructs to synchronize transcription timing. This is an excellent approach and would allow direct testing of whether co-assembly occurs early in the synthesis pathway, including detection of complexes at the ER prior to plasma membrane localization. These experiments are beyond the scope of the present work but represent an important direction for future studies.

      We have added the following sentence to the Discussion section (page 24) to highlight this idea: “Future experiments using inducible constructs to precisely control transcription timing will enable more precise quantification of hetero-cluster formation in the ER compartment prior to plasma membrane insertion and reduce the variability introduced by differences in expression timing after plasmid transfection.”

      (5) While the authors have improved the definition of hetero-clusters, it is still not clear how, in the super-resolution analysis, they separate a BK tetramer from a cluster of BK tetramers with the monoclonal antibody employed, i.e. each BK channel will have 4 binding sites (4 subunits in the tetramer) whereas Cav1.3 has one binding site per channel. Thus, how do the authors discriminate between a single BK tetramer (molecular cluster) with potentially 4 antibodies bound and a cluster of 4 independent BK channels?

      We appreciate the reviewer’s thoughtful comment regarding the interpretation of super-resolution data. We agree that distinguishing a single BK tetramer from a cluster of multiple BK channels is challenging when using an antibody that can bind up to four sites per channel. To clarify, our analysis does not attempt to resolve individual subunits within a tetramer; rather, it focuses on the nanoscale spatial proximity of BK and Ca<sub>V</sub>1.3 signals.

      We want to note that this limitation applies only to the super-resolution maps in Figures 8C and 9D and does not affect Airyscan-based analyses or measurements of BK–Ca<sub>V</sub>1.3 proximity.

      To address how we might distinguish between a single BK tetramer and a cluster of multiple BK channels, we considered two contrasting scenarios. In the first case, we assume that all four α-subunits within a tetramer are labeled. Based on cryo-EM structures, a BK tetramer measures approximately 13 nm × 13 nm (≈169 nm²). Adding two antibody layers (primary and secondary) would increase the footprint by ~14 nm in each direction, resulting in an estimated area of ~41 nm × 41 nm (≈1681 nm²). Under this assumption, particles smaller than ~1681 nm² would likely represent individual tetramers, whereas larger particles would correspond to clusters of multiple tetramers.
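      The footprint estimate in this first scenario can be reproduced with a few lines of arithmetic. The sketch below uses only the dimensions quoted above; the function name `labeled_footprint` and the square-footprint simplification are illustrative assumptions, not part of the authors' analysis.

```python
# Dimensions quoted in the response (nanometres).
TETRAMER_SIDE_NM = 13.0        # approximate BK tetramer side from cryo-EM
ANTIBODY_EXTENSION_NM = 14.0   # primary + secondary antibody, added on each side

def labeled_footprint(side_nm, extension_nm):
    """Return (side, area) of a square footprint extended on both sides."""
    side = side_nm + 2 * extension_nm
    return side, side ** 2

bare_area = TETRAMER_SIDE_NM ** 2
side, area = labeled_footprint(TETRAMER_SIDE_NM, ANTIBODY_EXTENSION_NM)

print(f"bare tetramer: {bare_area:.0f} nm^2")
print(f"fully labeled tetramer: {side:.0f} nm x {side:.0f} nm = {area:.0f} nm^2")
```

      Under these assumptions the threshold separating a single fully labeled tetramer from a larger cluster is 41 nm × 41 nm, about ten times the area of the unlabeled channel.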

      In the second scenario, we propose that steric constraints at the S9–S10 segment, where the antibody binds, limit labeling to a single antibody per tetramer. If true, the localization precision would approximate 14 nm × 14 nm—the combined size of the antibody complex and the channel—close to the resolution limit of the microscope. To test this, we performed a control experiment using two antibodies targeting the BK C-terminal domain, raised in different species and labeled with distinct fluorophores. Super-resolution imaging revealed that only ~12% of particles were colocalized, suggesting that most channels bind a single antibody.

      If multiple antibodies could bind each tetramer, we would expect much greater colocalization.

      Although these data are not included in the manuscript, we have added the following clarification to the Results section (page 19): “It is important to note that this technique does not allow us to distinguish between labeling of four BK α-subunits within a tetramer and labeling of multiple BK channel clusters. Hence, particles smaller than ~1680 nm² may represent either a single tetramer or a cluster. This limitation applies to Figures 8C and 9D and does not affect measurements of BK–Ca<sub>V</sub>1.3 proximity.”

      Author response image 2.

      (6) The post-hoc tests used for one-way ANOVA and the ANOVA statistics need to be defined throughout.

      We thank the reviewer for highlighting the need for clarity regarding our statistical analyses. We have now specified the post-hoc tests used for all one-way ANOVA and ANOVA comparisons throughout the manuscript, and updated figure legends.

      Reviewer #3 (Public review):

      Summary:

      The authors present a clearly written and beautifully presented piece of work demonstrating clear evidence to support the idea that BK channels and Cav1.3 channels can co-assemble prior to their insertion in the plasma membrane.

      Strengths:

      The experimental records shown back up their hypotheses, and the authors are to be congratulated for the large number of control experiments shown in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have sufficiently addressed the specific points previously raised and the manuscript has improved clarity in those aspects. My main concern, which still remains, is stated in the public review.

      Reviewer #3 (Recommendations for the authors):

      I am content that the authors have attempted to fully address my previous criticisms.

      I have only three suggestions:

      (1) I think the word Homo-clusters at the bottom right of Figure 1 is erroneously included.

      We thank the reviewer for bringing this to our attention. The figure has been corrected accordingly.

      (2) The authors should, for completeness, refer to the beta, gamma and LINGO subunit families in the Introduction and include appropriate references:

      Knaus, H. G., Folander, K., Garcia-Calvo, M., Garcia, M. L., Kaczorowski, G. J., Smith, M., & Swanson, R. (1994). Primary sequence and immunological characterization of beta-subunit of high conductance Ca2+-activated K+ channel from smooth muscle. The Journal of Biological Chemistry, 269(25), 17274-17278.

      Brenner, R., Jegla, T. J., Wickenden, A., Liu, Y., & Aldrich, R. W. (2000a). Cloning and functional characterization of novel large conductance calcium-activated potassium channel beta subunits, hKCNMB3 and hKCNMB4. The Journal of Biological Chemistry, 275(9), 6453-6461.

      Yan, J & R.W. Aldrich. (2010) LRRC26 auxiliary protein allows BK channel activation at resting voltage without calcium. Nature. 466(7305):513-516

      Yan, J & R.W. Aldrich. (2012) BK potassium channel modulation by leucine-rich repeat-containing proteins. Proceedings of the National Academy of Sciences 109(20):7917-22

      Dudem, S, Large RJ, Kulkarni S, McClafferty H, Tikhonova IG, Sergeant, GP, Thornbury, KD, Shipston, MJ, Perrino BA & Hollywood MA (2020). LINGO1 is a novel regulatory subunit of large conductance, Ca2+-activated potassium channels. Proceedings of the National Academy of Sciences 117 (4) 2194-2200

      Dudem, S., Boon, P. X., Mullins, N., McClafferty, H., Shipston, M. J., Wilkinson, R. D. A., Lobb, I., Sergeant, G. P., Thornbury, K. D., Tikhonova, I. G., & Hollywood, M. A. (2023). Oxidation modulates LINGO2-induced inactivation of large conductance, Ca2+-activated potassium channels. The Journal of Biological Chemistry, 299 (3) 102975.

      We agree with the reviewer’s suggestion and have revised the Introduction to include references to the beta, gamma, and LINGO subunit families. Appropriate citations have been added to ensure completeness and contextual relevance.

      Additionally, BK channels are modulated by auxiliary subunits, which fine-tune BK channel gating properties to adapt to different physiological conditions. The β, γ, and LINGO1 subunits each contribute distinct structural and regulatory features: β-subunits modulate Ca²⁺ sensitivity and can induce inactivation; γ-subunits shift voltage-dependent activation to more negative potentials; and LINGO1 reduces surface expression and promotes rapid inactivation (18-24). These interactions ensure precise control over channel activity, allowing BK channels to integrate voltage and calcium signals dynamically in various cell types.

      (3) I think it may be more appropriate to include the sentence "The probes against the mRNAs of interest and tested in this work were designed by Advanced Cell Diagnostics." (P16, right hand column, L12-14) in the appropriate section of the Methods, rather than in Results.

      We thank the reviewer for this helpful suggestion. In response, we have relocated the sentence to the appropriate section of the Methods, where it now appears with relevant context.

    1. eLife Assessment

      The authors studied cognitive control signals in the anterior cingulate cortex (ACC) while rats selected between small immediate and larger delayed rewards. The description of behavioral strategies related to value-tracking signals in ACC is potentially useful. The evidence in support of this finding is incomplete due to issues with the task design, analyses, and modeling.

    2. Reviewer #1 (Public review):

      Summary:

      Adult (4-month-old) rats were tasked to either press one lever for an immediate reward or another for a delayed reward. The task had an adjusting amount structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, and (2) rats were prevented from pressing the same lever three times in a row.

      While the authors have been very responsive to the reviews, and I appreciate that, unfortunately, the new analyses reported in this revision actually lead me to deeper concerns about the adequacy of the data to support the conclusions. In this revision, it has become clear that the conclusions are forced and not supported by the data. Alternative theories are not considered or presented. This revision has revealed deep problems with the task, the analyses, and the modeling.

      Data Weaknesses

      Most importantly, the inclusion of the task behavior data has revealed a deep problem with the entire structure of the data. As is obvious in Figure 1D, there is a slow learning effect that is changing over the sessions as the animals learn to stop taking the delayed outcome. Unfortunately, the 8s delays came *after* the 4s. The first 20 sessions contain 19 4s delays and 1 8s delay, while the last 20 sessions contain 14 8s delays and 6 4s delays. Given the changes across sessions, it is likely that a large part of the difference is due to across-session learning (which is never addressed or considered).

      These data are not shown by subject, and I suspect that individual subjects did all 4s then all 8s and some subjects switched tasks at different times. If my suspicion is true, then any comparisons between the 4s and 8s conditions (which are a major part of the authors' claims) may have nothing to do with the delays, but rather with increased experience on the task.

      Furthermore, the four "groups", which are still poorly defined, seem to have been assessed at a session-by-session level. So when did each animal fall into a given group? Why is Figure 1D not showing which session fell into which group and why are we not seeing each animal's progression? They also admit that animals used a mixture of strategies, which implies that the "group" assignment is an invalid analysis, as the groups do not accommodate strategy mixing.

      Figure 2 shows that none of the differences of the group behavior against random choice with a basic p(delay) are significant. They use a KS test to measure these differences. KS tests are notoriously sensitive, as KS tests simply measure whether there are any statistical differences between two distributions. They do not report the full statistics for Figure 2, but only say that the 4HI group was not significant (KS p-value = 0.72) and the 8LO showed a p-value of 0.1 (which they interpret as significant). p=0.1 is not significant. They don't report the value of the 4LO or 8HI groups (why not?), but say they are in-between these two extremes. That means *none* of the differences are significant.

      They then test a model with additional parameters, and say that the model includes more than the minimal p_D parameter, but never report BIC or AIC model comparisons. In order to claim that the model is better than the bare p_D assumption, they should be reporting model-comparison statistics. But given that the p_D parameters are enough (q.v. Figure 2), this entire model seems unnecessary.
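      As an illustration of the model-comparison statistics requested here, the following sketch computes AIC and BIC for two nested choice models on a made-up session. Everything in it is hypothetical: the trial counts are invented, and the "ival-dependent" model (separate delay probabilities for large and small immediate rewards) is a stand-in chosen because its maximum-likelihood fit has a closed form, not the model used in the manuscript.

```python
import math

def bernoulli_loglik(successes, n):
    """Maximized Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0:
        return 0.0
    p = successes / n
    loglik = 0.0
    if successes > 0:
        loglik += successes * math.log(p)
    if successes < n:
        loglik += (n - successes) * math.log(1.0 - p)
    return loglik

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik

# Hypothetical session: (immediate reward was large?, chose delayed lever?)
trials = ([(True, 1)] * 6 + [(True, 0)] * 14
          + [(False, 1)] * 15 + [(False, 0)] * 5)
n = len(trials)

# Model A: a single fixed probability of choosing the delayed lever (k = 1).
ll_a = bernoulli_loglik(sum(choice for _, choice in trials), n)

# Model B: one probability per immediate-reward level (k = 2).
ll_b = 0.0
for flag in (True, False):
    subset = [choice for large, choice in trials if large == flag]
    ll_b += bernoulli_loglik(sum(subset), len(subset))

for name, ll, k in (("fixed p_D", ll_a, 1), ("ival-dependent", ll_b, 2)):
    print(f"{name}: logL={ll:.2f}  AIC={aic(ll, k):.2f}  BIC={bic(ll, k, n):.2f}")
```

      Lower AIC or BIC favors a model. Because the richer model can never fit worse than the nested one, the penalty terms are what decide whether its extra parameter is justified.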

      It took me a while to determine what was being shown in Figure 3, but I was eventually able to determine that 0 was the time after the animal made the choice to wait out the delay side, so the 4s in Figure 3A1 with high power in the low-frequency (<5 Hz) range is the waiting time. They don't show the full 8s time. Nor do they show the spectrograms separated by group (assuming that group is the analytical tool they are using). In B they show only theta power, but it is unclear how to interpret these changes over time.

      In Figure 4, panel A is mostly useless because it is just five sample sessions showing firing rate plotted on the same panels as the immediate reward amount. If they want to claim correlation, they should show and test it. But moreover, this is not how neural data should be presented - we need to know what the cells are doing, population-wise. We need to have an understanding of the neural ensemble. These data are clearly being picked and chosen, which is not OK.

      Figure 4, panels B and C show that the activity trivially reflects the reward that has been delivered to the animal, if I am understanding the graphs correctly. (The authors do not interpret it this way, but the data is, to my eyes, clear.) The "immediate" signal shows up immediately at choice and reflects the size of the immediate reward (which is varying). The "delay" signal shows up after the delay and does not, which makes sense as the animals get 6 pellets on the delayed side no matter what. In fact, the max delayed side activity = the max immediate side activity, which is 6 pellets. This is just reward-related firing.

      Figure 5 is poorly laid out, switching the order in 5C to be 2 1 3 in E and F. (Why?!) The statistics for Figure 5 on page 17 should be asking whether there are differences between neuron types, not whether there is a choice x time interaction in a given neuron type. When I look at Figure 5F1-3, all three types look effectively similar with different levels of noise. It is unclear why they are doing this complicated PC analysis or what we should be drawing from it.

      Figure 6 mis-states pie charts as "total number" rather than proportions.

      Interpretation Weaknesses

      The separation of cognitive effort into "resource-based" and "resistance-based" seems artificial to me. I still do not understand why the ability to resist a choice does not also depend on resource or why using resources is not a form of resistance. Doesn't every action in the end depend on the resources one has available? And doesn't every use of a resource resist one option by taking another? Even if one buys these two separate cognitive control processes (which at this point in reading the revision, I do not), the paper starts from the assumption that a baseline probability of waiting out the delays is a "resistance-based cognitive control" (why?) and a probability of choice that takes into account the size of the immediate value (confusingly abbreviated as ival) is a "resource-based cognitive control" (again, why?).

    3. Reviewer #2 (Public review):

      Summary:

      I appreciate the considerable work the authors have done on the revision. The manuscript is markedly improved.

      Strengths still include the strong theoretical basis, well-done experiments, and clear links to LFP / spectral analyses that have links to human data. The task is now more clearly explained, and the neural correlates better articulated.

      Weaknesses:

      I had remaining questions, many related to my previous questions.

      (1) The results have some complexity, but I still had questions about which is resource-based and which is resistance-based. The authors say in the last sentence of the discussion: "Prominent pre-choice theta power was associated with a behavioral strategy characterized by a strong bias towards a resistance-based strategy, whereas the neural signature of ival-tracking was associated with a strong bias towards a resource-based strategy."

      I might suggest making this simpler and clearer in the abstract and the first paragraph of the discussion. A simple statement like "pre-choice theta was biased towards resistance whereas single neurons were biased towards resources" might make this idea come across.

      (2) I think most readers would like to see raw single trial LFP traces in Figure 3, single unit rasters in Figure 4, and spike-field records in Figure 5.

      (3) What limitations are there to this work? I wonder if readers might benefit from some contextualization: the sample size, the heterogeneous behavior, the lack of cell-type specificity, and the use of PC3 to define spectral relationships. I might suggest pointing these out.

      (4) I still wasn't sure what 4 Hz vs. theta 6-12 Hz meant - is it all based on PC3's pos/neg correlation? I wonder if showing a scatter plot with the y-axis being PC3 and the x-axis being theta 4 Hz power would help distinguish these? Is this the first time this sort of analysis has been done? If so, it requires clearer definitions.

    4. Reviewer #3 (Public review):

      Summary:

      The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort: 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue, based on analyses of the behaviour and computational modelling, that rats use different strategies in different sessions, with one strategy in which they preferentially choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex. They propose that oscillatory activity in the 6-12 Hz theta band occurs when subjects use a 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. They also examine neural representation of the current value of the immediate reward option, and suggest that this value is more strongly represented when subjects are using this value information to guide choice. They further argue that neurons whose activity is modulated by theta oscillations are less involved in tracking the value of the immediate reward option than neurons whose activity is not theta modulated. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the modelling and analysis which preclude high confidence in the validity of the conclusions.

      Strengths:

      The behavioural task used is interesting and the recording methods used (64 channel silicon probes) should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.

      Limitations:

      The dataset is unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see Table 1), with some subjects contributing 7 sessions to a given strategy and others 0. Further, only 2 of 10 subjects contribute any sessions to one of the behavioural strategies (8LO), and a single subject contributes >50% of the sessions (7 of 13) to another strategy (8HI). Apparent differences in brain activity between the strategies could therefore in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To make firm conclusions that neural activity is different in sessions where different strategies are thought to be employed, it would be necessary to account for potential cross-subject variation in the data. The current statistical methods don't appear to do this as they use within-subject measures (e.g. trials or neurons) as the experimental unit and ignore which subject the neuron/trial came from.
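
      As an illustration of what accounting for cross-subject variation could look like, the sketch below treats the subject, rather than the neuron or trial, as the resampling unit. It is only one of several possible approaches (a mixed-effects model with subject as a random effect is another), and it is not the method used in the manuscript. The function name `subject_level_difference`, its arguments, and the choice of a percentile bootstrap are all assumptions made for this example.

```python
import numpy as np

def subject_level_difference(values, subjects, groups, n_boot=2000, seed=0):
    """Difference in group means with the subject as the experimental unit.

    values   : one measurement per neuron or trial (hypothetical input)
    subjects : subject ID for each measurement
    groups   : 0/1 strategy label for each measurement
    Returns the observed difference (group 1 minus group 0) and a
    percentile-bootstrap 95% interval obtained by resampling subjects.
    """
    values = np.asarray(values, dtype=float)
    subjects = np.asarray(subjects)
    groups = np.asarray(groups)
    ids = np.unique(subjects)

    # Collapse to one mean per subject per group; NaN marks a subject
    # that contributed no data to that group.
    means = np.full((ids.size, 2), np.nan)
    for i, s in enumerate(ids):
        for g in (0, 1):
            sel = (subjects == s) & (groups == g)
            if sel.any():
                means[i, g] = values[sel].mean()

    def diff(rows):
        sub = means[rows]
        has0 = ~np.isnan(sub[:, 0])
        has1 = ~np.isnan(sub[:, 1])
        if not has0.any() or not has1.any():
            return np.nan  # this resample lacks one of the groups
        return sub[has1, 1].mean() - sub[has0, 0].mean()

    rng = np.random.default_rng(seed)
    observed = diff(np.arange(ids.size))
    boots = np.array([diff(rng.integers(0, ids.size, ids.size))
                      for _ in range(n_boot)])
    boots = boots[~np.isnan(boots)]  # drop resamples missing a group
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return observed, (lo, hi)
```

      When only two subjects contribute to a group, as for 8LO, many resamples will lack that group entirely and the resulting interval will be wide or unstable, which is itself a reflection of the concern raised above.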

      The starting point for the analysis was the splitting of sessions into 4 groups based on the duration of the delay (4 vs 8 seconds) and then clustering within each delay category into two sub-groups. It was not clear why 2 clusters per delay category were used, nor whether the data did in fact have a clear split into two distinct clusters or continuous variation across the population of sessions. The simplified RL model used in the revised manuscript (which is an improvement from that used in the previous version) could in principle help to quantify variation across the populations of sessions, by using model fitting and comparison methods to evaluate variation in strategy across subjects. However, as far as I could tell no model-fitting or comparison was performed, and the only attempt to link the model to data was by simulating data using a fixed probability of choosing the delayed lever (i.e. with no learning across trials) and comparing the distribution of total rewards obtained per session with that of the subjects in each group (Figure 2). Total reward per session is a very coarse behavioural metric and using likelihood-based methods to fit model parameters to subjects' trial-by-trial choice data would provide a more sensitive way of using the modelling to assess behavioural strategy across sessions.
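
      To make the suggestion of likelihood-based fitting concrete, here is a minimal sketch of how a two-option Q-learning model with a softmax choice rule could be fit to trial-by-trial choices. It is a generic textbook formulation, not the authors' model: it ignores task specifics such as forced trials, the adjusting immediate amount, and delay discounting, and the function names, the 0/1 coding of levers, and the grid ranges are assumptions chosen for illustration.

```python
import numpy as np

def neg_log_likelihood(alpha, beta, choices, rewards):
    """Negative log-likelihood of a choice sequence under Q-learning
    with a softmax choice rule.

    alpha   : learning rate
    beta    : softmax inverse temperature
    choices : ints, 0 = immediate lever, 1 = delayed lever (assumed coding)
    rewards : outcome value experienced on each trial
    """
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        logits = beta * q
        # log softmax, computed stably via logaddexp
        log_p = logits - np.logaddexp(logits[0], logits[1])
        nll -= log_p[c]
        q[c] += alpha * (r - q[c])  # update only the chosen option
    return nll

def fit_grid(choices, rewards,
             alphas=np.linspace(0.05, 1.0, 20),
             betas=np.linspace(0.1, 10.0, 50)):
    """Exhaustive grid search; returns (best_nll, best_alpha, best_beta)."""
    best = (np.inf, None, None)
    for a in alphas:
        for b in betas:
            nll = neg_log_likelihood(a, b, choices, rewards)
            if nll < best[0]:
                best = (nll, a, b)
    return best
```

      The minimised negative log-likelihood from each candidate model can then be fed into a standard comparison metric (for example AIC or BIC) per session, which would give a graded measure of strategy rather than relying on total reward alone.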

      Conceptually, it is not obvious that choices towards the delayed vs immediate lever reflect use of different strategies employing different types of cognitive effort. Rather these could reflect a single strategy which compares the estimated value of the two levers, with differences in behaviour between sessions accounted for either by differences in the task itself (between the 8s and 4s delay condition) or differences in the parameters of the strategy, such as the strength of temporal discounting.

      Even if one accepts the claim that the task recruits two distinct types of cognitive control, the argument that theta oscillations, which occur on delay choice trials in the 4s delay condition, are a correlate of a 'resistance-based' strategy (resisting the immediate reward), is hard to reconcile with the fact that theta oscillations do not occur on delay choice trials in the 8s delay condition (Figure 3). The authors note this discrepancy, but state that 'The reason was because these groups largely avoided the delayed lever (Figure 1) and thereby abandoned the need to implement resistance-based control altogether.' However, the data in Figure 1D show that even in the 8s condition the subjects choose the delayed lever on around 50% of trials. It is not obvious why choosing the delayed lever on 50% of trials in the 8s condition does not require 'resistance-based' cognitive effort, while choosing it in the 4s delay condition does.

      The other main claims regarding the neural data are that the neuronal representation of the value of the immediate reward lever (ival) is stronger in sessions where subjects are choosing that lever more often, particularly the 8LO group, and that neurons whose activity tracks ival are a different population from neurons whose activity is theta modulated. However, the analysis methods used to make these claims are rather convoluted and make it hard to assess the strength of the evidence for them.

      To evaluate the strength of ival representation in neural activity, the authors first fit a regression model predicting each neuron's activity at different timepoints as a function of behavioural variables including ival, which is a sensible first step. However, they then perform clustering on the regression coefficients and then plot neural activity only for the cluster which they state 'provided the clearest example of value tracking'. It is not clear how the clustering was done, whether there were in fact well defined clusters in the neural activity, how the clusters whose activity is plotted were chosen, nor the proportion of neurons in this cluster for each group of sessions. The analysis therefore provides only limited information about the strength of ival representation in different session groups. It would be useful to quantify the variance explained by ival in neural activity for each group of sessions using a simpler quantification of the regression analysis, such as cross-validated coefficient of partial determination.
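
      For reference, a cross-validated coefficient of partial determination (CPD) can be computed roughly as follows. This is a sketch under simple assumptions (ordinary least squares, K-fold cross-validation over trials, one neuron and one timepoint at a time); the function name `cv_cpd` and its arguments are hypothetical and not taken from the manuscript.

```python
import numpy as np

def cv_cpd(X, y, col, n_folds=5, seed=0):
    """Cross-validated coefficient of partial determination for one predictor.

    X   : (n_trials, n_predictors) design matrix without an intercept column
    y   : (n_trials,) activity of one neuron at one timepoint
    col : index of the predictor of interest (e.g. the ival column)
    CPD = (SSE_reduced - SSE_full) / SSE_reduced, with both sums of squared
    errors accumulated over held-out trials.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    full_cols = list(range(X.shape[1]))
    # Reduced model drops the predictor of interest (+1 skips the intercept).
    reduced_cols = [j for j in full_cols if j != col + 1]

    def held_out_sse(cols, train, test):
        beta, *_ = np.linalg.lstsq(X[np.ix_(train, cols)], y[train], rcond=None)
        resid = y[test] - X[np.ix_(test, cols)] @ beta
        return float(resid @ resid)

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sse_full = 0.0
    sse_reduced = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        sse_full += held_out_sse(full_cols, train, test)
        sse_reduced += held_out_sse(reduced_cols, train, test)
    return (sse_reduced - sse_full) / sse_reduced
```

      Because the errors are measured on held-out trials, a predictor that carries no information gives a CPD near zero (possibly slightly negative), so the values can be averaged over all neurons in a session group without first selecting a cluster.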

      The analysis of how theta modulation related to representation of ival across neurons was also complicated and non-standard. To determine whether individual neurons were theta modulated, the authors did PCA on a matrix comprised of spike train autocorrelations for individual neurons, and then grouped neurons according to the projection of their autocorrelation function onto the 3rd Principal Component, on the basis that neurons with negative projection onto this component showed a peak roughly at theta frequency in the power spectrum of their autocorrelation. Even ignoring the fact that the peak in the power spectrum is broad and centred above the standard theta frequency (see figure 5B3), this is an arbitrary and unnecessarily complex way to determine if neurons are theta modulated. It would be much simpler and greatly preferable to either directly assess the modulation depth of individual neurons' spike train autocorrelation in the theta band, or to use a metric of spike-LFP coupling in the theta band instead. The authors do include some analysis of spike field coherence in Figure 6 and this is a much more sensible approach. However, it is worth noting that the only session group which shows a difference in coherence at theta frequency relative to the other groups is 8LO, to which only 2 of 8 animals contribute any data and 70% of sessions come from one animal. It is therefore unclear whether differences in this group are due to differences in behavioural strategy, or reflect other sources of cross-animal variation.
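
      As a rough illustration of the simpler alternative, the sketch below computes a per-neuron theta index directly from the spike train autocorrelation: the mean spectral power of the autocorrelogram in the 6-12 Hz band divided by the mean power in a broader reference band. The function name `theta_index`, the 2 ms bin width, the 0.5 s maximum lag, the Hann taper, and the 1-50 Hz reference band are all assumptions made for this example, not parameters from the manuscript.

```python
import numpy as np

def theta_index(spike_times, bin_s=0.002, max_lag_s=0.5,
                band=(6.0, 12.0), ref=(1.0, 50.0)):
    """Ratio of theta-band to reference-band power in the autocorrelogram.

    spike_times : spike times of one neuron, in seconds
    Values well above 1 indicate rhythmic firing in the theta band.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    n_bins = int(np.ceil(spike_times.max() / bin_s)) + 1
    train = np.bincount((spike_times / bin_s).astype(int),
                        minlength=n_bins).astype(float)
    train -= train.mean()

    # Autocorrelation at positive lags 1..max_lag (zero lag excluded).
    max_lag = int(round(max_lag_s / bin_s))
    ac = np.array([train[:-k] @ train[k:] for k in range(1, max_lag + 1)])
    ac -= ac.mean()

    # Power spectrum of the tapered autocorrelogram.
    power = np.abs(np.fft.rfft(ac * np.hanning(ac.size))) ** 2
    freqs = np.fft.rfftfreq(ac.size, d=bin_s)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    in_ref = (freqs >= ref[0]) & (freqs <= ref[1])
    return power[in_band].mean() / power[in_ref].mean()
```

      A threshold on this index, ideally set from a shuffled or jittered null distribution for each neuron, would then classify neurons as theta modulated without reference to a population-level principal component.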

    5. Author response:

      The following is the authors’ response to the current reviews.

      We would like to thank the reviewers for their efforts and feedback on our preprint. We have elected to rework the manuscript for publication in a different journal. In this process we will alter many of the approaches and re-evaluate the conclusions. With this, many of the points raised by the reviewers will be no longer relevant and therefore do not require a response. Again, we thank the reviewers for their time and helpful feedback.


      The following is the authors’ response to the original reviews.

      eLife Assessment:

      The authors present a potentially useful approach of broad interest arguing that anterior cingulate cortex (ACC) tracks option values in decisions involving delayed rewards. The authors introduce the idea of a resource-based cognitive effort signal in ACC ensembles and link ACC theta oscillations to a resistance-based strategy. The evidence supporting these new ideas is incomplete and would benefit from additional detail and more rigorous analyses and computational methods.

      We are extremely grateful for the several excellent comments of the reviewers. To address these concerns, we have completely reworked the manuscript, adding more rigorous approaches in each phase of the analysis and computational model. We realize that this has taken some time to prepare the revision. However, given the comments of the reviewers, we felt it necessary to thoroughly rework the paper based on their input. Here is a (nonexhaustive) overview of the major changes we made:

      We have developed a way to more adequately capture the heterogeneity in the behavior

      We have completely reworked the RL model

      We have added additional approaches and rigor to the analysis of the value-tracking signal. 

      Reviewer #1 (Public Review):

      Summary:

      Young (2.5 mo [adolescent]) rats were tasked to either press one lever for immediate reward or another for delayed reward. 

      Please note that at the time of testing and training the rats were > 4 months old.

      The task had a complex structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row. Importantly, this task is very different from most intertemporal choice tasks which adjust delay (to the delayed lever), whereas this task held the delay constant and adjusted the number of 20 mg sucrose pellets provided on the immediate value lever.

      Several studies parametrically vary the immediate lever (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183). While most versions of the task will yield qualitatively similar estimates of discounting, the adjusting amount is preferred as it provides the most consistent estimates (PMID: 22445576). More specifically, this version of the task avoids contrast effects that result from changing the delay during the session (PMID: 23963529, 24780379, 19730365, 35661751), which complicate value estimates.

      Analyses are based on separating sessions into groups, but group membership includes arbitrary requirements and many sessions have been dropped from the analyses. 

      We have updated this approach and now provide a more comprehensive assessment of the behavior. The updated approach applies a hierarchical clustering model to the behavior in each session. This was applied at each delay to separate animals that prefer the immediate option more/less. This results in 4 statistically dissociable groups (4LO, 4HI, 8LO, 8HI) and includes all sessions. Please see Figure 1. 

      Computational modeling is based on an overly simple reinforcement learning model, as evidenced by fit parameters pegging to the extremes. 

      We have completely reworked the simulations in the revision. In the updated RL model we carefully add parameters to determine which are necessary to explain the experimental data. We feel that it is simplified yet more descriptive. Please see Figure 2 and associated text. 

      The neural analysis is overly complex and does not contain the necessary statistics to assess the validity of their claims.

      We have dramatically streamlined the spike train analysis approach and added several statistical tests to ensure the rigor of our results. Please see Figures 4,5,6 and associated text. 

      Strengths:

      The task is interesting.

      Thank you for the positive comment

      Weaknesses:

      Behavior:

      The basic behavioral results from this task are not presented. For example, "each recording session consisted of 40 choice trials or 45 minutes". What was the distribution of choices over sessions? Did that change between rats? Did that change between delays? Were there any sequence effects? (I recommend looking at reaction times.) Were there any effects of pressing a lever twice vs after a forced trial? 

      Please see the updated statistics and panels in Figures 1 and 2. We believe these address this valid concern.  

      This task has a very complicated sequential structure that I think I would be hard pressed to follow if I were performing this task. 

      Human tasks implement a similar task structure (PMID: 26779747). Please note the response above that outlines the benefits of using this task.

      Before diving into the complex analyses assuming reinforcement learning paradigms or cognitive control, I would have liked to have understood the basic behaviors the rats were taking. For example, what was the typical rate of lever pressing? If the rats are pressing 40 times in 45 minutes, does waiting 8s make a large difference?

      Thank you for this suggestion. Our additions to Figure 1 are intended to better explain and quantify the behavior of the animals. Note that this task is designed to hold the rate of reinforcement constant no matter the choices of the animals. Our analysis supports the long-held view in the literature that rats do not like waiting for rewards, even at small delays. Going from the 4 to the 8 sec delay results in significantly more immediate choices, indicating that the rats will forgo waiting 8 sec for a larger reinforcer and take a smaller reinforcer at 4 sec.

      For that matter, the reaction time from lever appearance to lever pressing would be very interesting (and important). Are they making a choice as soon as the levers appear? Are they leaning towards the delay side, but then give in and choose the immediate lever? What are the reaction time hazard distributions?

      This is an excellent suggestion; we have added a brief analysis of reaction times (please see the section entitled “4 behavioral groups are observed across all sessions” in the Results). Please note that an analysis of the reaction times has been presented in a prior analysis of this data set (White et al., 2024). In addition, an analysis of reaction times in this task was performed in Linsenbardt et al. (2017). In short, animals tend to choose within 1 second of the lever appearing. In addition, our prior work shows that responses on the immediate lever tend to be slower, which we viewed as evidence of increased deliberation requirements (possibly required to integrate value signals).

      It is not clear that the animals on this task were actually using cognitive control strategies on this task. One cannot assume from the task that cognitive control is key. The authors only consider a very limited number of potential behaviors (an overly simple RL model). On this task, there are a lot of potential behavioral strategies: "win-stay/lose-shift", "perseveration", "alternation", even "random choices" should be considered.

      The strategies the Reviewer mentioned are descriptors of the actual choices the rats made. For example, perseveration means the rat is choosing one of the levers at an excessively high rate whereas alternation means it is choosing the two levers more or less equally, independent of payouts. But the question we are interested in is why? We are arguing that the type of cognitive control determines the choice behavior, but cognitive control is an internal variable that guides behavior, rather than simply a descriptor of the behavior. For example, the animal opts to perseverate on the delayed lever because the cognitive control required to track ival is too high. We then searched the neural data for signatures of the two types of cognitive control.

      The delay lever was assigned to the "non-preferred side". How did side bias affect the decisions made?

      The side bias clearly does not impact performance as the animals prefer the delay lever at shorter delays, which works against this bias.  

      The analyses based on "group" are unjustified. The authors compare the proportion of delayed to immediate lever press choices on the non-forced trials and then did k-means clustering on this distribution. But the distribution itself was not shown, so it is unclear whether the "groups" were actually different. They used k=3, but do not describe how this arbitrary number was chosen. (Is 3 the optimal number of clusters to describe this distribution?) Moreover, they removed three group 1 sessions with an 8s delay and two group 2 sessions with a 4s delay, making all the group 1 sessions 4s delay sessions and all group 2 sessions 8s delay sessions. They then ignore group 3 completely. These analyses seem arbitrary and unnecessarily complex. I think they need to analyze the data by delay. (How do rats handle 4s delay sessions? How do rats handle 6s delay sessions? How do rats handle 8s delay sessions?). If they decide to analyze the data by strategy, then they should identify specific strategies, model those strategies, and do model comparison to identify the best explanatory strategy. Importantly, the groups were session-based, not rat based, suggesting that rats used different strategies based on the delay to the delayed lever.

      We have completely reworked our approach for capturing the heterogeneity in behavior. We have taken care to show more of the behavioral statistics that have gone into identifying each of the groups. All sessions are included in this analysis. As the reviewer suggests, we used the statistics from each of the behavioral groups to inform the RL model that explores neural signals that underlie decisions in this task. We strongly disagree that groups should be rat and not session based, as the behavior of the animal can, and does, change from day to day. This is important to consider when analyzing the neural data, as rat-based groupings would ignore this potential source of variance.

      The reinforcement learning model used was overly simple. In particular, the RL model assumes that the subjects understand the task structure, but we know that even humans have trouble following complex task structures. Moreover, we know that rodent decision-making depends on much more complex strategies (model-based decisions, multi-state decisions, rate-based decisions, etc). There are lots of other ways to encode these decision variables, such as softmax with an inverse temperature rather than epsilon-greedy. The RL model was stated as a given and not justified. As one critical example, the RL model fit to the data assumed a constant exponential discounting function, but it is well-established that all animals, including rodents, use hyperbolic discounting in intertemporal choice tasks. Presumably this changes dramatically the effect of 4s and 8s. As evidence that the RL model is incomplete, the parameters found for the two groups were extreme. (Alpha=1 implies no history and only reacting to the most recent event. Epsilon=0.4 in an epsilon-greedy algorithm is a 40% chance of responding randomly.)

      While we agree that the approach was not fully justified, we do not agree that it was invalid. Simply stated, a softmax approach gives the best fit to the choice behavior, whereas our epsilon-greedy approach attempted to reproduce the choice behavior using a naïve agent that progressively learns the values of the two levers on a choice-by-choice basis. Nevertheless, we certainly appreciate that important insights can be gained by fitting a model to the data as suggested. We feel that the new modeling approach we have now implemented is optimal for the present purposes and it replaces the one used in the original manuscript.
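
      For readers less familiar with the terms in this exchange, the sketch below contrasts the two discounting functions and the two choice rules under discussion. It uses standard textbook forms, not the model from either version of the manuscript; the function names and the example amounts, delays, and parameter values are illustrative assumptions only.

```python
import numpy as np

def exponential_value(amount, delay, k):
    """Exponentially discounted value: amount * exp(-k * delay)."""
    return amount * np.exp(-k * delay)

def hyperbolic_value(amount, delay, k):
    """Hyperbolically discounted value: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

def softmax_probs(values, beta):
    """Choice probabilities graded by value; beta is the inverse temperature."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

def epsilon_greedy_probs(values, epsilon):
    """With probability epsilon pick uniformly at random, else pick the best."""
    values = np.asarray(values, dtype=float)
    p = np.full(values.size, epsilon / values.size)
    p[np.argmax(values)] += 1.0 - epsilon
    return p

# Illustrative comparison for a hypothetical 6-pellet delayed reward.
for delay in (4.0, 8.0):
    print(delay,
          exponential_value(6.0, delay, k=0.25),
          hyperbolic_value(6.0, delay, k=0.25))
```

      Under epsilon-greedy the choice probabilities depend only on which option currently has the higher value, whereas under softmax they vary smoothly with the size of the value difference, which is why the two rules can lead to different parameter estimates for the same choice data.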

      The authors do add a "dbias" (which is a preference for the delayed lever) term to the RL model, but note that it has to be maximal in the 4s condition to reproduce group 2 behavior, which means they are not doing reinforcement learning anymore, just choosing the delayed lever.

      The dbias term was dropped in the new model implementation.

      Neurophysiology:

      The neurophysiology figures are unclear and mostly uninterpretable; they do not show variability, statistics or conclusive results.

      While the reviewer is justified in criticizing the clarity of the figures, the statement that “they do not show variability, statistics or conclusive results” is not correct. Each of the figures presented in the first draft of the manuscript, except Figure 3, are accompanied by statistics and measures of variability. Nonetheless we have updated each of the neurophysiology analyses. We hope that the reviewer will find our updates more rigorous and thorough.   

      As with the behavior, I would have liked to have seen more traditional neurophysiological analyses first. What do the cells respond to? How do the manifolds change aligned to the lever presses? Are those different between lever presses?

      We have added several figures that plot the mean +/- SEM of the neural activity (see Figures 4 and 5). Hopefully this provides a more intuitive picture of the changes in neural activity throughout the task.  

      Are there changes in cellular information (both at the individual and ensemble level) over time in the session? 

      We provide several analyses of how firing rate changes over trials in relation to ival over time and trials in the session. In addition, we describe how these signals change in each of the behavioral groups. 

      How do cellular responses differ during that delay while both levers are out, but the rats are not choosing the immediate lever?

      We were somewhat unclear about this suggestion, as the delay follows the lever press. In addition, there is no delay after immediate presses.

      Figure 3, for example, claims that some of the principal components tracked the number of pellets on the immediate lever ("ival"), but they are just two curves. No statistics, controls, or justification for this is shown. BTW, on Figure 3, what is the event at 200s?

      This comment is no longer relevant based on the changes we’ve made to the manuscript. 

      I'm confused. On Figure 4, the number of trials seems to go up to 50, but in the methods, they say that rats received 40 trials or 45 minutes of experience.

      This comment is no longer relevant based on the changes we’ve made to the manuscript. 

      At the end of page 14, the authors state that the strength of the correlation did not differ by group and that this was "predicted" by the RL modeling, but this statement is nonsensical, given that the RL modeling did not fit the data well, depended on extreme values. Moreover, this claim is dependent on "not statistically detectable", which is, of course, not interpretable as "not different".

      This comment is no longer relevant based on the changes we’ve made to the manuscript. 

      There is an interesting result on page 16 that the increases in theta power were observed before a delayed lever press but not an immediate lever press, and then that the theta power declined after an immediate lever press. 

      Thank you for the positive comment. 

      These data are separated by session group (again group 1 is a subset of the 4s sessions, group 2 is a subset of the 8s sessions, and group 3 is ignored). I would much rather see these data analyzed by delay itself or by some sort of strategy fit across delays.

      Thank you for the excellent suggestion. Our new group assignments take delay into account. 

      That being said, I don't see how this description shows up in Figure 6. What does Figure 6 look like if you just separate the sessions by delay?

      We are unclear what the reviewer means by “this description”.  

      Discussion:

      Finally, it is unclear to what extent this task actually gets at the questions originally laid out in the goals and returned to in the discussion. The idea of cognitive effort is interesting, but there is no data presented that this task is cognitive at all. The idea of a resourced cognitive effort and a resistance cognitive effort is interesting, but presumably the way one overcomes resistance is through resource-limited components, so it is unclear that these two cognitive effort strategies are different.

      The basis for the reviewer's assertion that “the way one overcomes resistance is through resource-limited components” is not clear. In the revised version, we have taken greater care to outline how each type of effort signal facilitates performance of the task and articulate these possibilities in our stochastic and RL models. We view the strong evidence for ival tracking presented herein as a critical component of resource-based cognitive effort.

      The authors state that "ival-tracking" (neurons and ensembles that presumably track the number of pellets being delivered on the immediate lever - a fancy name for "expectations") "taps into a resourced-based form of cognitive effort", but no evidence is actually provided that keeping track of the expectation of reward on the immediate lever depends on attention or mnemonic resources. They also state that a "dLP-biased strategy" (waiting out the delay) is a "resistance-based form of cognitive effort" but no evidence is made that going to the delayed side takes effort.

      We challenge the reviewer's assertion that ival tracking is a “fancy name for expectations”. We make no claim about the prospective or retrospective nature of the signal. Clearly, expectations should be prospective and therefore different from ival tracking. Regarding the resistance signal: First, animals avoid the delay lever more often at the 8 sec delay (Figure 1). We have shown that increasing the delay systematically biases responses AWAY from the delay (Linsenbardt et al., 2017). This is consistent with a well-developed literature that rats and mice do not like waiting for delayed reinforcers. We contend that enduring something you don’t like takes effort.

      The authors talk about theta synchrony, but never actually measure theta synchrony, particularly across structures such as amygdala or ventral hippocampus. The authors try to connect this to "the unpleasantness of the delay", but provide no measures of pleasantness or unpleasantness. They have no evidence that waiting out an 8s delay is unpleasant.

      We have added spike-field coherence to better contact the literature on synchrony. Note that we never refer to our results as “synchrony”. However, we would be remiss to not address the growing literature on theta synchrony in effort allocation. There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers. If waiting out the delay was not pleasant then why do the animals forgo larger rewards to avoid it? 

      The authors hypothesize that the "ival-tracking signal" (the expectation of number of pellets on the immediate lever) "could simply reflect the emotional or autonomic response". Aside from the fact that no evidence for this is provided, if this were to be true, then, in what sense would any of these signals be related to cognitive control?

      This is proposed as an alternative explanation to the ival signal in the discussion. It was added as our due diligence. Emotional state could provide feedback to the currently implemented control mechanism. If waiting for reinforcement is too unpleasant this could drive them to ival tracking and choosing the immediate option more frequently. We provide this option only as a possibility, not a conclusion. We have clarified this in the revised text. Nevertheless, based on our review of the literature, autonomic tracking in some form, seems to be the most likely function of ACC (Seamans & Floresco 2022). While the reviewer may disagree with this, we feel it is at least as valid as all the complex, cognitively-based interpretations that commonly appear in the literature.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the neuronal signals that underlie resistance vs resource-based models of cognitive effort. The authors use a delayed discounting task and computational models to explore these ideas. The authors find that the ACC strongly tracks value and time, which is consistent with prior work. Novel contributions include quantification of a resource-based control signal among ACC ensembles, and linking ACC theta oscillations to a resistance-based strategy.

      Strengths:

      The experiments and analyses are well done and have the potential to generate an elegant explanatory framework for ACC neuronal activity. The inclusion of local-field potential / spike-field analyses is particularly important because these can be measured in humans.

      Thank you for the endorsement of our work.

      Weaknesses:

      I had questions that might help me understand the task and details of neuronal analyses.

      (1) The abstract, discussion, and introduction set up an opposition between resource and resistance-based forms of cognitive effort. It's clear that the authors find evidence for each (ACC ensembles = resource, theta=resistance?) but I'm not sure where the data fall on this dichotomy.

      (a) An overall very simple schematic early in the paper (prior to the MCML model? or even the behavior) may help illustrate the main point.

      (b) In the intro, results, and discussion, it may help to relate each point to this dichotomy.

      (c) What would resource-based signals look like? What would resistance based signals look like? Is the main point that resistance-based strategies dominate when delays are short, but resource-based strategies dominate when delays are long?

      (d) I wonder if these strategies can be illustrated? Could these two measures (dLP vs ival tracking) be plotted on separate axes or extremes, and behavior, neuronal data, LFP, and spectral relationships be shown on these axes? I think Figure 2 is working towards this. Could these be shown for each delay length? This way, as the evidence from behavior, model, single neurons, ensembles, and theta is presented, it can be related to this framework, and the reader can organize the findings.

      These are excellent suggestions, and we have implemented them, where possible. 

      (2) The task is not clear to me.

      (a) I wonder if a task schematic and a flow chart of training would help readers.

      Yes, excellent idea, we have now included this in Figure 1. 

      (b) This task appears to be relatively new. Has it been used before in rats (Oberlin and Grahame is a mouse study)? Some history / context might help orient readers.

      Indeed, this task has been used in rats in several prior studies. Please see the following references (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183).

      (c) How many total sessions were completed with ascending delays? Was there criteria for surgeries? How many total recording sessions per animal (of the 54?)

      Please note that the delay does not change within a session. There were no criteria for surgery. 

      (d) How many trials completed per session (40 trials OR 45 minutes)? Where are there errors? These details are important for interpreting Figure 1.

      Every animal in this data set completed 40 trials and we have updated the task description to clarify this issue. There are no errors in this task; rather, the task is designed to measure the tendency to make an impulsive choice (smaller reward now).

      (3) Figure 1 is unclear to me.

      (a) Delayed vs immediate lever presses are being plotted - but I am not sure what is red, and what is blue. I might suggest plotting each animal.

      We have updated Figure 1 considerably for clarity. 

      (b) How many animals and sessions go into each data point?

      We hope this is clarified now with our new group assignments as all sessions were included in the analysis. 

      (c) Table 1 (which might be better referenced in the paper) refers to rats by session. Is it true that some rats (2 and 8) were not analyzed for the bulk of the paper? Some rats appear to switch strategies, and some stay in one strategy. How many neurons come from each rat?

      We have updated Table 1 based on our new groupings. The rats that contribute the most sessions also tend to be represented across the behavioral groups; therefore, it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal.

      (d) Task basics - RT, choice, accuracy, video stills - might help readers understand what is going into these plots

      (e) Does the animal move differently (i.e., RTs) in G1 vs. G2?

      Excellent suggestion. We have added more analysis of the task variables in the revision (e.g., RT and choice comparisons across delays).

      (4) I wasn't sure how clustered G1 vs. G2 vs G3 are. To make this argument, the raw data (or some axis of it) might help.

      (a) This is particularly important because G3 appears to be a mix of G1 and G2, although upon inspection, I'm not sure how different they really are

      (b) Were there objective clustering criteria that defined the clusters?

      (c) Why discuss G3 at all? Can these sessions be removed from analysis?

      Based on our updates to the behavioral analysis these comments are no longer relevant. 

      (5) The same applies to neuronal analyses in Fig 3 and 4

      (a) What does a single neuron peri-event raster look like? I would include several of these.

      (b) What does PC1, 2 and 3 look like for G1, G2, and G3?

      (c) Certain PCs are selected, but I'm not sure how they were selected - was a criterion used? How was the correlation between PCA and ival selected? What about PCs that don't correlate with ival?

      (d) If the authors are using PCA, then scree plots and PETHs might be useful, as well as comparisons to PCs from time-shuffled / randomized data.

      We hope that our reworking of the neural data analysis has clarified these issues. We now include several firing rate examples and aggregate data.   

      (6) I had questions about the spectral analysis

      (a) Theta has many definitions - why did the authors use 6-12 Hz? Does it come from the hippocampal literature, and is this the best definition of theta? What about other bands: delta (1-4 Hz), theta (4-7 Hz), and beta (13-30 Hz)? These bands are of particular importance because they have been associated with errors, dopamine, and are abnormal in schizophrenia and Parkinson's disease.

      This designation comes mainly from the hippocampal and ACC literature in rodents. In addition, this range best captured the peak in the power spectrum in our data. Note that we focus our analysis on theta given the literature regarding theta in the ACC as a correlate of cognitive control (references in manuscript). We did interrogate other bands as a sanity check, and the results were mostly limited to theta. Given the scope of our manuscript and the concerns raised regarding complexity, we are concerned that adding frequency analyses beyond theta would obfuscate the take-home message.

      However, the spectrograms in Figure 3 show a range of frequencies and highlight the ones in the theta band as the most dynamic prior to the choice. 

      (b) Power spectra and time-frequency analyses may justify the authors' focus. I would show these (y-axis: frequency; x-axis: time; z-axis: power).

      Thank you for the suggestion. We have added this to Figure 3.    

      (7) PC3 as an autocorrelation doesn't seem to be the right way to infer theta entrainment or spike-field relationships, as PCA can be vulnerable to phantom oscillations, and coherence can be transient. It is also difficult to compare to traditional measures of phase-locking. Why not simply use spike-field coherence? This is particularly important with reference to the human literature, which the authors invoke.

      Excellent suggestion. Note that PCA provided a way to classify neurons that exhibited peaks in the autocorrelation at theta frequencies. We have added spike-field coherence, and this analysis confirms the differences in theta entrainment of the spike trains across the behavioral groups. Please see Figure 6D.   

      Reviewer #3 (Public Review):

      Summary:

      The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort: 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they consistently choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex (ACC) and argue that ACC neurons track the value of the immediate reward option irrespective of the strategy the rats are using. They further argue that the strategy the rats are using modulates their estimated value of the immediate reward option, and that oscillatory activity in the 6-12 Hz theta band occurs when subjects use the 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the experiment design, reporting, modelling and analysis which currently preclude high confidence in the validity of the conclusions.

      Strengths:

      The behavioural task used is interesting and the recording methods should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.

      Thank you for the positive comments. 

      Weaknesses:

      The dataset is very unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see table 1), with some subjects contributing 9 or 10 sessions and others only one session, and it is not clear from the text why this is the case. Further, only 3 subjects contribute any sessions to one of the behavioural strategies, while 7 contribute data to the other such that apparent differences in brain activity between the two strategies could in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To firm up the conclusion that neural activity is different in sessions where different strategies are thought to be employed, it would be important to account for potential cross-subject variation in the data. The current statistical methods don't do this as they all assume fixed effects (e.g. using trials or neurons as the experimental unit and ignoring which subject the neuron/trial came from).

      In the revised manuscript we have updated the group assignments. We have improved our description of the logic and methods for employing these groupings as well. With this new approach, all sessions are now included in the analysis. The group assignments are made purely on the behavioral statistics of an animal in each session. We feel this approach is preferable to eliminating neurons or sessions with the goal of balancing them, which may introduce bias. Further, the rats that contribute the most sessions also tend to be represented across the behavioral groups; therefore, it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal. As neurons are randomly sampled from each animal on a given session, we feel that we’re justified in treating these as fixed effects.

      It is not obvious that the differences in behaviour between the sessions characterised as using the 'G1' and 'G2' strategies actually imply the use of different strategies, because the behavioural task was different in these sessions, with a shorter wait (4 seconds vs 8 seconds) for the delayed reward in the G1 strategy sessions where the subjects consistently preferred the delayed reward irrespective of the current immediate reward size. Therefore the differences in behaviour could be driven by difference in the task (i.e. external world) rather than a difference in strategy (internal to the subject). It seems plausible that the higher value of the delayed reward option when the delay is shorter could account for the high probability of choosing this option irrespective of the current value of the immediate reward option, without appealing to the subjects using a different strategy.

      Further, even if the differences in behaviour do reflect different behavioural strategies, it is not obvious that these correspond to allocation of different types of cognitive effort. For example, subjects' failure to modify their choice probabilities to track the changing value of the immediate reward option might be due simply to valuing the delayed reward option higher, rather than not allocating cognitive effort to tracking immediate option value (indeed this is suggested by the neural data). Conversely, if the rats assign higher value to the delayed reward option in the G1 sessions, it is not obvious that choosing it requires overcoming 'resistance' through cognitive effort.

      The RL modelling used to characterise the subject's behavioural strategies made some unusual and arguably implausible assumptions:

      Thank you for the feedback. Based on these comments (and those above), we have completely reworked the RL model. In addition, we’ve taken care to separate out the variables that correspond to a resistance- versus a resource-based signal.

      There were also some issues with the analyses of neural data which preclude strong confidence in their conclusions:

      Figure 4I makes the striking claim that ACC neurons track the value of the immediately rewarding option equally accurately in sessions where two putative behavioural strategies were used, despite the behaviour being insensitive to this variable in the G1 strategy sessions. The analysis quantifies the strength of correlation between a component of the activity extracted using a decoding analysis and the value of the immediate reward option. However, as far as I could see this analysis was not done in a cross-validated manner (i.e. evaluating the correlation strength on test data that was not used for either training the MCML model or selecting which component to use for the correlation). As such, the chance level correlation will certainly be greater than 0, and it is not clear whether the observed correlations are greater than expected by chance.

      We have added more rigorous methods to assess the ival tracking signal (Figure 4 and 5). In addition, we’ve dropped the claim that ival tracking is the same across the behavioral groups. We suspect that this was an artifact of a suboptimal group assignment approach in the previous version. 
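
      The selection bias raised in the comment above can be illustrated with a minimal sketch. It is not the authors' analysis: the data are simulated noise, and the names `pearson_r` and `select_and_test` are hypothetical. A component is chosen by its correlation with the immediate-option value on training trials, and the same component is then scored on held-out trials.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def select_and_test(components, value, train_idx, test_idx):
    """Pick the component most correlated with value on training trials,
    then return its index and its correlation on training and test trials."""
    def r_on(comp, idx):
        return pearson_r([comp[i] for i in idx], [value[i] for i in idx])
    best = max(range(len(components)),
               key=lambda k: abs(r_on(components[k], train_idx)))
    return best, r_on(components[best], train_idx), r_on(components[best], test_idx)

# Simulated null data (assumption): no component carries any value signal.
rng = random.Random(0)
n_trials, n_components = 200, 50
value = [rng.gauss(0, 1) for _ in range(n_trials)]
components = [[rng.gauss(0, 1) for _ in range(n_trials)]
              for _ in range(n_components)]

# Random split into training and held-out trials.
trials = list(range(n_trials))
rng.shuffle(trials)
train_idx, test_idx = trials[:100], trials[100:]

best, r_train, r_test = select_and_test(components, value, train_idx, test_idx)
print(f"component {best}: in-sample r = {r_train:.2f}, held-out r = {r_test:.2f}")
```

      Because the component is selected for its in-sample correlation, the in-sample value is biased away from zero even for pure noise, whereas the held-out value is an unbiased estimate of the chance level.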

      An additional caveat with the claim that ACC is tracking the value of the immediate reward option is that this value likely correlates with other behavioural variables, notably the current choice and recent choice history, that may be encoded in ACC. Encoding analyses (e.g. using linear regression to predict neural activity from behavioural variables) could allow quantification of the variance in ACC activity uniquely explained by option values after controlling for possible influence of other variables such as choice history (e.g. using a coefficient of partial determination).

      We agree that the ival tracking signal may be influenced by other variables, especially ones that are not cognitive but are instead generated by the autonomic system. We have included a discussion of this possibility in the Discussion section. Our previous work has explored the role of choice history on neural activity; please see White et al. (2024).
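
      The encoding analysis proposed in the comment above can be sketched as follows, assuming a single covariate (choice) and simulated firing rates. The helper names `residualize` and `partial_r2` are hypothetical. With one added regressor, the coefficient of partial determination equals the squared correlation between the residuals of firing rate and of option value after each has been regressed on the covariate.

```python
import random
from statistics import mean

def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def partial_r2(rate, value, covariate):
    """Fraction of the variance in rate left unexplained by the covariate
    that is explained by value."""
    res_rate = residualize(rate, covariate)
    res_value = residualize(value, covariate)
    sxy = sum(a * b for a, b in zip(res_value, res_rate))
    return sxy ** 2 / (sum(a * a for a in res_value) * sum(b * b for b in res_rate))

# Simulated data (assumption): option value covaries with choice.
rng = random.Random(1)
n = 200
choice = [rng.choice([0.0, 1.0]) for _ in range(n)]
value = [2.0 * c + rng.gauss(0, 1) for c in choice]

# One hypothetical neuron driven only by choice, one driven by value.
rate_choice = [5.0 + 3.0 * c + rng.gauss(0, 1) for c in choice]
rate_value = [5.0 + 1.5 * v + rng.gauss(0, 1) for v in value]

print(f"choice-driven neuron: partial R^2 for value = "
      f"{partial_r2(rate_choice, value, choice):.2f}")
print(f"value-driven neuron:  partial R^2 for value = "
      f"{partial_r2(rate_value, value, choice):.2f}")
```

      A neuron whose apparent value coding is inherited from choice yields a partial R^2 near zero, while a neuron that encodes value directly retains a substantial partial R^2. Extending the sketch to several covariates, such as choice history, would require a multiple-regression solver.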

      Figure 5 argues that there are systematic differences in how ACC neurons represent the value of the immediate option (ival) in the G1 and G2 strategy sessions. This is interesting if true, but it appears possible that the effect is an artefact of the different distribution of option values between the two session types. Specifically, due to the way that ival is updated based on the subjects' choices, in G1 sessions where the subjects are mostly choosing the delayed option, ival will on average be higher than in G2 sessions where they are choosing the immediate option more often. The relative number of high, medium and low ival trials in the G1 and G2 sessions will therefore be different, which could drive systematic differences in the regression fit in the absence of real differences in the activity-value relationship. I have created an ipython notebook illustrating this, available at: https://notebooksharing.space/view/a3c4504aebe7ad3f075aafaabaf93102f2a28f8c189ab9176d4807cf1565f4e3. To verify that this is not driving the effect it would be important to balance the number of trials at each ival level across sessions (e.g. by subsampling trials) before running the regression.

      This is an excellent point and led us to abandon the linear regression-based approach to quantify differences in ival coding across behavioral groups.
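
      The balancing control suggested in the comment above (subsampling so that each ival level contributes equally to both session types) could look like the following minimal sketch. The function name `balanced_indices` and the toy trial lists are hypothetical, not taken from the manuscript or the reviewer's notebook.

```python
import random
from collections import Counter, defaultdict

def balanced_indices(levels_a, levels_b, rng):
    """Return trial indices for two session types, subsampled so that every
    value level contributes the same number of trials to both.
    Levels present in only one session type are dropped."""
    index_a, index_b = defaultdict(list), defaultdict(list)
    for i, level in enumerate(levels_a):
        index_a[level].append(i)
    for i, level in enumerate(levels_b):
        index_b[level].append(i)

    keep_a, keep_b = [], []
    for level in sorted(set(index_a) & set(index_b)):
        n = min(len(index_a[level]), len(index_b[level]))
        keep_a += rng.sample(index_a[level], n)
        keep_b += rng.sample(index_b[level], n)
    return sorted(keep_a), sorted(keep_b)

# Toy example (assumption): immediate-option value on each trial.
levels_g1 = [5, 5, 5, 4, 4, 3]   # mostly high values
levels_g2 = [3, 3, 3, 2, 4, 5]   # mostly low values

keep_g1, keep_g2 = balanced_indices(levels_g1, levels_g2, random.Random(0))
counts_g1 = Counter(levels_g1[i] for i in keep_g1)
counts_g2 = Counter(levels_g2[i] for i in keep_g2)
print(counts_g1 == counts_g2)  # prints True
```

      Any regression comparing the two session types would then be run only on the retained trials, ideally repeated over several random subsamples.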

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This paper was extremely hard to read. In addition to the issues raised in the public review (overly complex and incomplete analyses), one of the hardest things to deal with was the writing.

      Thank you for the feedback. Hopefully we have addressed this with our thorough rewrite. 

      The presentation was extremely hard to follow. I had to read through it several times to figure out what the task was. It wasn't until I got to the RL model Figure 2A that I realized what was really going on with the task. I strongly recommend having an initial figure that lays out the actual task (without any RL or modeling assumptions) and identifies the multiple different kinds of sessions. What is the actual data you have to start with? That was very unclear.

      Excellent idea. We have implemented this in Figure 1.  

      Labeling sessions by "group" is very confusing. I think most readers take "group" as the group of subjects, but that's not what you mean at all. You mean some sessions were one way and some were another. (And, as I noted in the public review, you ignore many of the sessions, which I think is not OK.) I think a major rewrite would help a lot. Also, I don't think the group analysis is necessary at all. In the public review, I recommend doing the analyses very differently and more classically.

      We have updated the group assignments in a manner that is more intuitive, reflects the delays, and includes all sessions.  

      The paper is full of arbitrary abbreviations that are completely unnecessary. Every time I came to "ival", I had to translate that into "number of pellets delivered on the immediate lever" and every time I came to dLP, I had to translate that into "delayed lever press". Making the text shorter does not make the text easier to read. In general, I was taught that unless the abbreviation is the common term (such as "DNA" not "deoxyribonucleic acid"), you should never use an abbreviation. While there are some edge cases (ACC probably over "anterior cingulate cortex"), dLP, iLP, dLPs, iLPs, ival, are definitely way over the "don't do that" line.

      We completely agree here and apologize for the excessive use of abbreviations. We have removed nearly all of them.

      The figures were incomplete, poorly labeled, and hard to read. A lot of figures were missing, for example:

      Basic task structure

      Basic behavior on the task

      Scatter plot of the measures that you are clustering (lever press choice X number of pellets on the immediate lever, you can use color or multiple panels to indicate the delay to the delayed lever)

      Figure 3 is just a couple of examples. That isn't convincing at all.

      Figure 4 is missing labels. In Figure 4, I don't understand what you are trying to say.

      I don't see how the results on page 16 arise from Figure 6. I strongly recommend starting from the actual data and working your way to what it means rather than forcing this into this unreasonable "session group" analysis.

      We have completely reworked the Figures for clarity and content. 

      The statement that "no prior study has explored the cellular correlates of cognitive effort" is ludicrous and insulting. There are dozens of experiments looking at ACC in cognitive effort tasks, in humans, other primates, and rodents. There are many dozens of experiments looking at cellular correlates in intertemporal choice tasks, some with neural manipulations, some with ensemble recordings. There are many dozens of experiments looking at cellular relationships to waiting out a delay.

      We agree that our statement was extremely imprecise. We have updated this to say: “Further, a role for theta oscillations in allocating physical effort has been identified. However, the cellular mechanisms within the ACC that control and deploy types of cognitive effort have not been identified.”

      Reviewer #2 (Recommendations For The Authors):

      In Figure 2, the panels below E and F are referred to as 'right' - but they are below? I would give them letters.

      I would make sure that animal #s, neuron #s, and LFP#s are clearly presented in the results and in each figure legend. This is important to follow the results throughout the manuscript.

      Some additional proofreading ('Fronotmedial') might help with clarity.

      Based on our updates, this is no longer relevant.  

      Reviewer #3 (Recommendations For The Authors):

      In addition to the suggestions above to address specific issues, it would be useful to report some additional information about aspects of the experiments and analyses:

      Specify how spike sorting was performed and what metrics were used to select well isolated single units.

      Done.

      Provide histology showing the recording locations for each subject.

      Histological assessments of electrode placements are provided in White et al. 2024, but we provide an example placement. This has been added to the text.

      Indicate the sequence of recording sessions that occurred for each subject, including for each session what delay duration was used and which dataset the session contributed to, and indicate when the neural probes were advanced between sessions.

      We feel that this adds complexity unnecessarily, as we make no claims about holding units across sessions or about differences in coding along the dorsoventral gradient of ACC.

      Indicate the experimental unit when reporting uncertainty measures in figure legends (e.g. mean +/- SEM across sessions).

      Done.

    1. eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.

    2. Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement.

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

    3. Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature, and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling, and thus mislead future readers.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24).

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008).

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459).

      (d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122).

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776).

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537).

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models.

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison, but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait &Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1, 2, and Supplementary Figure 5)?

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions.

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate towards the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid.

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      (h) The caption of SI Figure S3 says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres." In contradiction, the main-text Fig 2 says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres." One of these must be a typo.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait &Voth, 2024). " In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event. " What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating.

      (o) The authors state: "LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.
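
      The normalization question raised in point (e) can be made concrete with a short calculation. The sketch below is illustrative only: the ring counts (12 pentamers, 209 hexamers) come from the quoted text, while the function names and the raw defect count are hypothetical and do not correspond to any value reported in the manuscript.

```python
# Ring counts quoted in point (e).
N_PENTAMERS = 12
N_HEXAMERS = 209


def monomer_count(n_rings: int, monomers_per_ring: int) -> int:
    """Total CA monomers contained in n_rings capsomers."""
    return n_rings * monomers_per_ring


def normalized_defects(n_undercoordinated: int, denominator: int) -> float:
    """Fraction of under-coordinated monomers for a chosen denominator."""
    return n_undercoordinated / denominator


pentamer_monomers = monomer_count(N_PENTAMERS, 5)  # 12 * 5 = 60
hexamer_monomers = monomer_count(N_HEXAMERS, 6)    # 209 * 6 = 1254

# Hypothetical raw count of under-coordinated hexamer monomers (assumption).
raw_defects = 30

per_ring = normalized_defects(raw_defects, N_HEXAMERS)
per_monomer = normalized_defects(raw_defects, hexamer_monomers)

print(f"normalized per ring:    {per_ring:.4f}")
print(f"normalized per monomer: {per_monomer:.4f}")
```

      For hexamers the two conventions differ by a constant factor of 6 (5 for pentamers), so the choice changes the scale of the plotted quantity but not the shape of its time series.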

    4. Author response:

      Before providing a brief provisional response to the two reviews, it is important to reiterate a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.

      The first reviewer has some valuable points, which can be addressed as follows (and will be emphasized in the revised version of the paper): (1) Yes, the simulations of capsid rupture in the NPC and capsid-only are directly comparable, as both have approximately the same number of bound LEN, as determined by following the LEN-capsid interaction protocol described in the main text (around Fig 6) and in the SI section S3; (2) While we have stressed this point in several places in the manuscript, here again we stress that coarse-grained (CG) MD time is not the same as real time. The point of CG simulations is to accelerate the timescale of the MD and the associated sampling, so the CG “time” from the MD integrator needs to be rescaled to associate a real time to it. As such, our CG simulation is not representing a microsecond of real time but rather something much longer. We will emphasize this again in the revised text. (3) Actually, we think that the parameterization of the LEN model and the LEN-capsid interactions is well described in the text associated with Fig 6 and in SI section S3. It is true that this one part of the CG model was parameterized “top-down” given the good experimental structures of bound LEN to capsid and other data, but the rest of the CG model is “bottom-up” (meaning developed from well-defined coarse-graining statistical mechanics as applied to molecular level structures and interactions, see also below).

      As for the second reviewer, this review is quite problematic in our view as the reviewer seems to think that quoting a number of qualitative experimental results is sufficient to undermine the impact of our paper (it is not) and, furthermore, the reviewer appears to have a very minimal understanding of “bottom-up” CG modeling, which we have utilized. This modeling does not in fact rely on the “assumptions” this reviewer alleges we have relied on. (As an aside, it could be helpful for this reviewer to study the review by Jin et al., https://doi.org/10.1021/acs.jctc.2c00643, in order to become more familiar with the field and our approach before criticizing it.) We also note that our main HIV capsid-NPC docking model is already published in PNAS (https://doi.org/10.1073/pnas.2313737121), where it underwent rigorous peer review. In our forthcoming full response to the reviews and in the revised paper we will attempt to address a number of this reviewer’s comments, but the number, extent, and tone of this collection of criticisms, for us, calls into question the objectivity of this reviewer, not to mention the reviewer’s rather weak understanding of what we have done and how we have done it.

      Finally, while we certainly appreciate the overall positive eLife assessment, we are disappointed by the statement “some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete”. Of course, all simulations (and experiments) rely on certain assumptions, but we have gone to great lengths to provide a “bottom-up” approach to our modeling, based on underlying molecular level structures and interactions, and we have provided experimental validation of the main simulation predictions. It seems that the comments of the second reviewer may have influenced this point of view, but we do not feel it is justified.

    1. eLife Assessment

      This study offers valuable insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is compelling, although including a larger sample size and more quantification would have strengthened the study further, and the claims of monosynaptic connectivity would benefit from being stated more cautiously. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold-sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium-binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture well the subpopulation of Trpm8-expressing neurons.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear, and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of neurons with or without close contacts from TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

    1. eLife Assessment

      This important work advances our understanding of the single neuron coding types in the mouse gustatory cortex and the functional roles of these neurons for perceptual decision-making. The conclusions are based on compelling evidence from rigorous behavioral experiments, high-density electrophysiology, sophisticated data analysis, and neural network modeling with in silico perturbations of functionally-identified units. This work will be of broad interest to systems neuroscientists.

    2. Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

    3. Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

    4. Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

    5. Author response:

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes stimulus concentration of mixture elements, dominant perceptual quality, and decisions of directional lick. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. In fact, at the single-trial level, activity evoked by similar, intermediate mixtures can be hard to separate. This increased similarity can lead to behavioral errors resulting from either incorrect encoding of the stimulus or from the inability to interpret the stimuli to guide the correct decision.

      The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures and with the possibility that, compared to pure stimuli, intermediate mixtures lead to more trials in which the binary choice component of neural activity is inverted, resulting in more directional errors.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised version we will discuss this issue.

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback and are pleased that the work is considered of broad interest. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units is fixed at 5.88, thus constrained units are ~17% of the network and unconstrained units are ~83%. This value is specified in the Methods (RNN: Components and dynamics), but we will report it in the Results of the revised manuscript as well for clarity.

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we will specify the fractions of constrained and unconstrained units within each response category. For convenience, they are reported here: Linear = 194 constrained and 691 unconstrained units; Step-perception = 147 constrained and 840 unconstrained units; Step-choice = 129 constrained and 814 unconstrained units; Other = 353 constrained and 1739 unconstrained units.

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we will add a Supplemental Figure in which the contribution of constrained vs unconstrained units is addressed.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version will include a Supplemental Figure with these simulations.
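      The unit counts quoted in the response to point (1) can be checked against the stated ~17% constrained fraction with a few lines of arithmetic. The sketch below is illustrative only: the counts and the 5.88 ratio are copied from the response above, while the variable and function names are hypothetical, and pooling the four categories is an assumption about how the counts relate to the overall ratio.

```python
# Constrained / unconstrained unit counts per response category,
# copied from the authors' response to point (1).
counts = {
    "Linear": (194, 691),
    "Step-perception": (147, 840),
    "Step-choice": (129, 814),
    "Other": (353, 1739),
}


def constrained_fraction(constrained, unconstrained):
    """Fraction of units in a group that are constrained."""
    return constrained / (constrained + unconstrained)


per_category = {name: constrained_fraction(c, u) for name, (c, u) in counts.items()}

# Pool all categories (assumption: the categories partition the units).
total_constrained = sum(c for c, _ in counts.values())
total_unconstrained = sum(u for _, u in counts.values())
overall = constrained_fraction(total_constrained, total_unconstrained)

# Fraction implied by the fixed total-to-constrained ratio of 5.88.
expected = 1 / 5.88

for name, frac in per_category.items():
    print(f"{name}: {frac:.1%} constrained")
print(f"Pooled: {overall:.1%} constrained (ratio-implied: {expected:.1%})")
```

      Under these assumptions the pooled fraction lands close to the ratio-implied value, while the per-category fractions differ from one another, which is the kind of breakdown the Reviewer asks to see reported.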

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation. By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance. These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

    1. eLife Assessment

      This valuable study leverages a large global dataset of tens of thousands of tuberculosis samples to place recurrent protein-coding mutations into their three-dimensional structural context, offering an expanded view of how antibiotic resistance emerges compared to traditional genetic analyses alone. The strength of evidence is convincing, supported by the scale and breadth of the dataset and the systematic structural analysis, although some of the assumptions made in the modeling approach are only partially supported. Overall, the work will be of broad interest to researchers studying microbial evolution, antibiotic resistance, and structure-function relationships in pathogens.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Green et al. attempt to use large-scale protein structure analysis to find signals of selection and clustering related to antibiotic resistance. This was applied to the whole proteome of Mycobacterium tuberculosis, with a specific focus on the smaller set of known antibiotic-resistance-related proteins.

      Strengths:

      The use of geospatial analysis to detect signals of selection and clustering on the structural level is really intriguing. This could have a wider use beyond the AMR-focussed work here and could be applied to a more general evolutionary analysis context. Much of the strength of this work lies in breaking ground into this structural evolution space, something rarely seen in such pathogen data. Further research can be done to build on this foundation, and the work presented here will be important for the field.

      The size of the dataset and use of protein structure prediction via AlphaFold, giving such a consistent signal within the dataset, is also of great interest and shows the power of these approaches to allow us to integrate protein structure more confidently into evolution and selection analyses.

      Weaknesses:

      There are several issues with the evolutionary analysis and assumptions made in the paper, which perhaps overstate the findings, or require refining to take into account other factors that may be at play.

      (1) The focus on antimicrobial resistance (AMR) throughout the paper confines the findings to that lens. This results in a few different weaknesses:

      (a) While the large size of the analysis is highlighted in the abstract and elsewhere, in reality, only a few proteins are studied in depth. These are proteins already associated with AMR by many other studies, somewhat retreading old ground and reducing the novelty.

      (b) Beyond the AMR-associated proteins, the proteome work is of great interest, but only casually interrogated and only in the context of AMR. There appears to be an assumption that all signals of positive selection detected are related to AMR, whereas something like cas10 is part of the CRISPR machinery, a set of proteins often under positive selection, and thus unlikely to be AMR-related.

      (2) The strength of the signal from the structural information and the novelty of the structural incorporation into prediction are perhaps overstated.

      (a) A drop of 13% in F1 for a gain of 2% in PPV is quite the trade-off. This is less indicative of a strong, usable predictor than the abstract claims. While the approach is novel and this is a good finding for a first attempt at such complex analysis, this is perhaps not as significant as the authors claim.

      (b) In relation to this, there is a lack of situating these findings within the wider research landscape. For instance, the use of structure for predicting resistance has been done, for example, in PncA (https://academic.oup.com/jacamr/article/6/2/dlae037/7630603, https://www.sciencedirect.com/science/article/pii/S1476927125003664, https://www.nature.com/articles/s41598-020-58635-x) and in RpoB (https://www.nature.com/articles/s41598-020-74648-y). These, and other such works, should be acknowledged as the novelty of this work is perhaps not as stark as the authors present it to be.

      (3) The authors postulate that neutral AA substitutions would be randomly distributed in the protein structure and thus use random mutations as a negative control to simulate this neutral evolution. However, I am unsure if this is a true negative control for neutral evolution. The vast majority of residues would be under purifying selection, not neutral selection, especially in core proteins like rpoB and gyrA. Therefore, most of these residues would never be mutated in a real-world dataset. Therefore, you are not testing positive selection against neutral selection; you are testing positive against purifying, which will have a much stronger signal. This is likely to, in turn, overestimate the signal of positive selection. This would be better accounted for using a model of neutral evolution, although this is complex and perhaps outside the scope. Still, it needs to be made clear that these negative controls are not representative of neutral evolution.

      (4) In a similar vein, the use of 15 Å as a cut-off for stating co-localisation feels quite arbitrary. The average radius of a globular protein is about 20 Å, so this could be quite a large patch of a protein. I think it may be good to situate the cut-off for a 'single location' within a size estimator of the entire protein, as 15 Å could be a neighbourhood in a large protein, but be the whole protein for smaller ones.
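      Point (4) can be made concrete by asking what fraction of a protein falls inside a fixed 15 Å radius. The sketch below is a toy illustration, not the authors' method: the function name is hypothetical, and the coordinates are made-up residues placed on a straight line 3.8 Å apart (roughly the spacing of consecutive C-alpha atoms), which is not a realistic globular fold.

```python
import math


def fraction_within_cutoff(coords, focal_index, cutoff=15.0):
    """Fraction of residues lying within `cutoff` (in angstroms) of the
    focal residue, the focal residue itself included."""
    focal = coords[focal_index]
    inside = sum(1 for xyz in coords if math.dist(focal, xyz) <= cutoff)
    return inside / len(coords)


# Hypothetical toy "proteins": residues on a line, 3.8 angstroms apart.
small = [(3.8 * i, 0.0, 0.0) for i in range(7)]
large = [(3.8 * i, 0.0, 0.0) for i in range(80)]

# For the small chain the cutoff covers every residue; for the large
# chain it covers only a local neighbourhood around the focal residue.
print(fraction_within_cutoff(small, 3))
print(fraction_within_cutoff(large, 40))
```

      Reporting this fraction alongside the absolute cut-off would be one way to express a "single location" relative to the size of each protein, as the comment suggests.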

    3. Reviewer #2 (Public review):

      Summary:

      This is an important study that, for the first time, systematically places the homoplastic genetic variation observed in the coding regions in a large collection of >31,000 M. tuberculosis samples into the protein structural context. This should be much more informative when, e.g. predicting antimicrobial resistance. The authors imaginatively apply the Getis-Ord score, which originated in geographical spatial analysis but has also been used in human disease, to demonstrate that missense mutations in M. tuberculosis known to be associated with antimicrobial resistance are clustered in space. That they are able to consider almost all of the proteome using a large dataset of 31,000 M. tuberculosis complex clinical samples makes the evidence convincing.
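      For readers unfamiliar with the statistic, the following is a minimal sketch of a Getis-Ord Gi* score computed over residue coordinates. It is not taken from the manuscript: the binary distance-band weights, the 8 Å cutoff, the function name, and the toy coordinates and mutation counts are all assumptions chosen for illustration.

```python
import math


def getis_ord_gi_star(coords, values, cutoff):
    """Getis-Ord Gi* z-scores using binary distance-band weights.

    w_ij = 1 if residues i and j lie within `cutoff` of each other
    (j == i included), otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        local_sum = sum(wj * xj for wj, xj in zip(w, values))
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # Guard against a degenerate neighbourhood (e.g. all residues inside).
        scores.append((local_sum - mean * sum_w) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical toy data: 20 residues on a line, 3.8 angstroms apart,
# with mutation counts concentrated at residues 9-11.
coords = [(3.8 * i, 0.0, 0.0) for i in range(20)]
values = [0.0] * 20
values[9] = values[10] = values[11] = 5.0

scores = getis_ord_gi_star(coords, values, cutoff=8.0)
print([round(z, 2) for z in scores])
```

      Large positive scores mark residues whose spatial neighbourhood carries more mutations than expected from the protein-wide average, which is the sense in which mutations are "clustered in space".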

      Strengths:

      To my knowledge, this is the first study to place the homoplastic missense mutations from a large clinical dataset into their protein structural context and attempt to look for clustering in space, which could be indicative of a recent evolutionary pressure, such as the use of antibiotics. The field usually only views resistance through the genetic paradigm, so it is delightful to see a structural paradigm being brought to bear, as this should, in theory, be much more informative, as protein structure is much closer to function. In addition, the dataset used is large (>31,000 clinical M. tuberculosis samples), and the authors are able to consider almost all of the ORFs (3,687/3,996) in the M. tuberculosis reference, and hence the analysis is comprehensive.

      Weaknesses:

      It is not apparent at the time of this review whether the study could be reproduced by other researchers. For example, whilst the authors state that the raw sequencing files (FASTQ) underpinning the dataset of 31,428 M. tuberculosis isolates can be downloaded, the table in the Supplement containing the sample and accession identifiers contains rows that do not contain NCBI accessions, e.g. '01R0685' or 'IDR 1600023875' or '1479144813357T181715lib5022nextseqn0035151bp', instead of the expected form, e.g. 'SAMEA1016138'. I have searched the NCBI SRA using these terms and got no results, so they cannot be used to download any FASTQ files. There is also no information in the preprint on how the reads were processed (which is a complex process) or how the dataset of SNPs was subsequently built. One can trace back through the references, but I cannot find anywhere where one can download the SNP dataset, which would permit researchers to reproduce at least the latter stages of the work; one obvious option would be to make the SNP dataset available. Likewise, the authors have constructed a "M. tuberculosis structureome", which would be very useful for the community but does not appear to be publicly available. At the time of the review, not all the GitHub repositories were public, so these points may have been rectified when that was corrected.

      The authors correctly point out in the Introduction that supervised methods like GWAS or ML need datasets with matching genetic and phenotypic drug susceptibility data, which are much more difficult/expensive to obtain, but don't then close the loop by comparing their results back to such supervised methods. They pick out RnJ as having previously been identified by a GWAS, but it would have provided a useful validation of their method to demonstrate, e.g., that X% of the genes they identify were also identified by GWAS/ML studies, and therefore that their method can achieve similar results but without having to collect pDST data.

      Whilst the authors acknowledge that assuming all sites are equally likely to mutate in their random shuffling procedure is a shortcoming, a bigger weakness is, I suspect, that one should also only consider which amino acids could arise at each codon due to a SNP. Shuffling assumes any amino acid can arise at any codon, which in many cases would require multiple nucleotide changes; this is possible but highly unlikely.
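      The restriction described above can be sketched directly from the standard genetic code: for a given codon, enumerate the nine single-nucleotide variants and keep the amino acids they encode. This is an illustrative sketch rather than the authors' shuffling procedure; the function name is hypothetical, and excluding synonymous changes and stop codons is an assumption about what a "missense" null model would permit.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def snp_accessible_missense(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide change,
    excluding synonymous changes and stop codons."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


# A tryptophan codon can reach only a handful of the 19 other amino acids.
print(sorted(snp_accessible_missense("TGG")))
```

      A shuffling control that draws replacement residues only from this set for each codon would respect the single-SNP constraint the comment describes.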

      Finally, the authors implicitly assume that the mutations do not perturb the structure of the proteins, which is likely to be generally true for essential genes but less likely to be true for non-essential genes. This assumption underpins their entire approach and should be borne in mind when evaluating the results.

    1. eLife Assessment

      This valuable study shows that combining reactivation-based training with anodal tDCS yields an unusually broad generalization of visual perceptual learning, while preserving robust learning gains and markedly reducing total training time. Although the empirical evidence is solid, the proposed mechanistic account, i.e., the GABA modulation, disrupted offline consolidation and reduced perceptual overfitting, remains insufficiently substantiated, as these assumptions lack direct neurochemical support, and several alternative behavioral explanations and necessary control comparisons have not been fully addressed. The work will be of broad interest to researchers investigating brain plasticity, perceptual learning, and rehabilitation training.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Xie and colleagues presents an intriguing behavioral finding for the field of perceptual learning (PL): combining the reactivation-based training paradigm with anodal tDCS induces complete generalization of the learning effect. Notably, this generalization is achieved without compromising the magnitude of learning effects and with an 80% reduction in total training time. The experimental design is well-structured, and the observed complete generalization is robustly replicated across two stimulus dimensions (orientation and motion direction).

      However, while the empirical results are methodologically valid and scientifically surprising, the theoretical framework proposed to explain them appears underdeveloped and, in some cases, difficult to reconcile with the existing literature. Several arguments are insufficiently justified. In addition, the introduction of a non-standard metric (NGI: normalized learning gain index) raises concerns about the interpretability and comparability with existing PL literature.

      Strengths:

      (1) Rigorous experimental design

      In this study, Xie and colleagues employed a 2×2 factorial design (Training paradigm: Reactivation vs. Full-Practice × tDCS protocols: Anodal vs. Sham), which allowed clear dissociation of the main and interaction effects.

      (2) High statistical credibility

      Sample sizes were predetermined using G*Power, non-significant effects were evaluated using the Bayes factor, and the core behavioral findings were replicated in a second stimulus dimension. These strengthen the credibility of the findings.

      (3) Strong translational potential

      The observed complete generalization could have useful implications for sensory rehabilitation. The large reduction (80%) in total training time is particularly compelling.

      Weaknesses:

      (1) NGI (Normalized learning gain index) is a non-standard behavioral metric and may distort interpretability.

      NGI, defined as (pre - post) / ((pre + post) / 2), is rarely used in PL studies to measure learning effects. Almost all PL studies rely on raw thresholds and percent improvements, (pre - post) / pre, making it difficult to contextualize the current NGI-based results within the broader field. The current manuscript provides no justification for adopting NGI.

      A more critical issue is the NGI's nonlinearity: by normalizing to the mean of pre- and post-test thresholds, it disproportionately inflates learning effects for participants with lower post-test thresholds. Notably, the "complete generalization" claims are illustrated mainly with NGI plots. Although the authors also analyze thresholds directly and the results also support the core claim, the interpretation in the text relies heavily on NGI.

      The authors may consider rerunning key analyses using the standard percent improvement metric. If retaining NGI, the authors should provide explicit justification for why NGI is superior to standard measures.
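      The nonlinearity raised in this point can be seen by computing both metrics on the same thresholds. The sketch below uses the two definitions quoted above; the threshold values are hypothetical and the function names are illustrative. Writing PI for percent improvement, the two definitions imply NGI = 2·PI / (2 - PI), so NGI grows faster than PI as the post-test threshold falls.

```python
def ngi(pre, post):
    """Normalized gain index as quoted in the review:
    (pre - post) divided by the mean of pre and post."""
    return (pre - post) / ((pre + post) / 2)


def percent_improvement(pre, post):
    """Conventional measure: (pre - post) / pre."""
    return (pre - post) / pre


pre = 10.0  # hypothetical pre-test threshold (arbitrary units)
for post in (8.0, 5.0, 2.0):  # hypothetical post-test thresholds
    pi = percent_improvement(pre, post)
    gain = ngi(pre, post)
    print(f"post={post}: PI={pi:.3f}, NGI={gain:.3f}, NGI/PI={gain / pi:.3f}")
```

      The ratio NGI/PI rises as the post-test threshold decreases, so participants with larger improvements are weighted more heavily under NGI than under percent improvement.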

      (2) The proposed theoretical framework is sometimes unclear and insufficiently supported.

      The authors propose the following mechanistic chain:

      (a) reactivation-based learning depends on offline consolidation mediated by GABA (page 4 line 73);

      (b) online a-tDCS reduces GABA (page 4, line 76), thereby disrupting offline consolidation (page 11, line 225);

      (c) disrupted offline consolidation reduces perceptual overfitting (page 4, line 77; page 11, line 225), thereby enabling generalization;

      (d) under full-practice training, a-tDCS increases specificity via a different mechanism (page 11 line 235).

      While this framework is plausible in broad terms, several components are speculative at best in the absence of neurochemical or neural measurements.

      (3) Several reasoning steps require further clarification.

      (a) Mechanisms of Reactivation-based Learning.

      The manuscript focuses on the neurochemical basis of reactivation-based learning. However, reactivation-induced neurochemical changes differ across brain regions. In the motor cortex, Eisenstein et al. (2023) reported that after reactivation, increased GABA and decreased E/I ratio were associated with offline gains. In contrast, Bang et al. (2018) demonstrated that, in the visual cortex, reactivation decreased GABA and increased E/I ratio. While both studies are consistent with GABA involvement, the direction of GABA modulation differs. The authors should clarify this discrepancy.

      More importantly, Bang et al. (2018) demonstrated that reactivation-based (3 blocks) and full-practice (16 blocks) training produced similar time courses of E/I ratio changes in V1: an initial increase followed by a decrease. Given this similarity, the manuscript would benefit from a more thorough discussion of how the two paradigms diverge mechanistically. For example, behaviorally, Song et al. (2021) reported greater generalization with reactivation-based training than with full-practice training, aligning with Kondat et al. (2025). Neurally, Kondat et al. (2024) showed that reactivation-based training increased activity in higher-order brain regions (e.g., IPS), whereas full practice training reduced connectivity between temporal and parietal regions.

      (b) tDCS Mechanisms and Protocols.

      The effect of a-tDCS on GABA is not consistent across brain regions. While a-tDCS reliably reduces GABA in the motor cortex, a recent and more closely related study (Abuleli et al., 2025) reports no significant modulation of GABA or Glx in V1, challenging the authors' assumption of tDCS-induced GABA reduction in the visual cortex.

      The manuscript's proposal that online a-tDCS disrupts offline consolidation is somewhat difficult to interpret conceptually. Online tDCS typically modulates processes occurring during stimulation (e.g., encoding process, attentional state), whereas consolidation occurs afterward. Thus, the claim that online tDCS protocols disrupt only offline consolidation, without considering the possibility that they first modulate the encoding process, is hard to justify. Even if tDCS has prolonged effects, the link between online stimulation and disruption of offline consolidation remains unelucidated.

      (c) Missing links between GABA modulation and perceptual overfitting.

      The proposed chain ("tDCS disrupts consolidation → reduced overfitting → improved generalization") skips a critical step: how GABA modulation translates to changes in neural representational properties (e.g., tuning width, representational overlap between trained/untrained stimuli) that define "perceptual overfitting." The PL literature has not established a link between GABA levels and these representational changes, leaving a key component of the mechanistic explanation underspecified.

      (d) Insufficient explanation of the opposite effects.

      The manuscript does not fully explain why the same a-tDCS promotes generalization in reactivation-based training but increases specificity in full-practice training. Both paradigms engage offline consolidation, and, as mentioned above, the time courses of E/I ratio changes are similar for 3-block reactivation-based and 16-block full-practice training. Thus, if offline consolidation mechanisms (and their associated E/I changes) are comparable across paradigms, it is unclear why identical a-tDCS would produce opposite outcomes in the two paradigms.

    3. Reviewer #2 (Public review):

      Xie et al. combined transcranial direct current brain stimulation (tDCS) and a reactivation-based training protocol to investigate the generalization of learning. Using visual perceptual learning as a model, they found that a reactivation-based training protocol, when combined with anodal tDCS over the visual cortex, can induce learning transfer to untrained visual orientations and motion directions. Interestingly, extending reactivation-based training to a full-training protocol with more training trials did not induce generalization of learning. Furthermore, even when paired with tDCS, extending the training protocol did not provide benefits for generalization of learning. This study provides interesting insights into the mechanisms of brain plasticity and how future training protocols could be designed to achieve robust and generalizable learning outcomes.

      The authors supported their arguments with a series of well-constructed experiments. The conclusions are largely supported by the data, although some clarifications about their hypotheses and control analyses could strengthen the work:

      (1) The authors hypothesize that tDCS can reduce perceptual overfitting through reduced GABA concentrations in the visual cortex, which leads to learning transfer. However, without a clear description of the role of GABA in perceptual learning and perceptual overfitting, it is difficult for the reader to understand why reduced GABA concentrations would contribute to generalization. Do the authors imply that increased GABA can lead to specificity? Are there studies that can support this argument? The authors also did not describe clearly how reactivation-based visual perceptual learning can modify GABA levels in the visual cortex differently (compared to full-practice) during training and during the offline consolidation phase. In order for the reader to better understand their hypotheses and the motivation of the current study, it is beneficial for the authors to provide a concise but clearer description of the roles of GABA in perceptual learning with a focus on the roles of GABA in generalization and during off-line consolidation for different types of training protocols (see for instance Bang et al., 2018; Frangou et al., 2019; Frank et al., 2022; Jia et al., 2024; Shibata et al., 2011; Tamaki et al., 2020; Yamada et al., 2024).

      (2) Based on the results, an alternative explanation is that the amount of transfer to the untrained visual feature might be related to the amount of learning for the trained visual feature, which might be different depending on the training protocol and brain stimulation combination. Is it beneficial to compare the amount of learning gains across different training and stimulation protocols to rule out this possibility? Would more learning gains for the trained visual feature predict less transfer for the untrained visual feature? Are there correlations between learning gains and learning transfer?

      (3) The authors argued that a reactivation-based training protocol, rather than the amount of training, was critical for the generalization of learning. The control experiment in the study showed that full-practice training combined with tDCS did not lead to transfer, unlike reactivation-based training. However, in order to rule out the confounding effects from the amount of training, it is crucial to examine whether a training protocol with a similar number of trials as the reactivation-based training, but not separated across training sessions, would lead to similar generalization of learning.

    4. Reviewer #3 (Public review):

      Summary:

      This research focuses on a long-lasting and interesting phenomenon in human plasticity. When humans learn basic perceptual skills such as judging the orientation of a simple line, the learned abilities are often limited to the trained condition but not generalizable to untrained conditions. The authors hypothesized that this learning specificity was related to GABA, an inhibitory neurotransmitter in the brain. Using a novel training method that combines reactivation and a brain stimulation method (tDCS) that hypothetically inactivates GABA, the authors hypothesized that learned visual perceptual skills would show greater transfer.

      Strengths:

      The authors conducted a series of well-conceived behavioral studies to demonstrate the effectiveness of their proposed method in enabling learning transfer in two different visual tasks, and carefully conducted comparison studies to address other possible explanations. The sample size was adequate to convey convincing results, and the analyses were thorough.

      Weaknesses:

      While the authors built their training paradigm on (1) the hypothetical role GABA plays in inhibiting learning transfer and (2) the hypothetical impact tDCS may have on GABA, there was no direct evidence supporting these hypotheses in the current study.

      Further, learning specificity takes many forms, from features to locations to tasks; the scope of the transfer observed with the proposed method is not yet clear.

    1. eLife Assessment

      This important study establishes the first vertebrate models of DeSanto-Shinawi Syndrome, revealing conserved craniofacial and social-behavioral phenotypes across mouse and zebrafish that mirror key clinical features. The solid evidence is supported by behavioral, anatomical, and molecular analyses of Wac animal mutants that broadly support the authors' claims, though additional mechanistic investigation would strengthen the conclusions. This study sets a baseline for future mechanistic studies and reports a platform to test approaches to reverse phenotypes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors generated mouse and zebrafish models for DeSanto-Shinawi Syndrome, caused by loss-of-function variants in the WAC gene. Using these vertebrate systems, they demonstrate conserved craniofacial and social-behavioral phenotypes that parallel human clinical features, along with deficits in GABAergic markers. They observe increased seizure susceptibility and male-biased brain volumetric changes in Wac mutant mice. Together, these findings begin to define the biological consequences of Wac haploinsufficiency and provide valuable resources for future mechanistic studies.

      Strengths:

      WAC is a high-confidence neurodevelopmental disorder gene and one of the genes identified by large-scale exome sequencing efforts, including the Satterstrom et al. (2020) autism spectrum disorder cohort. This study establishes the first vertebrate Wac models, addressing a major gap in the understanding of DeSanto-Shinawi Syndrome, and provides a framework for studying other syndromic forms of autism. The models generated will be impactful and useful to the community to study and understand DeSanto-Shinawi Syndrome.

      The cross-species analysis is important and well executed, and reveals both conserved and divergent phenotypes. The behavioral and anatomical assays are rigorously executed and well-controlled, and the inclusion of RNA-sequencing analyses adds valuable insights into the mechanisms underlying brain function in Wac mutants. Notably, the RNA-seq data reveal upregulation of several clustered protocadherins, genes central to neuronal identity and cell-cell interactions, which are known to be regulated by dynamic developmental regulation of chromatin architecture. This observation provides an intriguing hint that could link Wac function to higher-order chromatin organization and neuronal connectivity.

      Weaknesses:

      The evidence is solid, but the study remains incomplete in its mechanistic depth and molecular interpretation. The authors compellingly describe behavioral, anatomical, and transcriptomic phenotypes associated with WAC loss, yet do not explore how WAC mechanistically regulates chromatin or transcription. Given prior evidence that WAC interacts with the RNF20/40 ubiquitin ligase complex and promotes histone H2B ubiquitination and transcriptional elongation, the paper would benefit from a discussion of these functions as a potential link between Wac haploinsufficiency and the observed changes in neuronal gene expression. Similarly, the authors mention WAC's WW and coiled-coil domains but do not consider how these domains could mediate nuclear interactions or recruitment of transcriptional cofactors that shape gene regulation and chromatin organization in neurons.

      The transcriptomic analysis is rich but largely descriptive. Although the upregulation of clustered protocadherins is particularly intriguing, these findings are not validated or localized to specific neuronal populations. The study would be strengthened by independently validating the most significant RNA-seq changes, such as protocadherin gamma genes, using in situ hybridization methods to confirm the spatial and cellular specificity of expression changes.

      Finally, while the behavioral and MRI results add valuable breadth, their interpretation would be improved by clearer reporting of sample sizes, statistical corrections, and effect sizes to support claims of sex-specific and regional brain volume differences.

    3. Reviewer #2 (Public review):

      The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSanto-Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (in mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).

      Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing. However, there are a few places where the data presentation could be improved for clarity, and a few concerns about some choices in analytical approach for a couple of the experiments, where improved statistical approaches could improve their sensitivity and/or better rule out false positives, and thus the support of some of these claims is currently incomplete. There is also some lack of clarity about the rationale for some decisions regarding the fish genetics. Nonetheless, this is an important and useful first characterization of many phenotypes of these lines. Such experiments form a baseline for future mechanistic studies in the same lines and a platform to test approaches to reverse phenotypes.

      Individual claims and their strengths and weaknesses:

      (1) The authors developed mouse and zebrafish models of WAC deletion

      They used the existing KOMP floxed WAC line to generate a null allele. For the mouse, there is a Western showing that it is indeed null for the protein. The fish data is less robustly validated: they don't confirm that the allele is null at the protein or RNA level, and fish have two paralogs (waca and wacb), of which this paper characterizes only one. So this evidence is less clear. The evaluated mice are heterozygous (Het), similar to patients, while the fish appear to be evaluated as homozygous mutants.

      (2) The authors show that both species show altered craniofacial features

      These data appear well powered, and the findings are robust.

      (3) Each model altered GABAergic neurons

      In mice, the authors stained with PV antibodies and saw a decrease in cells positive for this staining. A second marker, Lhx6, does not show a difference, suggesting this might be a change in PV expression rather than cell number. They could maybe look into the literature to see if this loss of just the protein also occurs in other models. Overall, the sample size here is a bit smaller than other parts of the paper (n=3), and the methods on the cell counts were less clear, so it is not as clear that this finding is as robust. The authors counted several other broad classes of cells, and those appear normal. Interestingly, there might also be some TBR1 mislocalization in layer 6 that might be significant with added power.

      The fish data is based on an in situ hybridization for GAD. The measure shown is the width of the positive area in the forebrain. This measure is not one I have seen much before, and has potential to be driven by something unrelated to GABA (e.g., if the whole forebrain were simply a bit smaller). So this analysis could use a couple of other approaches (density of signal?) and/or a control probe for some other brain gene showing the measure is normal, and thus it is not just a size issue.

      (4) Mice were more susceptible to the seizure-inducing agent PTZ

      These data appear well powered, and the findings are robust. The authors also did a fair amount of useful electrophysiology that was all normal, but appeared to be well executed.

      (5) Mice had changes in brain volume that interact with sex

      The authors conducted an MRI on a good number of mice and reported a slight increase in global volume just in males. Sample size is fair, but the statistical approach here may be better if it puts males and females in the same model (to boost power and explicitly test for sex by genotype interaction that they report), and there is some chance that the brain region level differences that they report could include some false positives. They tested many regions, and it is not clear whether or not they corrected for the number of tests. Often, an FDR correction would be used in such imaging studies. It may be that only the most robust regional findings will survive those corrections. It is interesting data either way, but the analysis could be improved.
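
      The FDR correction mentioned above can be sketched as follows. This is a minimal illustration of the Benjamini-Hochberg step-up procedure, not the analysis used in the study; the function name `benjamini_hochberg` and the per-region p-values are hypothetical and chosen only to show how findings that pass an uncorrected 0.05 threshold may not survive correction.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking tests that survive BH FDR control at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # BH threshold for each rank
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank meeting its threshold,
        # then reject that test and every test with a smaller p-value.
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

# Hypothetical per-region p-values (illustrative only, not from the study).
region_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
survivors = benjamini_hochberg(region_p)
uncorrected = np.asarray(region_p) < 0.05
```

      In this made-up example, four regions fall below the uncorrected threshold, while fewer are retained once the number of tests is taken into account, which is the scenario the reviewer anticipates.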

      (6) Several behaviors are altered in the mice as well

      These studies were fairly well-powered (n=15,16), and they found several positive and negative results, including alterations in memory and sociability in both species. There is a minor statistical flaw in the three-chamber analysis: they don't actually compare the Hets directly to the wildtypes in their statistical testing, a common mistake in neuroscience that should be addressed. But the data look like they will probably still be significant when correctly analyzed. In the supplement, the authors could do a bit more with the data they have to look at hyperactivity (i.e., show total motion in open field, not just time in center vs. periphery), and adding sex to their model might improve sensitivity for genotype effects.
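
      A direct genotype comparison of the kind requested for the three-chamber data could look like the sketch below. It assumes each animal is summarized by a single preference score and compares the two genotypes with a two-sided permutation test. The test itself is a swapped-in choice for illustration (a t-test or a mixed model with a genotype-by-chamber interaction would serve the same purpose), and the function name and all data values are hypothetical.

```python
import numpy as np

def permutation_test_mean_difference(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, p-value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle genotype labels and recompute the group difference.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical preference scores (seconds with novel mouse minus seconds with object).
wild_type = [95, 120, 80, 110, 130, 105, 90, 115]
het = [20, 45, -10, 30, 60, 15, 35, 5]
observed, p_value = permutation_test_mean_difference(wild_type, het)
```

      The point of the design is that the genotype effect is tested on one between-group contrast, rather than inferred from one group reaching significance while the other does not.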

      (7) Some biochemical signaling pathways are altered in the brain

      These are n=4 immunoblots, and show altered phospho ERK, but no changes in other signaling events predicted from prior WAC literature like H2B ubiquitination. They appear well done, and the authors share the full blots in the supplement.

      (8) WAC deletion also alters gene expression in the brain

      These studies were well-powered for RNAseq, with 10 and 14 samples, using neonates (P2), just the forebrain. The sequencing quality metrics all looked good, and the approach to analysis was okay. It would be stronger to again include sex in the model, rather than separate by sex. There were some typos in this part of the paper that made part of the conclusions unclear, but the RNAseq nicely confirmed the mutation of the mice, and discovered many differentially expressed genes, consistent with the role of this gene as a regulator of transcription. The presentation could be expanded to make more use of the data. Overall, though, this is a useful first characterization of the transcriptome in the line.

    1. eLife Assessment

      This fundamental study reports solid evidence for early verbal episodic memory formation. The findings demonstrate that speaker identity is a crucial feature, enabling episodic-like memories from birth, and will be of interest to cognitive neuroscientists working on brain development, memory, language learning and social cognition.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates whether newborns can use speaker identity to separate verbal memories, aiming to shed light on the earliest mechanisms of language learning and memory formation. The authors employ a well-designed experimental paradigm using functional near-infrared spectroscopy (fNIRS) to measure neural responses in newborns exposed to familiar and novel words, with careful counterbalancing and acoustic controls. Their main finding is that newborns show differential neural activation to novel versus familiar words, particularly when speaker identity changes, suggesting that even at birth, infants can use indexical cues to support memory.

      Strengths:

      Major strengths of the work include its innovative approach to a longstanding question in developmental science, the use of appropriate and state-of-the-art neuroimaging methods for this age group, and a thoughtful experimental design that attempts to control for order and acoustic confounds. The study addresses a significant gap in our understanding of how infants process and remember speech, and the data are presented transparently, with clear reporting of both significant and non-significant results.

      Weaknesses:

      However, there are notable weaknesses that limit the strength of the conclusions. The main recognition effect is restricted to a specific subgroup of participants and emerges only during a particular testing window, raising questions about the robustness and generalizability of the findings. The sample size, while typical for infant neuroimaging, is modest, and the statistical power is further reduced by missing data and group-dependent effects. Additionally, the claims regarding episodic memory and evolutionary implications are somewhat overstated, as the paradigm primarily demonstrates memory retention over a few minutes without evidence of the rich, contextually bound recall characteristic of fully developed episodic memory.

      Overall, the authors have achieved their primary aim of demonstrating that speaker identity can facilitate memory separation in newborns, providing valuable preliminary evidence for early indexical processing in language learning. The results are intriguing and likely to stimulate further research, but the limitations in effect robustness and theoretical interpretation mean that the findings should be viewed as an important step forward rather than a definitive answer. The methods and data will be of interest to researchers studying infant cognition, memory, and language, and the study highlights both the promise and the challenges of probing complex cognitive processes in the earliest stages of life.

    3. Reviewer #2 (Public review):

      Summary:

      Previous studies by some of the authors of the current manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) when multiple repetitions of a bisyllabic word were presented, uttered by the same speaker. Then they exposed neonates to an "interference period" when newborns listened to music or the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period when infants heard the familiarized word again. Interestingly, when the interference was music, the recognition of the word remained. Word recognition was measured using the NIRS technique, which estimates regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, while an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In previous studies, music does not erase the memory traces for a word (familiarity effect), while a different word uttered by the same speaker does.

      The current study aims at exploring whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, given that young children can distinguish speakers by processing the speech. The authors' main hypothesis anticipates that new speaker recognition would produce less interference with the familiarized word because somehow neonates "separate" the processing of both words (the familiarized word, uttered by one speaker, and the interfering word, uttered by a different speaker), memorizing both words as different auditory events.

      From my point of view, this hypothesis is interesting, since the results would contribute to estimating the role of the speaker in word learning and speech processing early in life.

      Strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response considering an unbalanced design. More than the loss of data, which is frequent in infants' studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states along the experiment) or expressed in different regions (e.g. related to individual variations in optodes' locations and brain anatomy).

      Weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation on the following points.

      (1) It would be fine to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I would suggest attempting to integrate the changes in brain activity across the 3 phases, that is, whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response distinguishes between same and novel words over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response to same words was higher than the response to novel words in the 5th block.

      (4) Similarly, it is quite striking that brain activity did not increase during the interference phase relative to familiarization, mainly over the left hemisphere, even though both the word and the speaker changed. Although the discussion considers these findings, the detection of novel words and the detection of a novel speaker over time would benefit from a more integrated discussion.

      Appraisal:

      The authors achieved their aims because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when interfered speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the potential mechanisms underlying these findings were discussed, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well-structured, although I would suggest integrating the changes across the three phases of the study, perhaps by comparing with other regions not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, in 9 minutes, newborns may transition between different behavioral states and experience different physiological needs.

      This study offers the opportunity to inspire looking for commonalities and individual differences when investigating early memory capacities of newborns.

    1. eLife Assessment

      This study offers a valuable contribution to understanding how working memory (WM) shapes neural processing in extrastriate cortex. By applying spectral decomposition to LFP recordings from primate middle temporal area (MT) during a spatial WM task, the authors show that lower-frequency components (theta, alpha, and beta, but not gamma or high-gamma) correlate with trial-by-trial gain modulation of visually evoked responses. However, certain aspects of the gain-modulation and statistical analyses are incomplete. A clearer and more comprehensive description of these components would substantially strengthen the manuscript.

    2. Reviewer #1 (Public review):

      Working memory affects sensory processing. Observers make faster and more accurate perceptual decisions at remembered locations, and corresponding regions of retinotopic visual cortex display enhanced response gain and modulations in oscillatory activity and spike-phase coupling.

      Roshanaei et al investigate the relationship between working memory, oscillatory activity, and response gain by reanalyzing extracellular laminar probe recordings from area MT of rhesus monkeys performing a spatial working memory task. During the memory period, visual probes were flashed in the receptive field of the recorded neurons, allowing a comparison of visual responses when memory overlapped with this receptive field (IN) or a location in the opposite hemifield (OUT). They first replicate a range of findings, including increased power in lower frequency bands (theta and alpha/beta) and increased visually-evoked responses in the IN condition. The authors next deployed a spectral technique (MODWT) to decompose the local field potential on single trials into 6 non-arbitrary component frequency bands. This approach allows the authors to observe shifts in peak spectral frequencies across IN and OUT trials. Finally, these single-trial spectral decompositions allowed the authors to relate frequency band power and response gain. This analysis revealed that response gain tended to increase with power in lower (alpha, beta, and theta) frequency bands, and this effect minimally interacted with the remembered location.

      Together, these interesting results provide correlational evidence that the effect of working memory on response gain may be mediated by oscillatory power. As the authors note, these results are also consistent with theories positing that lower frequency oscillatory activity primarily reflects working-memory related feedback signals from prefrontal and parietal cortex.

      These findings also suggest opportunities for further exploration. From a methodological perspective, it's not clear if the particular spectral decomposition highlighted here is necessary for obtaining these results, or if applying more standard approaches to single trials (as in Lundqvist et al., 2016) would have provided similar sensitivity. Additionally, although the relationship among working memory, oscillatory power, and response gain explored here is necessarily correlational, it could be of interest to subject these factors to a mediation analysis in this or future studies. Finally, the careful analysis of oscillatory phenomena reported here can ideally be used to inform large-scale circuit models and constrain the underlying mechanism.

    3. Reviewer #2 (Public review):

      Summary:

      Roshanaei et al investigate how working memory (WM) modulates neural activity in the primate visual system by examining local field potentials (LFPs) and spiking activity recorded in area MT. This work extends, and reuses the dataset of, the group's prior manuscript, Bahmani et al, Neuron 2018. The animals perform a spatial working memory task where they need to remember the location of a probe stimulus presented within (IN condition) or outside (OUT condition) the neuron's mapped receptive field (RF).

      As the first step, the authors replicate the findings in their Neuron 2018 paper by showing:

      (1) Significant modulation of the LFP power in αβ band during the working memory period in IN vs OUT conditions. This effect was absent in the gamma band.

      (2) A significant increase in phase-coded mutual information for probe location for the IN condition compared to the OUT condition.

      The authors then apply the Maximal Overlap Discrete Wavelet Transform (MODWT) to decompose LFP signals at the single-trial level, an approach that allows them to identify oscillatory components without imposing pre-defined frequency bands. They find that the precise frequencies of low-frequency oscillations (theta, alpha, and beta) correlate with the visually evoked firing rates of MT neurons.

      Strengths:

      The work addresses an important question: how cognitive states such as working memory modulate sensory processing in the visual cortex. More specifically, as we are expanding our understanding of the role of feedback in the brain, a me role of oscillations.

      The application of MODWT to single-trial LFPs represents a methodological advance over traditional bandpass filtering, which typically relies on trial-averaged power and may miss fine-grained frequency variability.

      The work aligns with ongoing efforts to understand how feedback and oscillatory dynamics contribute to top-down modulation in the brain.

      Weaknesses:

      (1) Several early results (e.g., increases in alpha/beta power and phase coding) closely replicate previous work from the same group and may be better placed in the Supplementary Information or omitted entirely. The novelty of the current paper lies mainly in the single-trial decomposition and frequency-rate relationship. However, the manuscript fails to expand the prior findings using the traditional methods, or at least offer a more mechanistic insight into the role of top-down modulation of the MT area during working memory tasks. Single-trial analysis can offer new avenues for mechanistic insight. For example, authors could have investigated the relationship of Cross-frequency coupling (CFC) with trial-by-trial behavior of the animal (Voytek et al., 2010) or transient synchronous oscillations for memory maintenance (Buschman et al, 2012).

      (2) The statistical methods require greater transparency. Details such as whether tests were one- or two-sided, how multiple comparisons were controlled, and how correlations among nearby electrodes were handled are not fully reported.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of episodic memory by proposing a biologically plausible mechanism through which hippocampal barcode activity enables efficient memory binding and flexible recall. The evidence supporting the conclusions is convincing, with rigorously validated computational models and alignment with experimental findings. The work will be of broad interest to neuroscientists and computational modelers studying memory and hippocampal function.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.

      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.

      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.

      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth. It may be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

    3. Reviewer #2 (Public review):

      Summary:

      Striking experimental results by Chettih et al 2024 have identified high-dimensional, sparse patterns of activity in the chickadee hippocampus when birds store or retrieve food at a given site. These barcode-like patterns were interpreted as "indexes" allowing the birds to retrieve from memory the locations of stored food.

      The present manuscript proposes a recurrent network model that generates such barcode activity and uses it to form attractor-like memories that bind information about location and food. The manuscript then examines the computational role of barcode activity in the model by simulating two behavioral tasks, and by comparing the model with an alternate model in which barcode activity is ablated.

      Strengths of the study:

      Proposes a potential neural implementation for the indexing theory of episodic memory

      Provides a mechanistic model of striking experimental findings: barcode-like, sparse patterns of activity when birds store a grain at a specific location

      A particularly interesting aspect of the model is that it proposes a mechanism for binding discrete events to a continuous spatial map, and demonstrates the computational advantages of this mechanism

      Weaknesses:

      The importance of different modeling ingredients and dynamical mechanisms could be made more clear.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.

      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.

      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.

      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth.

      One thread of questions that authors may want to further explore is related to the chaotic nature of activity that generates barcodes when recurrence is strong. Chaos inherently implies sensitivity to initial conditions and noise, which raises questions about its reliability as a mechanism for producing robust and repeatable barcode signals. How sensitive are the results to noise in both the dynamics and the input signals? Does this sensitivity affect the stability of the generated barcodes and place fields, potentially disrupting their functional roles? Moreover, does the implemented plasticity mitigate some of this chaos, or might it amplify it under certain conditions? Clarifying these aspects could strengthen the argument for the robustness of the proposed mechanism.

      In our model, chaos is used to produce a random barcode when forming memories, but memory retrieval depends on attractor dynamics. Specifically, the plasticity update at the end of the cache creates an attractor state; afterwards, for successful memory retrieval, the network activity must settle into this attractor rather than remaining chaotic. This attractor state is a conjunction of memory content (place and seed activity) and memory index (barcode activity). Thus, a barcode is ‘reactivated’ when network dynamics during retrieval settle into this cache attractor; in other words, chaotic dynamics do not need to generate the same barcode twice.

      The reviewer raises an important point, which is how sensitivity to initial conditions and noise would affect the reliability of our proposed mechanism. The key question here is how noise will affect the network’s dynamics during retrieval. Would adding noise to the dynamics make memory retrieval more difficult? We thank the reviewer for suggesting we investigate this further, and below describe our experiments and changes to the manuscript to better address this topic.

      We first experimented with adding independent Gaussian-distributed noise into each unit, drawn independently at each timestep. We analyzed recall accuracy using the same task and methods as Fig. 4F while varying the magnitude of noise. Memory recall was quite robust to this form of noise, even as the magnitude of noise approached half of the signal amplitude. This first experiment added noise into the temporal dynamics of the network. We subsequently examined adding static noise into the network inputs, which can also be thought of as introducing noise into initial conditions. Specifically, we added independent Gaussian-distributed noise into each unit, with the random value held constant for the extent of temporal dynamics. This perturbation decreased the likelihood of memory recall in a graded manner with noise magnitude, without dramatically changing the spatial profile. Examination of dynamics on individual trials revealed that the network failed to converge onto a cache attractor on some random fraction of trials, with other trials appearing nearly identical to noiseless results. We now include these results in the text and as a new supplementary figure, Figure S4AB.
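
      The distinction between the two perturbations described above can be illustrated with a small sketch. It uses a generic Hopfield-style attractor network with a Hebbian outer-product rule, which is an assumption made here for brevity and is not the RNN from the manuscript; the function `run_recall` and all parameter values are hypothetical. "Dynamic" noise is redrawn at every timestep, whereas "static" noise is drawn once and held constant for the whole run.

```python
import numpy as np

def run_recall(noise_kind="none", noise_sd=0.0, n_units=200,
               n_patterns=3, n_steps=20, seed=0):
    """Recall a stored pattern from a corrupted cue; return overlap with the target in [-1, 1]."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))
    weights = patterns.T @ patterns / n_units      # Hebbian outer-product storage
    np.fill_diagonal(weights, 0.0)                 # no self-connections

    target = patterns[0]
    state = target.copy()
    flipped = rng.choice(n_units, size=n_units // 10, replace=False)
    state[flipped] *= -1                           # corrupt 10% of the cue

    # Static noise: one random vector, held constant across all timesteps.
    if noise_kind == "static":
        static_noise = rng.normal(0.0, noise_sd, n_units)
    else:
        static_noise = np.zeros(n_units)

    for _ in range(n_steps):
        # Dynamic noise: a fresh random vector at every timestep.
        if noise_kind == "dynamic":
            dynamic_noise = rng.normal(0.0, noise_sd, n_units)
        else:
            dynamic_noise = np.zeros(n_units)
        drive = weights @ state + static_noise + dynamic_noise
        state = np.where(drive >= 0, 1.0, -1.0)    # synchronous sign update

    return float(state @ target) / n_units

baseline = run_recall("none")
with_dynamic = run_recall("dynamic", noise_sd=0.1)
with_static = run_recall("static", noise_sd=0.1)
```

      Sweeping `noise_sd` over a range and over many seeds would give a rough analogue of the recall-versus-noise curves described in the response, though the quantitative behavior of this toy network should not be expected to match the authors' model.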

      To clarify the network dynamics and the purpose of chaos in our model, we make the following modifications in text:

      Section 2.3, paragraph 2 (starting at “To store memories…”):

      “…place inputs arrive into the RNN, recurrent dynamics generate an essentially random barcode, seed inputs are activated, and then Hebbian learning binds a particular pattern of barcode activity to place- and seed-related activity.”

      Section 2.3, paragraph 3 (starting at “Memory recall in our network…”): As an example, consider a scenario in which an animal has already formed a memory at some location $l$, resulting in the storage of an attractor $\vec{a}$ into the RNN. The attractor $\vec{a}$ can be thought of as a linear combination of place input-driven activity $p(l)$, seed input-driven activity $s$, and a recurrent-driven barcode component $b$. Later, the animal returns to the same location and attempts recall (i.e. sets $r = 1$, Figure 3B). Place inputs for location $l$ drive RNN activity towards $p(l)$, which is partially correlated with attractor $\vec{a}$, and the recurrent dynamics cause network activity to converge onto attractor $\vec{a}$. In this way, barcode activity $b$ is reactivated, along with the place and seed components stored in the attractor state, $p(l)$ and $s$. The seed input can also affect recall, as discussed in the following section.

      Section 2.4, final paragraph (starting “We further examined how model hyperparameters affected performance on these tasks”), added the following describing new results on adding noise: We found that adding noise to the network's temporal dynamics had little effect on memory recall performance (Figure S4A). However, large static noise vectors added to the network's input and initial state decreased the overall probability of memory recall, but not its spatial profile (Figure S4B).

      It may also be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

      As described above, chaotic dynamics are necessary to generate a barcode during a cache, but not to reactivate that barcode during retrieval. During a successful memory retrieval, network activity settles into an attractor state and thus does not depend on the duration of simulated dynamics. The choice of duration to run dynamics during caching is important, but only insofar as activity significantly decorrelates from the initial state. We show in Figure S1B that decorrelation saturates ~t=25, and thus any random time point t > 25 would be similarly effective. We used a fixed duration runtime for caches only to avoid introducing unnecessary complication into our model.

      Reviewer #2 (Public review):

      Summary:

      Striking experimental results by Chettih et al 2024 have identified high-dimensional, sparse patterns of activity in the chickadee hippocampus when birds store or retrieve food at a given site. These barcode-like patterns were interpreted as "indexes" allowing the birds to retrieve from memory the locations of stored food.

      The present manuscript proposes a recurrent network model that generates such barcode activity and uses it to form attractor-like memories that bind information about location and food. The manuscript then examines the computational role of barcode activity in the model by simulating two behavioral tasks, and by comparing the model with an alternate model in which barcode activity is ablated.

      Strengths of the study:

      Proposes a potential neural implementation for the indexing theory of episodic memory

      Provides a mechanistic model of striking experimental findings: barcode-like, sparse patterns of activity when birds store a grain at a specific location

      A particularly interesting aspect of the model is that it proposes a mechanism for binding discrete events to a continuous spatial map, and demonstrates the computational advantages of this mechanism.

      Weaknesses:

      The relation between the model and experimentally recorded activity needs some clarification

      The relation with indexing theory could be made more clear

      The importance of different modeling ingredients and dynamical mechanisms could be made more clear

      The paper would be strengthened by focusing on the most essential aspects

      Comments:

      The model distinguishes between "barcode activity" and "attractors". Which of the two corresponds to experimentally-recorded barcodes? I would presume the attractors. A potential issue is that the attractors are, as explained in the text (l.137), conjunctions of place activity, barcode activity and "seed" inputs. The fact that the seed activity is shared across attractors seems to imply that they have a non-zero correlation independent of distance. Is that the case in the model? If I understand correctly, Fig 3D shows correlations between an attractor and barcodes at different locations, but correlations between attractors at different locations are not shown. Fig 1 F instead shows that correlations between recorded retrieval activities decay to zero with distance.

      More generally, the fact that the expression "barcode" is apparently used with different meanings in the model and in the experiments is potentially confusing (in the model they correspond to activity generating during caching, and this activity is distinct from the memories; my understanding is that in the experiments barcodes correspond to both caching and retrieval, but perhaps I am mistaken?).

      Our intent is to use the expression “barcode” as similarly as possible between model and experimental work. The reviewer points out that the connection between barcodes in experimental and modeling work is unclear, as well as the relation of “attractors” in our model to previous experimental results. The meaning of ‘barcode’ is absolutely critical—we clarify below our intended meaning, and then describe changes to the manuscript to highlight this.

      In experiments, we observed that activity during caching looked different than ordinary hippocampal activity (i.e. typical “place activity” observed during visits). Empirically there were two major differences. First, there was a pattern of neural activity which was present during every cache. This pattern was also present when birds visually inspected sites containing a cached seed, but not when visually inspecting an empty site. This is what we refer to as “seed activity”. Second, there was a pattern of neural activity which was unique to each cache. This pattern re-occurred during retrieval, and was orthogonal to place activity (see Fig. 1E-F). This is what we refer to as “barcode activity”. In summary, activity during a cache (or retrieval) contains a combination of three components: place activity, seed activity, and barcode activity.

      These experimental findings are recapitulated in our model, as activity during a cache contains a combination of three components: place activity driven by place inputs, seed activity driven by seed inputs, and barcode activity generated by recurrent dynamics. Cache activity in the model corresponds to cache activity in experiments, and barcodes in the model correspond to barcodes in experiments. Our model additionally has “attractors”, meaning that network connectivity changes so that the activity generated during a simulated cache becomes an attractor state of network dynamics. “Attractors” refers to a feature of network dynamics, not a distinct activity state, and we do not yet know if these attractors exist in experimental data.

      Figure 3D, as described in the figure legend, is a correlation of activity during cache and retrieval (in purple), for cache-retrieval pairs at the same or at different sites. We believe this is what the reviewer asks to see: the correlation between attractor states for different cache locations. The reviewer makes an important point: seed activity is shared across all attractors, so then why are correlations not high for all locations? This is because attractors also have a place component, which is anti-correlated for distant locations. This is evident in Fig. 3D by noticing that visit-visit correlations (black line, corresponding to place activity only) are negative for distant locations, and the correlation between attractors (purple line, cache-retrieval pairs) is subtly shifted up relative to the black line (place code only) for these distant locations. The size of this shift is due to the relative magnitude of place and seed inputs. For example, if we increase the strength of the seed input during caching (blue line), we can further increase the correlation between attractors even for quite distant sites:

      Author response image 1.

      To clarify the manuscript, we made the following modifications:

      Section 2.2, first paragraph: We model the hippocampus as a recurrent neural network (RNN) (Alvarez and Squire, 1994; Tsodyks, 1999; Hopfield, 1982) and propose that recurrent dynamics can generate barcodes from place inputs. As in experiments, the model’s population activity during a cache should exhibit both place and barcode activity components.

      Section 2.3, paragraph 3 (starting at “Memory recall in our network…”): As an example, consider a scenario in which an animal has already formed a memory at some location l, resulting in the storage of an attractor \vec{a} into the RNN. The attractor \vec{a} can be thought of as a linear combination of place input-driven activity $p(l)$, seed input-driven activity $s$, and a recurrent-driven barcode component $b$. Later, the animal returns to the same location and attempts recall (i.e. sets r = 1, Figure 3B). Place inputs for l drive RNN activity towards $p(l)$, which is partially correlated with attractor \vec{a}, and the recurrent dynamics cause network activity to converge onto attractor \vec{a}. In this way, barcode activity $b$ is reactivated as part of attractor \vec{a}, along with the place and seed components stored in the attractor state, $p(l)$ and $s$. The seed input can also affect recall, as discussed in the following section.
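
      The recall scenario described above can be illustrated with a minimal sketch. It is not the model from the manuscript: it swaps in a classical Hopfield-style network with binary (+1/-1) units and a Hebbian outer-product rule, omits the seed component, and treats the place and barcode components as separate halves of the pattern. The names `store`, `recall`, `place`, and `barcode` are hypothetical. The cue contains only the place half, and the recurrent weights fill in the barcode half.

```python
import random

def store(patterns, n):
    """Hebbian outer-product weights with zero self-connections."""
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, cue, steps=5):
    """Synchronous sign updates starting from the cue."""
    x = list(cue)
    for _ in range(steps):
        x = [1 if sum(wij * xj for wij, xj in zip(row, x)) >= 0 else -1
             for row in w]
    return x

rng = random.Random(0)
n = 64
place = [rng.choice([-1, 1]) for _ in range(n // 2)]    # stands in for p(l)
barcode = [rng.choice([-1, 1]) for _ in range(n // 2)]  # stands in for b
attractor = place + barcode                             # stored memory

w = store([attractor], n)

# At recall only the place input is present; the barcode units start silent.
cue = place + [0] * (n // 2)
recovered = recall(w, cue)
print(recovered == attractor)
```

      With a single stored pattern the place-only cue is pattern-completed to the full attractor, barcode included. Storing many overlapping patterns is where interference would arise, which this sketch does not explore.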

      The insights obtained from the network model for the computational role of barcode activity could be explained more clearly. The introduction starts by laying out the indexing theory, which proposes that the hippocampus links an index with each memory so that the memory is reactivated when the index is presented. The experimental paper suggests that the barcode activations play the role of indexes. Yet, in the model reactivations of memories are driven not by presenting barcode activity, but by presenting place activity (Cache Presence task) or seed activity (Cache Location task). So it seems that either place activity or seed activity plays the role of indexes. Section 2.5 nicely shows that ultimately the role of barcode activity is to decorrelate attractors, which seems different from playing the role of indexes. I feel it would be useful for the Discussion to reassess more critically the relationship between barcodes, indexing theory, and key-value architectures.

      The reviewer highlights a failure on our part to clearly identify the connection between our findings on barcodes, indexing theory, and key-value architectures. This is another major component of the paper, and below we propose changes to the manuscript to clarify these concepts and their relationships. First, we will summarize the key points that were unclear in our original manuscript.

      The reviewer equates the concept of an ‘index’ with that of a ‘query’: the signal that drives memory reactivation. This may be intuitive, but it is not how a memory index was defined in indexing theory (e.g. Teyler & DiScenna 1986). In indexing theory, the index is a pattern of hippocampal activity that is (a) generated during memory formation, (b) separate from the activity encoding memory content, and (c) linked to memory content via associative plasticity. After memory formation, a memory might be queried by activating a partial set of the memory contents, which would then drive reactivation of the hippocampal index, leading to pattern completion of memory contents. See, for example, figure 1 of Teyler and DiScenna 1986. The ‘index’ is thus not the same as the ‘query’ that drives recall.

      We propose in this work that barcode activity is such an index. Indexing theory originally posited that memory content was encoded by neocortex, and memory index was encoded by hippocampus. However the experiments of Chettih et al. 2024 revealed that the hippocampus contained both memory content and memory index signals, and furthermore there was no division of cells into ‘content’ and ‘index’ subtypes. Thus our model drops the assumption of earlier work that index and content signals correspond to different neurons in different brain areas—a significant advance of our work. Otherwise, the experimentally observed barcodes and the barcodes generated by our computational model play the role of indices as originally defined.

      Our original manuscript was unclear on the relationship of indexing theory and key-value systems. Our work connects diverse areas of memory models, including attractor dynamics, key-value memory systems, and memory indexing. A full account of these literatures and their relationships may be beyond the scope of this manuscript, and we note that a recent review article (Gershman, Fiete, and Irie, 2025) further clarifies the relationship between key-value memory, indexing theory, and the hippocampus. We will cite this work in our discussion as a source for the interested reader.

      Briefly, a key-value memory system distinguishes between the address where a memory is stored, the ‘key’, and the content of that memory, the ‘value’. An advantage of such systems is that keys can be optimized for purposes independent of the value of each memory. The use of barcodes in our model to decorrelate memories is related to this optimization of keys in key-value memory systems. By generating barcodes and adding this to the attractor state corresponding to a cache memory, the ‘address’ of the memory in population activity is differentiated from other memories. Our work is thus consistent with the idea that hippocampus generates keys and implements a key storage system. However it is not so straightforward to equate barcodes with keys, as they are defined in key-value memory. As the reviewer points out, memory recall can be driven by location and seed inputs, i.e. it is content-addressable. We think of the barcode as modifying the memory address to better separate similar memories, without changing memory content, and the resulting memory can be recalled by querying with either content or barcode. Given the complex and speculative nature of these relationships, we prefer to note the salient connection of our work with ongoing efforts applying the key-value framework to biological memory, and leave the precise details of this connection to future work.
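
      The decorrelating effect of a content-independent barcode can be sketched numerically. This is only an illustration under simplifying assumptions: patterns are binary (+1/-1) vectors, and the barcode is concatenated to the content rather than superimposed on the same units as in the model. All names and sizes are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

rng = random.Random(1)
n_content, n_barcode = 100, 100

# Two memories whose content differs in only 5 of 100 entries.
content_a = [rng.choice([-1, 1]) for _ in range(n_content)]
content_b = list(content_a)
for i in rng.sample(range(n_content), 5):
    content_b[i] = -content_b[i]

# Independent random barcodes, unrelated to the content.
barcode_a = [rng.choice([-1, 1]) for _ in range(n_barcode)]
barcode_b = [rng.choice([-1, 1]) for _ in range(n_barcode)]

sim_content = cosine(content_a, content_b)
sim_with_barcode = cosine(content_a + barcode_a, content_b + barcode_b)

print(sim_content, sim_with_barcode)
```

      The content-only similarity is 0.9 by construction, while the patterns that include barcodes are less similar, so the two memories occupy more distinct addresses even though their content is unchanged.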

      We make the following changes in the manuscript to clarify these ideas:

      Introduction, first paragraph: In this scheme, during memory formation the hippocampus generates an index of population activity, and the neurons representing this index are linked with the neurons representing memory content by associative plasticity. Later, re-experience of partial memory contents may reactivate the index, and reactivation of the index drives complete recall of the memory contents.

      Discussion, 4th paragraph on key-value: Interestingly, prior theoretical work has suggested neural implementations for both key-value memory and attention mechanisms, arguing for their usefulness in neural systems such as long term memory (Kanerva, 1988; Tyulmankov et al., 2021; Bricken and Pehlevan, 2021; Whittington et al., 2021; Kozachkov et al., 2023; Krotov and Hopfield, 2020; Gershman et al., 2025). In this framework, the address where a memory is stored (the key) may be optimized independently of the value or content of the memory. In our model, barcodes improve memory performance by providing a content-independent scaffold that binds to memory content, preventing memories with overlapping content from blurring together. Thus barcodes can be considered as a change in memory address, and our model suggests important connections between recurrent neural activity and key generation mechanisms. However we note that barcodes should not be literally equated with keys in key-value systems as our model’s memory is ‘content-addressable’: it can be queried by place and seed inputs.

      The model includes a number of non-standard ingredients. It would be useful to explain which of these ingredients and which of the described mechanisms are essential for the studied phenomenon. In particular:

      - the dynamics in Eq.2 include a shunting inhibition term. Is it essential and why?

      The shunting inhibition is important as it acts to normalize the network activity to prevent runaway excitation. We hope to clarify this further by amending the following sentence in section 2.2: “g (·) is a leak rate that depends on the average activity of the full network, representing a form of global shunting inhibition that normalizes network activity to prevent runaway excitation from recurrent dynamics.”

      - same question for the global inhibition included in the random connectivity;

      The distribution from which connectivity strengths are drawn has a negative mean (global inhibition). This causes activity during caching (i.e. r = 1) to be sparser than activity during visits (i.e. r = 0), and was chosen to match experimental findings. In figures 2B and S2B we show that our model can transition between a mode with place code only, barcode only, or a mode containing both, by changing the variance of the weight distribution while holding the mean constant. We suggest clarifying this by editing the following in section 2.2, paragraph 2: “We initialize the recurrent weights from a random Gaussian distribution, where $N_X$ is the number of RNN neurons and $\mu < 0$, reflecting global subtractive inhibition that encourages sparse network activity to match experimental findings (Chettih et al. 2024).”

      - the model is fully rate-based, but for certain figures, spikes are randomly generated. This seems superfluous.

      Spikes are simulated for one analysis and one visualization, where it is important to consider noise or variability in neural responses across trials. First, for Fig. 2H,J, we generated spikes to allow a visual comparison to figures that can be easily generated from experimental data. Second, and more significantly, for the analysis underlying Fig. 3D, it is essential to simulate variability in neural responses. Because our rate-based models are noiseless, the RNN’s rate vector at site distance = 0 will always be the same and result in a correlation of 1 for both visit-visit and cache-retrieval. However, we show that, if one interprets the rate as a noisy Poisson spiking process, the correlation at site distance = 0 between a cache-retrieval pair is higher than that of two visits. This is because under a Poisson spiking model, the signal-to-noise ratio is higher for cache-retrieval activity, where rates are higher in magnitude. The greater correlation for a cache-retrieval pair at the same site, relative to visits at the same site, is an experimental finding that was critical for our model to reproduce. We detail clarifications to the manuscript below in response to the reviewer’s following and related question.
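
      The signal-to-noise argument above can be checked with a small sketch. It assumes a hypothetical population whose rates are drawn uniformly between 0 and `scale`, and it uses Knuth's textbook Poisson sampler so that no external library is needed. None of the names or parameter values come from the manuscript. Two independent Poisson "trials" are drawn from the same rate vector and correlated, once for low rates and once for high rates.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def same_site_correlation(scale, n_units=2000, seed=0):
    """Correlate two Poisson samples drawn from one shared rate vector."""
    rng = random.Random(seed)
    rates = [scale * rng.random() for _ in range(n_units)]
    trial_1 = [poisson(r, rng) for r in rates]
    trial_2 = [poisson(r, rng) for r in rates]
    return pearson(trial_1, trial_2)

low = same_site_correlation(scale=1.0)    # visit-like: low rates
high = same_site_correlation(scale=10.0)  # cache-like: higher rates
print(low, high)
```

      Although the underlying rate vectors are identical across the two trials in both cases, the higher-rate population yields the larger trial-to-trial correlation, because Poisson variance grows only linearly with rate while the spread of rates across units grows with its square.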

      How are the correlations determined in the model (e.g., Fig 2 B)? The methods explain that they are computed from Poisson-generated spikes, but over which time period? Presumably during steady-state responses, but are these responses time-averaged?

      The reviewer points out a lack of clarity in our original manuscript. Correlations for events (caches, retrievals and visits) at different sites are calculated in two sections of the paper (2B, 3D), for different purposes and with slight differences in methods:

      - For Figure 2B, no spikes are simulated. Note that the methods mentioning Poisson spike generation specify only Fig. 2H,J and Fig. 3D. We simply take the network’s rate vector at timestep t=100 (when the decorrelating effect of chaotic dynamics has saturated, S1A-B) and correlate this vector when generated at different locations. We now clarify this in the legend for Figure 2B: “We show correlation of place inputs (gray) and correlation of the RNN's rate vector at t = 100 (black).”

      - For Figure 3D, we want to compare the model to empirical results from Chettih et al. 2024, reproduced in this paper in Fig. 1E-F. These empirical results are derived from correlating vectors of spiking activity on pairs of single trials, and are thus affected by noise or variability in neural responses as described in our response to the reviewer’s previous question. We thus took the RNN’s rate vector at t=100 and simulated spiking data by drawing samples from a Poisson distribution to get spike counts. Our original manuscript was unclear about this, and we suggest the following changes:

      - Legend for Figure 3D: D. Correlation of Poisson-generated spikes simulated from RNN rate vectors at two sites, plotted as a function of the distance between the two sites.

      - Section 2.3, last paragraph: Population activity during retrieval closely matches activity during caching, and is substantially decorrelated from activity during visits (Figure 3C). To compare our model with the empirical results reproduced in Figure 1E,F, we ran in silico experiments with caches and retrievals at varying sites in the circular arena. We simulated Poisson-generated spikes drawn from our network's underlying rates to match the intrinsic variability in empirical data (see Methods).

      - Methods, subsection Spatial correlation of RNN activity for cache-retrieval pairs at different sites: To calculate correlation values as in Figure 3D, we simulated experiments where 5 sites were randomly chosen for caching and retrieval. To compare model results to the empirical data in Fig. 1E,F, which includes intrinsic neural variability, we sampled Poisson-generated spike counts from the rates output by our model. Specifically, for RNN activity \vec{r_i} at location i, using the rates at t=100 as elsewhere, we first generate a sample vector of spikes…

      I was confused by early and late responses in Fig 2 C. The text says that the activity is initialized at zero, so the response at t=0 should be flat (and zero). More generally, I am not sure I understand why the dynamics matter for the phenomenon at all, presumably the decorrelation shown in Fig 2B depends only on steady state activity (cf previous question).

      Thanks for catching this mistake. The legend has been updated to indicate that the ‘early’ response is actually at t=1, when network activity reflects place inputs without the effects of dynamics. The reviewer is correct that we are primarily interested in the ‘late’ response of the network. All other results in the paper use this late response at t=100. As shown in Fig. S2A,B, this timepoint is not truly a steady state, as activity in the network continues to change, but the decorrelation of network activity with place-driven activity has saturated.

      We include the early response in Fig. 2C for visual comparison of the purely place-driven early activity with the eventual network response. It is also relevant since, as the reviewer points out above, there is a shunting inhibition term in the dynamics that is present during both low and high recurrent strength simulations.

      Related to the previous point, the discussion of decorrelation (l.79 - 97) is somewhat confusing. That paragraph focuses on chaotic activity, but chaos decorrelates responses across different time points. Here the main phenomenon is the decorrelation of responses across different spatial inputs (Fig 2B). This decorrelation is presumably due to the fact that different inputs lead to different non-trivial steady-state responses, but this requires some clarification. If that is correct, the temporal chaos adds fluctuations around these non-trivial steady-state responses, but that alone would not lead to the decorrelation shown in Fig 2B.

      We agree with the reviewer that chaotic activity produces a decorrelation across time points. Because of chaotic dynamics, network activity does not settle into a trivial steady-state, and instead evolves from the initial state in an unpredictable way. The network does not settle into a steady-state pattern, but both the decorrelation of network state with initial state and the rate of change in the network state saturate after ~t=25 timesteps, as shown in Fig. S2A-B.

      The initial activity for nearby states is similar, due to them receiving similar place inputs.

      Because network activity is chaotically decorrelated from this initial state by temporal dynamics, ‘late stage’ network activity between nearby spatial states is less correlated than ‘early stage’ activity. Thus the temporal decorrelation produces a spatial decorrelation. We believe that the changes we have introduced to the manuscript in revision will make this point clearer in our resubmission.
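
      The link between temporal and spatial decorrelation can be sketched with a generic chaotic rate network. This is not the model from the manuscript: it is a plain discrete-time tanh network with zero-mean Gaussian weights and no shunting inhibition, and the values of `n`, `gain`, and `steps` are arbitrary assumptions. Two nearby "sites" are represented by two highly correlated initial states.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def run(w, x0, steps):
    """Iterate x <- tanh(W x) for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x

rng = random.Random(0)
n, gain, steps = 200, 3.0, 30

# Random recurrent weights; a gain well above 1 puts the network in a chaotic regime.
w = [[rng.gauss(0.0, gain / math.sqrt(n)) for _ in range(n)] for _ in range(n)]

site_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
site_b = [a + 0.1 * rng.gauss(0.0, 1.0) for a in site_a]  # nearby site, similar input

early = pearson(site_a, site_b)
late = pearson(run(w, site_a, steps), run(w, site_b, steps))
print(early, late)
```

      The two initial states are almost perfectly correlated, but after the recurrent dynamics have run, the two network states are far less correlated. The sensitivity to initial conditions that decorrelates activity over time therefore also decorrelates activity across nearby inputs.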

      A key ingredient of the model is that the recurrent interactions are switched on and off between "caching" and "visits". The discussion argues that a possible mechanism for this is recurrent inhibition (l.320), which would need to be added. However two forms of inhibition are already included in the model. The text also says that it is unclear how units in the model should be mapped onto E and I neurons. However the model makes explicit assumptions about this, in particular by generating spikes from individual neurons. Altogether, I did not find that part of the Discussion convincing.

      We agree with the reviewer that this section is a limitation of our current work, and in fact it is an ongoing area of future research. However we think the advances in this current work warrant publication despite this topic requiring further research. We attempted to discuss this limitation explicitly, and note that the other reviewer pointed this section out as particularly helpful. We do not think it is problematic for a realistic model of the brain to ultimately include three or even more forms of inhibition. We do not think that Poisson-generated spikes commit us to interpreting network units as single neurons. Spikes are not a core part of our model’s mechanism, and were used only as a means of introducing variability on top of deterministic rates for specific analyses. Furthermore one could still view network units as pools of both E and I spiking neurons. We would welcome further recommendations the reviewer believes are important to note in this section on our model’s limitations.

      On lines 117-120 the text briefly mentions an alternate feed-forward model and promptly discards it. The discussion instead says that a "separate possibility is that barcodes are generated in a circuit upstream of where memories are stored, and supplied as inputs to the hippocampal population", and that this possibility would lead to identical conclusions. The two statements seem a bit contradictory. It seems that the alternative possibility would replace the need for switching on and off recurrent interactions, with a mechanism where barcode inputs are switched on and off. This alternate scenario is perhaps more plausible, so it would be useful to discuss it more explicitly.

      We apologize for the confusion here, which seems to be due to our phrasing in the discussion section. We do reject the idea that a simple feed-forward model could generate the spatial correlation profile observed in data, as mentioned in the text and included as Fig. S2. Our statement in the discussion may have seemed contradictory because here we intended to discuss the possibility that an upstream area generates barcodes, for example by the chaotic recurrent dynamics proposed in our work, while a downstream network receives these barcodes as inputs and undergoes plasticity to store memories as attractors. We did not intend to suggest any connection to the feedforward model of barcode generation, and apologize for the confusion. We claim that this ‘2 network’ solution would lead to similar conclusions because the upstream network would need an efficient means of barcode generation, the downstream network would need an efficient means of storing memory attractors, and separating these functions into different networks is not likely to affect, for example, the advantage of partially decorrelating memory attractors. Moreover, the downstream network would still require some form of recurrent gating, so that during visits it exhibits place activity without activating stored memory attractors.

      We thus chose a 1 network instead of a 2 network solution because it was simpler and, we believe, more interesting. It is challenging in the absence of more data to say which is more plausible, thus we wanted to mention the possibility of a 2 network solution. We suggest the following changes to the manuscript:

      - Discussion, 3rd paragraph: “Alternatively, other mechanisms may be involved in generating barcodes. We demonstrated that conventional feed-forward sparsification (Babadi and Sompolinsky, 2014; Xie et al., 2023) was highly inefficient, but more specialized computations may improve this (Földiak, 1990; Olshausen and Field, 1996; Sacouto and Wichert, 2023; Muscinelli et al., 2023). Another possibility is that barcodes are generated in a separate recurrent network upstream of the recurrent network where memories are stored. In this 2-network scenario, the downstream network receives both spatial tuning and barcodes as inputs. This would not obviate the need for modulating recurrent strength in the downstream network to switch between input-driven modes and attractor dynamics. We suspect separating barcode generation and memory storage in separate networks would not fundamentally affect our conclusions.”

      As a minor note, the beginning of the discussion states that the presented model is similar to previous recurrent network models of the hippocampus. It would be worth noting that several of the cited works assign a very different role to recurrent interactions: they generate place cell activity, while the present model assumes it is inherited from upstream inputs.

      We are not sure how best to modify the paper to address this suggestion. As far as we know, all of the cited models which deal with spatial encoding do assume that the hippocampus receives a spatially-modulated or spatially-tuned input. For example, the Tsodyks 1999 paper cited in this paragraph uses exponentially-decaying place inputs to each neuron highly similar to our model. Furthermore we explore how our model would perform if we change the format of spatial inputs in Fig. S4, and find key results are unchanged. It is unclear how hippocampal place fields could emerge without inputs that differentiate between spatial locations. We think it is appropriate to highlight the similarity of our model to well known Hopfield-type recurrent models, where memories are stored as attractor states of the network dynamics.

      On the other hand, we agree that a common line of hippocampal modeling proposes that recurrent interactions reshape spatial inputs to produce place fields. This often arises in the context of hippocampus generating a predictive map, where inputs may be one-hot for a single spatial state, in a grid cell-like format, or a random projection of sensory features. We attempted to address this in section 2.6, using a model which superimposes the random connectivity needed for barcode generation with the structured connectivity needed for predictive map formation. We found that such a model was able to perform both predictive and barcode functions, suggesting a path forward to connecting different lines of hippocampal modeling in future work.

    1. eLife Assessment

      This paper presents fundamental research showing that the acquisition and expression of Pavlovian conditioned responding are lawfully related to temporal characteristics of an animal's conditioning experience. It showcases a rigorous experimental design, several different approaches to data analysis, careful consideration of prior literature, and a thorough introduction. The evidence supporting the conclusions is compelling. The paper will have a general appeal to those interested in the behavioral and neural analysis of Pavlovian conditioning.

    2. Reviewer #2 (Public review):

      A long-standing debate in the field of Pavlovian learning relates to the phenomenon of timescale invariance in learning i.e. that the rate at which an animal learns about a Pavlovian CS is driven by the relative rate of reinforcement of the cue (CS) to the background rate of reinforcement. In practice, if a CS is reinforced on every trial, then the rate of acquisition is determined by the relative duration of the CS (T) and the ITI (C = inter-US-interval = duration of CS + ITI), specifically the ratio of C/T. Therefore, the point of acquisition should be the same with a 10s CS and a 90s ITI (T = 10; C = 90 + 10 = 100, C/T = 100/10 = 10) and with a 100s CS and a 900s ITI (T = 100; C = 900 + 100 = 1000, C/T = 1000/100 = 10). That is to say, the rate of acquisition is invariant to the absolute timescale as long as this ratio is the same. This idea has many other consequences, but is also notably different from more popular prediction-error based associative learning models such as the Rescorla-Wagner model. The initial demonstrations that the ratio C/T predicts the point of acquisition across a wide range of parameters (both within and across multiple studies) were conducted in pigeons using a Pavlovian autoshaping procedure. What has remained under contention is whether or not this relationship holds across species, particularly in the standard appetitive Pavlovian conditioning paradigms used in rodents. The results from rodent studies aimed at testing this have been mixed, and often the debate around the source of these inconsistent results focuses on the different statistical methods used to identify the point of acquisition for the highly variable trial-by-trial responses at the level of individual animals.
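
      The arithmetic in the example above can be written as a short sketch. The function name `informativeness` is hypothetical, and the calculation assumes, as in the text, that every CS trial is reinforced so that C is simply the CS duration plus the ITI.

```python
def informativeness(cs_duration_s, iti_s):
    """Return C/T, where T is the CS duration and C = T + ITI."""
    cycle = cs_duration_s + iti_s  # C: the inter-US interval
    return cycle / cs_duration_s

short_schedule = informativeness(10, 90)    # 10 s CS, 90 s ITI
long_schedule = informativeness(100, 900)   # 100 s CS, 900 s ITI
print(short_schedule, long_schedule)
```

      Both schedules give the same ratio despite a tenfold difference in absolute durations, which is the sense in which acquisition is predicted to be timescale invariant.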

      The authors successfully replicate the same effect found in pigeon autoshaping paradigms decades ago (with almost identical model parameters) in a standard Pavlovian appetitive paradigm in rats. They achieve this through a clever change to the experimental design, using a convincingly wide range of parameters across 14 groups of rats, and by a thorough and meticulous analysis of these data. It is also interesting to note that the two authors have published on opposing sides of this debate for many years, and as a result have developed and refined many of the ideas in this manuscript through this process.

      Main findings

      (1) The present findings demonstrate that the point of initial acquisition of responding is predicted by the C/T ratio.

      (2) The terminal rates of responding to the CS appear to be related to the reinforcement rate of the CS (T; specifically, 1/T) but not its relation to the reinforcement rate of the context (i.e. C or C/T). In the present experiment, all CS trials were reinforced so it is also the case that the terminal rate of responding was related to the duration of the CS.

      (3) An unexpected finding was that responding during the ITI was similarly related to the rate of contextual reinforcement (1/C). This novel finding suggests that the terminal rate of responding during the ITI and the CS are related to their corresponding rates of reinforcement. This finding is surprising as it suggests that responding during the ITI is not being driven by the probability of reinforcement during the ITI.

      (4) Finally, the authors characterised the nature of increased responding from the point of initial acquisition until responding peaks at a maximum. Their analyses suggest that the nature of this increase was best described as linear in the majority of rats, as opposed to the non-linear increase that might be predicted by prediction error learning models (e.g. Rescorla-Wagner). However, more detailed analyses revealed that these changes can be quite variable across rats, and more variable when the CS had lower informativeness (defined as C/T).

      Strengths and Weaknesses:

      There is an inherent paradox regarding the consistency of the acquisition data from Gibbon & Balsam's (1981) meta-analysis of autoshaping in pigeons, and the present results in magazine response frequency in rats. This consistency is remarkable and impressive, and is suggestive of a relatively conserved or similar underlying learning principle. However, the consistency is also surprising given some significant differences in how these experiments were run. Some of these differences might reasonably be expected to lead to differences in how these different species respond. For example:

      In the autoshaping procedure commonly used to generate these data, the pigeons were pretrained to retrieve rewards from a grain hopper with an instrumental contingency between head entry into the hopper and grain availability. During Pavlovian training, pecking the key light also elicited an auditory click feedback stimulus, and when the grain hopper was made available, the hopper was also illuminated.

      In the present experimental procedure, the rats were not given contextual exposure to the pellet reinforcers in the magazine (e.g. a magazine training session is typically found in similar rodent procedures). The Pavlovian CS was a cue light within the magazine itself.

      These design features in the present rodent experiment are clearly intentional. Pretraining with the reinforcer in the testing chambers would reasonably alter the background rate of reinforcement (parameter), so it makes sense not to include this, although this differs from the paradigm used in pigeons. Having the CS inside the magazine where pellets are delivered provides an effective way to reduce any potential response competition between CS and US directed responding and combines these all into the same physical response. This makes the magazine approach response more like the pecking of the light stimulus in the pigeon autoshaping paradigm. However, the location of the CS and US is separated in pigeon autoshaping, raising questions about why the findings across species are consistent despite these differences.

      Intriguingly, when the insertion of a lever is used as a Pavlovian cue in rodent studies, CS directed responding (sign-tracking) often develops over training such that eventually all animals bias their responding towards the lever rather than towards the US (goal-tracking at the magazine). However, the nature of this shift highlights the important point that these CS and US directed responses can be quite distinct physically as well as psychologically. Therefore, by conflating the development of these different forms of responding, it is not clear whether the relationship between C/T and the acquisition of responding describes the sum of all Pavlovian responding or predominantly CS or US directed responding.

      Another interesting aspect of these findings is that there is a large amount of variability that scales inversely with C/T. A potential account of the source of this variability is related to the absence of preexposure to the reward pellets. This is normally done within the animals' homecage as a form of preexposure to reduce neophobia. If some rats take longer to notice and then approach and finally consume the reward pellets in the magazine, the impact of this would systematically differ depending on the length of the ITI. For animals presented with relatively short CSs and ITIs, they may essentially miss the first couple of trials and/or attribute uneaten pellets accumulating in the magazine to the background/contextual rate of reinforcement. What is not currently clear is whether this was accounted for in some way by confirming when the rats first started retrieving and consuming the rewards from the magazine.

      While the generality of these findings across species is impressive, the very specific set of parameters employed to generate these data raises questions about the generality of these findings across other standard Pavlovian conditioning parameters. While this is obviously beyond the scope of the present experiment, it is important to consider that the present study explored a situation with 100% reinforcement on every trial, with a variable duration CS (drawn from a uniform distribution), with a single relatively brief CS (maximum of 122s) and a single US. Again, the choice of these parameters in the present experiment is appropriate and very deliberately based on refinements from many previous studies from the authors. This includes a number of criteria used to define magazine response frequency, which include discarding specific responses (discussed and reasonably justified clearly in the methods section). Similarly, the finding that terminal rates of responding are reliably related to 1/T is surprising, and it is not clear whether this might be a property specific to this form of variable duration CS, the use of a uniform sampling distribution, or the use of only a single CS. However, it is important to keep these limitations in mind when considering some of the claims made in the discussion section of this manuscript that go beyond what these data can support.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Conceptually, I feel that the authors addressed many concerns. However, I am still not convinced that their data support the strength of their claims. Additionally, I spent considerable time investigating the now freely available code and data and found several inconsistencies that would be critical to rectify. My comments are split into two parts, reflecting concerns related to the responses/methods and concerns resulting from investigation of the provided code/data. The former is described in the public review above. Because I show several figures to illustrate some key points for the latter part, an attached file will provide the second part: https://elife-rp.msubmit.net/elife-rp_files/2025/02/24/00136468/01/136468_1_attach_15_2451_convrt.pdf

      (1) This point is discussed in more detail in the attached file, but there are some important details regarding the identification of the learned trial that require more clarification. For instance, isn’t the original criterion by Gibbon et al. (1977) the first “sequence of three out of four trials in a row with at least one response”? The authors’ provided code for the Wilcoxon signed rank test and nDkl thresholds looks for a permanent exceeding of the threshold. So, I am not yet convinced that the approaches used here and in prior papers are directly comparable.

      We agree that there remain unresolved issues with our two attempts to create criteria that match that used by Gibbon and Balsam for trials to criterion. Therefore, we have decided to remove those analyses and return to our original approach showing trials to acquisition using several different criteria so as to demonstrate that the essential feature of the results—the scaling between learning rate and information—is robust. Figure 2A shows the results for a criterion that identifies the trial after which the cumulative response rate during the CS (=cumulative CS response count from Trial 1 divided by cumulative CS time from Trial 1) is consistently above the cumulative overall response rate across the trial (i.e., including both the CS and ITI). These data compare the CS response rate with the overall response rate, rather than with ITI rate as done in the previous version (in Figure 3A of that submission), to be consistent with the subsequent comparisons that are made using the nDkl. (The nDkl relies on the comparison between the CS rate and the overall rate, rather than between the CS and ITI rates.) Figures 2B and 2C show trials to acquisition when two statistical criteria, based on the nDkl, are applied to the difference between CS and overall response rates (the criteria are for odds >= 4:1 and p<.05). As we now explain in the text, a statistical threshold is useful inasmuch as it provides some confidence to the claim that the animals had learned by a given trial. However, this trial is very likely to be after the point when they had learned because accumulating statistical evidence of a difference necessarily adds trials.
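
      A minimal sketch of the first criterion described above (the trial from which the cumulative CS response rate stays above the cumulative overall response rate) is given here for concreteness. It is not the authors' MATLAB code: the function name, argument names, and the toy data are assumptions, and it returns the first trial of the final unbroken run in which the CS rate exceeds the overall rate.

```python
from itertools import accumulate

def first_permanent_trial(cs_counts, cs_durations, all_counts, all_durations):
    """Return the 1-based trial from which cumulative CS rate > cumulative overall rate
    on every remaining trial, or None if that never holds through the last trial.

    All four arguments are per-trial sequences (hypothetical layout): responses and
    seconds during the CS, and responses and seconds across the whole trial (CS + ITI).
    """
    cum_cs_n = accumulate(cs_counts)
    cum_cs_t = accumulate(cs_durations)
    cum_all_n = accumulate(all_counts)
    cum_all_t = accumulate(all_durations)
    above = [
        n_cs / t_cs > n_all / t_all
        for n_cs, t_cs, n_all, t_all in zip(cum_cs_n, cum_cs_t, cum_all_n, cum_all_t)
    ]
    if not above or not above[-1]:
        return None
    # Walk backwards to the start of the final unbroken run of True values.
    trial = len(above)
    while trial > 1 and above[trial - 2]:
        trial -= 1
    return trial

# Toy data (invented): an early transient advantage for the CS is lost on Trial 4.
cs_counts = [1, 0, 0, 0, 2, 3]
cs_durations = [10] * 6
all_counts = [1, 4, 4, 3, 3, 4]
all_durations = [100] * 6
acquisition_trial = first_permanent_trial(cs_counts, cs_durations, all_counts, all_durations)
print(acquisition_trial)
```

      In the toy data the CS rate is higher on Trials 1 to 3, falls below the overall rate on Trial 4, and stays higher from Trial 5, so the sketch reports Trial 5 rather than Trial 1.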

      Also, there’s still no regression line fitted to their data (Fig 3’s black line is from Fig 1, according to the legends). Accordingly, I think the claim in the second paragraph of the Discussion that the old data and their data are explained by a model with “essentially the same parameter value” is not yet convincing without actually reporting the parameters of the regression. Related to this, the regression for their data based on my analysis appears to have a slope closer to -0.6, which does not support strict timescale invariance. I think that this point should be discussed as a caveat in the manuscript.

      We now include regression lines fitted to our data in Figures 2A-C, and their slopes are reported in the figure note. We also note on page 14 of the revision that these regressions fitted to our data diverge from the black regression line (slope -1) as the informativeness increases. On pages 14-15, we offer an explanation for this divergence; that, in groups with high informativeness, the effective informativeness is likely to be lower than the assigned value because the rats had not been magazine trained which means they would not have discovered the food pellet as soon as it was released on the first few trials. On pages 15-16, we go on to note that evidence for a change in response rate during the CS in those very first few trials may have been missed because the initial response rates were very low in rats trained with very long inter-reinforcement intervals (and thus high informativeness). We also propose a solution to this problem of comparing between very low response rates, one that uses the nDkl to parse response rates into segments (clusters of trials with equivalent response rates). This analysis with parsed response rates provides evidence that differential responding to the CS may have been acquired earlier than is revealed using trial-by-trial comparisons.

      (2) The authors report in the response that the basis for the apparent gradual/multiple step-like increases after initial learning remains unclear within their framework. This would be important to point out in the actual manuscript. Further, the responses indicating the fact that there are some phenomena that are not captured by the current model would be important to state in the manuscript itself.

      We have included a paragraph (on page 26) that discusses the interpretation of the steady/multi-step increase in responding across continued training.

      (3) There are several mismatches between results shown in figures and those produced by the authors’ code, or other supplementary files. As one example, rat 3 results in Fig 11 and Supplementary Materials don’t match and neither version is reproduced by the authors’ code. There are more concerns like this, which are detailed in the attached review file.

      Addressed next….

      The following is the response to the points raised in Part 2 of Reviewer 1’s pdf.

      (1a) I plotted the calculated nDkl with the provided code for rat 3 (Fig 11), but it looks different, and the trials to acquisition also didn’t match with the table provided (average of ~20 trial difference). The authors should revise the provided code and plots. Further, even in their provided figures, if one compares rat 3 in Supplementary Materials to data from the same rat in Fig 11, the curves are different. It is critical to have reproducible results in the manuscript, including the ability to reproduce with the provided code.

      We apologise for those inconsistencies. We have checked the code and the data in the figures to ensure they are all now consistent and match the full data in the nHT.mat file in OSF. Figures 11 and 12 from the previous version are now replaced with Figure 6 in the revised manuscript (still showing data from Rats 3 and 176). The data plotted in Fig 6 match what is plotted in the supplementary figures for those 2 rats (but with slightly different cropping of the x-axes) and all plots draw directly from nHT.mat.

      (1b) I tried to replicate also Fig 3C with the results from the provided code, but I failed especially for nDkl > 2.2. Fig 3A and B look to be OK.

      There was an error in the previous Fig 3C, which was plotting the data from the wrong column of the Trials2Acquisition Table. We suspect this arose because some changes to the file were not updated in Dropbox. However, that figure has changed (now Figure 2) as already mentioned, and no longer plots data obtained with that specific nDkl criterion. The figure now shows criteria that do not attempt to match the Gibbon and Balsam criterion.

      (1c) The trials to learn from the code do match with those in the Trials2Acquisition Table, but the authors’ code doesn’t reproduce the reported trials to learn values in the nDkl Acquisition Table. The trials to learn from the code are ~20 trials different on average from the table’s ones, for 1:20, 1:100, and 1:1000 nDkl.

      We agree that discrepancies between those different files were a source of potential confusion because they were using different criteria or different ways of measuring response rate (i.e., the “conventional” calculation of rate as number of responses/time, vs our adjusted calculation in which the 1<sup>st</sup> response in the CS was excluded as well as the time spent in the magazine, vs parsed response rates based on inter-response intervals). To avoid this, there is now a single table called Acquisition_Table.xlsx in OSF that includes Trials to acquisition for each rat based on a range of criteria or estimates of response rate in labelled columns. The data shown in Figure 2 are all based on the conventional calculation of response rate (provided in Columns E to H of Acquisition_Table.xlsx). To make the source of these data explicit, we have provided in OSF the matlab code that draws the data from the nHT.mat file to obtain these values for trials-to-acquisition.

      (1d) The nDkl Acquisition Table has columns with the value of the nDkl statistics at various acquisition landmarks, but the value does not look to be true, especially for rat 19. The nDkl curve provided by the authors (Supplementary Materials) doesn’t match the values in the table. The curve is below 10 until at least 300 trials, while the table reports a value higher than 20 (24.86) at the earliest evidence of learning (~120 trials?).

      We are very grateful to the reviewer for finding this discrepancy in our previous files. The individual plots in the Supplementary Materials now contain a plot of the nDkl computed using the conventional calculation of response rate (plot 3 in each 6-panel figure) and a plot of the nDkl computed using the new adjusted calculation of response rate (plot 4). These correspond to the signed nDkl columns for each rat in the full data file nHT.mat. The nDkl values at different acquisition landmarks included in Acquisition_Table.xlsx (Cols AB to AF) correspond to the second of these nDkl formulations. We point out that, of the acquisition landmarks based on the conventional calculation of response rate (Cols E to J of Acquisition_Table.xlsx), only the first two landmarks (CSrate>ContextRate and min_nDkl) match the permanently positive and minimum values of the plotted nDkl values. This is because the subsequent acquisition landmarks are based on a recalculation of the nDkl starting from the trial when CSrate>ContextRate, whereas the plotted nDkl starts from Trial 1.

      (2) The cumulative number of responses during the trial (Total) in the raw data table is not measured directly, but indirectly estimated from the pre-CS period, as (cumNR_Pre*[cumITI/cumT_Pre])+ cumNR_CS (cumNR_Pre: cumulative nose-poke response number during pre-CS period; cumITI: cumulative sum of ITI duration; cumT_Pre: cumulative pre-CS duration; cumNR_CS: cumulative response number during CS), according to ‘Explanation of TbyTdataTable (MATLAB).docx’. Why not use the actual cumulative responses during the whole trial instead of using a noisier measure during a smaller time window and then scaling it for the total period?

      Unfortunately, the bespoke software used to control the experimental events and record the magazine activity did not record data continuously throughout the experiment. The ITI responses were only sampled during a specified time-window (the “pre-CS” period) immediately before each CS onset. Therefore, response counts across the whole ITI had to be extrapolated.
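
      The extrapolation described in point (2) can be written out as a short sketch. The formula and the variable names (cumNR_Pre, cumITI, cumT_Pre, cumNR_CS) are taken from the reviewer's description of the data table; the Python function and the example numbers are illustrative assumptions, not the authors' code or data.

```python
def estimated_total_responses(cum_nr_pre, cum_iti, cum_t_pre, cum_nr_cs):
    """Estimate cumulative responses across the whole trial (ITI + CS).

    Implements (cumNR_Pre * [cumITI / cumT_Pre]) + cumNR_CS: the pre-CS response
    count is scaled up from the sampled pre-CS window to the full ITI, and the
    directly recorded CS responses are then added.
    """
    iti_estimate = cum_nr_pre * (cum_iti / cum_t_pre)  # extrapolated ITI responses
    return iti_estimate + cum_nr_cs

# Invented numbers: 4 responses in 100 s of pre-CS sampling, 900 s of ITI in total,
# and 12 responses recorded during the CS.
total = estimated_total_responses(cum_nr_pre=4, cum_iti=900, cum_t_pre=100, cum_nr_cs=12)
print(total)
```

      Because the sampled window is a fraction of the ITI, each pre-CS response counts for several extrapolated responses (nine in this example), which is the source of the extra noise the reviewer raises.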

      (3) Regarding the “Matlab code for Find Trials to Criterion.docx”:

      (a) What’s the rationale for not using all the trials to calculate nDkl but starting the cumulative summation from the earliest evidence trial (truncated)? Also, this procedure is not described in the manuscript, and this should be mentioned.

      The procedure was perhaps not described clearly enough in the previous manuscript. We have expanded that text to make it clearer (page 12) which includes the text…

      “We started from this trial, rather than from Trial 1, because response rate data from trials prior to the point of acquisition would dilute the evidence for a statistically significant difference in responding once it had emerged, and thereby increase the number of trials required to observe significant responding to the CS. The data from Rat 1 illustrates this point. The CS response rate of Rat 1 permanently exceeded its overall response rate on Trial 52 (when the nD<sub>KL</sub> also became permanently positive). The nD<sub>KL</sub>, calculated from that trial onwards, surpassed 0.82 (odds 4:1) after a further 11 trials (on Trial 63) and reached 1.92 (p < .05) on Trial 81. By contrast, the nD<sub>KL</sub> for this rat, calculated from Trial 1, did not permanently exceed 0.82 until Trial 83 and did not exceed 1.92 until Trial 93, adding 10 or 20 trials to the point of acquisition.”
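
      The two thresholds quoted above (0.82 for odds 4:1 and 1.92 for p < .05), and the 5.4 quoted elsewhere for p < .001, coincide numerically with half of the chi-square critical value at 1 degree of freedom, with odds of 4:1 read as p = 0.2. The sketch below only checks that correspondence; that the thresholds were derived this way is an assumption, not something stated in the text, and the function name is hypothetical.

```python
from statistics import NormalDist

def half_chi2_critical(p):
    """Return half of the upper-tail chi-square(1 df) critical value for probability p.

    For 1 degree of freedom the chi-square critical value equals the square of the
    two-tailed standard-normal critical value, so only the standard library is needed.
    Assumption: this mirrors how the quoted nDkl thresholds were obtained.
    """
    z = NormalDist().inv_cdf(1 - p / 2)  # two-tailed normal critical value
    return z * z / 2

for label, p in [("odds 4:1", 0.2), ("p < .05", 0.05), ("p < .001", 0.001)]:
    print(label, round(half_chi2_critical(p), 2))
```
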

      (3b) The authors' threshold is the trial when the nDkl value exceeds the threshold permanently.  What about using just the first pass after the minimum?

      Rat 19 provides one example where the nDkl was initially positive, and even exceeded threshold for odds 4:1 and p<.05, but was followed by an extended period when the nDkl was negative because the CS response rate was less than the overall response rate. It illustrates why the first trial on which the nDkl passes a threshold cannot be used as a reliable index of acquisition.
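
      The difference between the two rules discussed in (3b) can be illustrated with a small sketch. The series of values is invented to mimic the pattern described for Rat 19 (an early excursion above threshold followed by a negative stretch); the function names are hypothetical and this is not the authors' code.

```python
def first_crossing(values, threshold):
    """Return the 1-based trial on which the series first exceeds the threshold."""
    for trial, value in enumerate(values, start=1):
        if value > threshold:
            return trial
    return None

def permanent_crossing(values, threshold):
    """Return the 1-based trial from which the series stays above the threshold
    through the final trial, or None if the final value is not above it."""
    trial = None
    for index in range(len(values) - 1, -1, -1):  # scan backwards from the last trial
        if values[index] > threshold:
            trial = index + 1
        else:
            break
    return trial

# Invented signed values: above 0.82 early, then negative, then above for good.
signed_values = [0.5, 1.0, 2.1, -0.3, -0.8, 0.2, 0.9, 1.5, 2.0]
early = first_crossing(signed_values, 0.82)
stable = permanent_crossing(signed_values, 0.82)
print(early, stable)
```

      With these values the first-pass rule reports Trial 2, even though responding to the CS subsequently falls below the overall rate, whereas the permanent rule reports Trial 7.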

      (3c) Can the authors explain why a value of 0.5 is added to the cumulative response number before dividing it by the cumulative time?

      This was done to provide an “unbiased” estimate of the response count because responses are integers. For example, if a rat has made 10 responses over 100 s of cumulative CS time, the estimated rate should be at least 10/100 but could be anything up to, but not including, 11/100. A rate of 10.5/100 is the unbiased estimate. However, we have now removed this step when calculating the nDkl to identify trials to acquisition because we recognise that it would represent a larger correction to the rate calculated across short intervals than across long intervals and therefore bias comparison between CS and overall response rates that involve very different time durations. As such, the correction would artefactually inflate evidence that the CS response rate was higher than the contextual response rate. However, as noted earlier in this reply, we have now instituted a similar correction when calculating the pre-CS response rate over the final 5 sessions for rats that did not register a single response (hence we set their response count to 0.5).
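
      The point about unequal intervals can be made concrete with a short sketch. The first case uses the example from the reply above (10 responses in 100 s); the second case (100 responses in 1000 s) and the function name are illustrative assumptions chosen so that both uncorrected rates are equal.

```python
def response_rate(count, duration_s, correction=0.0):
    """Return responses per second, optionally adding a fixed count correction."""
    return (count + correction) / duration_s

# Same uncorrected rate (0.1 responses/s) measured over a short and a long interval.
short_shift = response_rate(10, 100, 0.5) - response_rate(10, 100)
long_shift = response_rate(100, 1000, 0.5) - response_rate(100, 1000)
print(short_shift, long_shift)
```

      Adding 0.5 to the count raises the rate by about 0.005 responses/s over the 100 s interval but only about 0.0005 responses/s over the 1000 s interval, so a rate computed over the shorter cumulative CS time is inflated more than one computed over the longer overall time.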

      (3d) Although the authors explain that nDkl was set to negative if pre-CS rate is higher than CS rate, this is not included in the code because the code calculates the nDkl using the truncated version, starting to accumulate the poke numbers and time from the earliest evidence, thus cumulative CS rate is always higher than cumulative contextual rate. I expect then that the cumulative CS rate will be always higher than the cumulative pre-CS rate.

      Yes, that is correct. The negative sign is added to the nDkl when it is computed starting from Trial 1. But when it is computed starting from the trial when the CS rate is permanently > the overall rate, there is no need to add a sign because the divergence is always in the positive direction.

      (3e) Regarding the Wilcoxon signed rank test, please clarify in the manuscript that the input ‘rate’ is not the cumulative rate as used for the earliest evidence. Please also clarify if the rates being compared for the signed nDkl are just the instantaneous rates or the cumulative ones. I believe that these are the ‘cumulative’ ones (not as for Wilcoxon signed rank test), because if not, the signed nDkl curve of rat 3 would fluctuate a lot across the x-axis.

      The reviewer is correct in both cases. However, as already mentioned, we have removed the analysis involving the Wilcoxon test. The description of the nDkl already specifies that this was done using the cumulative rates.

      (4) Supplemental table ‘nDkl Acquisition Table.xlsx’ 3rd column (“Earliest”) descriptions are unclear.

      (a) It is described in the supplemental ‘Explanation of Excel Tables.docx’ as the ‘earliest estimate of the onset of a poke rate during the CSs higher than the contextual poke rate’, while the last paragraph of the manuscript’s method section says ‘Columns 4, 5 and 6 of the table give the trial after which conditioned responding appeared as estimated in the above described three different ways— by the location of the minimum in the nDkl, the last upward 0 crossings, and the CS parse consistently greater than the ITI parse, respectively. Column 3 in that table gives the minimum of the three estimates.’ I plotted the data from column 3 (right) and comparing them with Fig 3A (left) makes it clear that there’s an issue in this column. If the description in the ‘Explanation of Excel Tables.docx’ is incorrect, please update it.

      We agree that the naming of these criteria can cause confusion, hence we have changed them. On page 9 we have replaced “earliest” with “first” in describing the criterion plotted in Figure 2A showing the trial starting from which the cumulative CS response rate permanently exceeded the cumulative overall rate. What is labelled as “Earliest” in “Acquisition_Table.xlsx” is, as the explanation says, the minimum value across the 3 estimates in that table.

      (b) Also, the term ‘contextual poke rate’ in the 3rd column’s description is confusing as in the nDkl calculation it represents the poke rate during all the training time, while in the first paragraph of the ‘Data analysis’ part, the earliest evidence is calculated by comparing the ITI (pre-CS baseline) poke rate.

      Yes, we have kept the term “contextual” response rate to refer to responding across the whole training interval (the ITI and the CS duration). This is used in calculation of the nDkl. For consistency with this comparison, we now take the first estimate of acquisition (in Fig 2A) based on a comparison between the CS rate and the overall (context) rate (not the pre-CS rate).

      Reviewer #2 (Recommendations for the authors):

      In response to the Rebuttal comments:

      Analytical (1) relating to Figure 3C/D

      This is a reasonable set of alternative analyses, but it is not clear that it answers the original comment regarding why the fit was worse when using a theoretically derived measure. Indeed, Figure 3C now looks distinctly different to the original Gibbon and Balsam data in terms of the shape of the relationship (specifically, the Group Median - filled orange circles) diverge from the black regression line.

      As mentioned in response to Reviewer 1, there was a mistake in Figure 3C of the revised manuscript. The figure was actually plotting data using a more stringent criterion of nDkl > 5.4, corresponding to p<0.001. The figure was referencing the data in column J of the public Trials2Acquisition Table. The data previously plotted in Figure 3C are no longer plotted because we no longer attempt to identify a criterion exactly matching that used by Gibbon and Balsam.

      We agree that the data shown in the first 3 panels of Figure 2 do diverge somewhat from the black regression line at the highest levels of informativeness (C/T ratios > 70), and the regression lines fitted to the data have slopes greater than -1. We acknowledge this on page 14 of the revised manuscript. Since Gibbon and Balsam did not report data from groups with such high ratios, we can’t know whether their data too would have diverged from the regression line at this point. We now report in the text a regression fitted to the first 10 groups in our experiment, which have C/T ratios that coincide with those of Gibbon and Balsam, and those regression lines do have slopes much closer to -1 (and include -1 in the 95% confidence intervals). We believe the divergence in our data at the high C/T ratios may be due to the fact that our rats were not given magazine training before commencing training with the CS and food. Because of this, it is quite likely that many rats did not find the food immediately after delivery on the first few trials. Indeed, in subsequent experiments, when we have continued to record magazine entries after CS-offset, we have found that rats can take 90 s or more to enter the magazine after the first pellet delivery. This delay would substantially increase the effective CS-US interval, measured from CS onset to discovery of the food pellet by the rat, making the CS much less informative over those trials. We now make this point on pages 14-15 of the revised manuscript.

      Analytical (2)

      We may have very different views on the statistical and scientific approaches here.

      This scalar relationship may only be uniquely applicable to the specific parameters of an experiment where CS and US responding are measured with the same behavioral response (magazine entry). As such, statements regarding the simplicity of the number of parameters in the model may simply reflect the niche experimental conditions required to generate data to fit the original hypotheses.

      The consistency of our data with the data reported decades ago by Gibbon and Balsam indicates that the scalar relationship they identified is not unique to certain niche conditions, since any such special conditions would have to be true of both the acquisition of sign-tracking responses in pigeons and magazine entry responses in rats. How broadly it applies will require further experimental work using different paradigms and different species to assess how the rate of acquisition is affected across a wide range of informativeness, just as we have done here.

    1. eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. This study is an advance towards simplifying protein expression workflows, and the evidence provided is solid, starting with nanoluc, a protein that expresses readily in many systems, to applications to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Data on the underlying mechanisms and efficiency of the presented system in terms of protein yield relative to other known cell-free systems would greatly enhance the findings' significance and the strength of the evidence. The paper remains of interest to scientists in microbiology, biotechnology and protein synthesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors presented a simplified E. coli cell-free protein synthesis (eCFPS) system that reduces core reaction components from 35 to 7, improving protein expression levels. They also presented a "fast lysate" protocol that simplifies extract preparation, enhancing accessibility and robustness for diverse applications.

      Strengths:

      The authors present a valuable new protocol for eCFPS, which simplifies its application.

      Weaknesses:

      The authors only provided the data for optimization, leaving the underlying mechanism that explains the phenomena unexplained.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have made a convincing argument that the current system of in vitro translation using E. coli extracts can be significantly optimized to work with far fewer components, while maintaining activity. They have showcased their improved activity using not only physical but also functional readouts.

      Strengths:

      The experiments are designed in a very logical and easy-to-understand manner, which makes it easier not only to follow the paper but also to reproduce the results. Functional assays with the synthesized proteins are a good way to demonstrate functionality and applicability of the system.

      Weaknesses:

      The production of the lysate requires special instrumentation, limiting accessibility. While the strengths of the study are well-emphasized, the limitations are not mentioned. Representation of some experiments could be done in a more complete manner.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to overcome the challenges associated with complex, conventional prokaryotic cell-free protein synthesis (CFPS) systems, which require up to thirty-five components, by developing a streamlined and efficient E. coli CFPS platform to encourage broader adoption. The main objective was to reduce the number of reaction components from thirty-five to seven, while also developing an accessible 'fast lysate' preparation protocol that eliminates time-consuming runoff and dialysis steps. The authors also sought to demonstrate the robustness and translational quality of this streamlined system by efficiently synthesising challenging functional proteins, including the cytotoxic restriction endonuclease BsaI and the self-assembling intermediate filament protein vimentin.

      Strengths:

      This study presents several key strengths of the optimised E. coli cell-free protein synthesis system in terms of its design, performance and accessibility.

      (1) The reaction mixture has been dramatically simplified, with the number of essential core components successfully reduced from up to thirty-five in conventional systems to just seven.

      (2) The "fast lysate" protocol is a significant advance in terms of procedure.

      (3) The system's ability to synthesise challenging, functional proteins is evidence of its robustness.

      Weaknesses:

      (1) Title: "A simplified and highly efficient cell-free protein synthesis system for prokaryotes".

      (a) This title is misleading since one would expect a simplified and highly efficient cell-free protein synthesis system to yield similar protein levels compared to current cell-free protein synthesis systems. What this study shows is that the composition of cell-free protein synthesis systems can be simplified while maintaining a certain level of protein synthesis. Here, optimisation does not involve maintaining protein synthesis yield while simplifying the cell-free protein synthesis system; rather, it involves developing a simplified cell-free protein synthesis system. As mentioned in my comments below, this study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      (b) What do the authors mean by "highly efficient"? Highly efficient compared to what experimental conditions? If one is interested in the yield of protein synthesis, is this simplified system highly efficient compared to current systems?

      (2) Figures 1, 3-5:

      (a) What do relative luciferase units represent? How are these units calculated?

      (b) In this system, the level of expression depends mainly on the level of NLuc transcripts and the efficiency of NLuc translation. How did the authors ensure that the chemical composition of the different eCFPS buffers only affected protein translation and not transcript levels? In other words, are luciferase units solely an indicator of protein synthesis efficiency, or do they also depend on transcription efficiency, which could vary depending on the experimental conditions?

      (c) How long were the eCFPS reactions allowed to proceed before performing the luciferase activity measurement? Depending on the reaction time, the absence or presence of certain compounds may or may not impact NLuc expression. For example, it can be assumed that tRNA does not significantly affect NLuc levels over a short period of time, and that endogenous tRNA in the lysate is present at sufficient concentrations. However, over a longer period of time, the addition of tRNA could be essential to achieve optimal NLuc levels.

      (d) The authors show that tRNA and amino acids are not strictly essential for the expression of NLuc, likely due to residual amounts within the cell lysate. However, are the protein levels achieved without added amino acids and tRNA sufficient for biochemical assays that require a certain amount of protein? It is important to note that the focus here is on optimising the simplicity of the buffer rather than the level of protein expression. In fact, the simplicity of the buffer is prioritised over the amount of protein produced. This should be made clear.

      (e) How would the NLuc level compare if all the components were optimised individually and present in an optimised buffer, compared to a buffer optimised for simplicity as described by the authors?

      (3) Line 71, Streamlining eCFPS: removal of dispensable components. This title is misleading because it creates the false impression that proteins can be produced in vitro without the addition of certain compounds. While this is true, the level of protein produced may not be sufficient for subsequent biochemical analyses. This should be made clear.

      (4) Figure 2: In the legend, "(A) Protein expression levels of the eCFPS system measured at varying concentrations of KGlu and MgGlu2" would be more accurate if changed to "(A) Protein expression levels of the eCFPS system using a Nanoluciferase (NLuc) reporter DNA measured at varying concentrations of KGlu and MgGlu2".

      (5) Lines 302-303: "The thorough optimization of the seven core components was a critical step in achieving high protein expression levels". What are "high expression levels"? Compared to what?

    5. Author response:

      Thank you for overseeing the review of our manuscript and for providing the eLife Assessment and Public Reviews. We are highly appreciative of the detailed, constructive feedback from the editors and reviewers.

      We acknowledge the core issues raised and we are committed to undertaking the necessary experiments and textual revisions to address every critique.

      Here is a summary of the key revisions we plan to undertake to address the major points raised:

      (1) Absolute yield comparison and efficiency clarification (eLife Assessment, R#3)

      We will perform new quantitative experiments to provide the absolute protein yield of our optimized eCFPS system and benchmark it against a published, widely recognized high-yield CFPS protocol. This will directly address the central requirement for industry comparison and strengthen the claim of "high efficiency." Furthermore, we will revise the manuscript's terminology, especially in the title and abstract, to accurately reflect the system's success in "streamlining" and "robustness" in addition to performance.

      (2) Mechanistic rationale for simplification (eLife Assessment, R#1)

      We will substantially expand the Discussion to provide a mechanistic explanation for why activity is maintained after removing up to 28 components. This analysis will focus on the retention of endogenous metabolic enzymes and residual factors within the "Fast Lysate," citing relevant literature (e.g., Yokoyama et al., 2010, as suggested by R#1) to support the role of metabolic pathways in compensating for the lack of exogenous tRNA, CTP/UTP, and specific amino acids.

      (3) Transcription-translation coupling (R#3)

      To address the concern that expression changes might be due to transcription rather than translation efficiency, we will perform control experiments to monitor mRNA levels under key optimized conditions. This will help confirm that the observed efficiency changes are primarily attributable to translation.

      (4) Data presentation and completeness (R#2)

      We will revise the presentation of data in figures (e.g., Figure 2) to use appropriate graph types for discrete data and ensure all units, incubation times, and conditions are clearly and consistently specified. Furthermore, we will add a paragraph to the Discussion addressing the study's limitations, specifically the potential implications of DTT removal for certain protein types.

      We are confident that these planned revisions will address the reviewers' recommendations and result in a stronger manuscript.

    1. eLife Assessment

      This study provides important evidence for the mechanism underlying KCNC1-related developmental and epileptic encephalopathy. The authors have generated and characterized a new knock-in mouse with a pathogenic mutation found in patients to determine the synaptic and circuit mechanisms contributing to KCNC1-associated epilepsy. They provide convincing evidence for reduced excitability of parvalbumin-positive fast-spiking interneurons, but not in neighboring excitatory neurons, and suggest that this may contribute to seizures and premature death in the mice.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have created a new model of KCNC1-related DEE in which a pathogenic patient variant (A421V) is knocked into mouse in order to better understand the mechanisms through which KCNC1 variants lead to DEE.

      Strengths:

      (1) The creation of a new DEE model of KCNC1 dysfunction.

      (2) In vivo phenotyping demonstrates key features of the model such as early lethality and several types of electrographic seizures.

      (3) The ex vivo cellular electrophysiology is very strong and comprehensive including isolated patches to accurately measure K+ currents, paired recording to measure evoked synaptic transmission, and the measurement of membrane excitability at different time points and in two cell types.

      (4) 2P imaging relates the cellular dysfunction in PV neurons to epilepsy.

    3. Reviewer #2 (Public review):

      Summary:

      Wengert et al. generated and comprehensively characterized the Kcnc1 A421V/+ knock-in mouse, which models developmental epileptic encephalopathy. The Kcnc1 gene encodes the Kv3.1 channel subunit, which, similar to the role of BK-channels in some excitatory neurons, facilitates high-frequency firing in inhibitory neurons by accelerating the downward hyperpolarization of individual action potentials. Although various Kcnc1 mutations are linked to developmental epileptic encephalopathies, the functional impact of the A421V mutation remained controversial. To elucidate its effect on the neuronal excitability and neurological functions, the authors generated cre-dependent KI mice and thoroughly characterized them using neonatal neurological assessments, high-quality in vitro electrophysiology, and in vivo imaging/electrophysiology analyses. These studies revealed impaired excitability in the PV+ inhibitory interneurons, correlating with the emergence of epilepsy and premature death. Overall, this study provides strong support for the role of the A421V mutation in disrupting inhibitory function.

      Overall, the study is well-designed and conducted at a high quality. The use of a Cre-dependent KI system is effective for maintaining the mutant line despite the premature death phenotype, and may also minimize the phenotype drift that can arise when breeding from mice with milder phenotype manifestation (as ones with severe phenotype often fail to reproduce). The neonatal behavior analysis is thoroughly conducted, and the in vitro electrophysiology studies are of high quality, providing robust insights into the functional impact of the mutation.

      One limitation of this study is the demonstration of the trafficking defect of mutant Kv3.1, which relies solely on the fluorescence density, and such analysis often lacks a rigorous quantitative measurement. A biochemical analysis (surface biotinylation or immunoblot using membrane fractionation) would make the conclusion more convincing, although this poses a technical challenge as Kv3.1 is expressed primarily in only a subset of PV+ cells.

      While the study focused on the superficial layer because Kv3.1 is the major channel subunit, some of the neurons co-express Kv3.2, and Kv3.1 and Kv3.2 can form heteromeric channels. It would be interesting to explore whether the mutant Kv3.1 subunits exert a dominant-negative effect on Kv3.2 in these populations.

    4. Reviewer #3 (Public review):

      Summary:

      Here, Wengert et al. establish a rodent model of KCNC1 (Kv3.1) epilepsy by introducing the A421V mutation. The authors perform video-EEG, slice electrophysiology, and in vivo 2P imaging of calcium activity to establish a disease mechanism involving impairment in the excitability of fast-spiking parvalbumin (PV) interneurons in the cortex and thalamic PV cells.

      Outside-out nucleated patch recordings were used to evaluate the biophysical consequence of the A421V mutation on potassium currents and showed a clear reduction in potassium currents. Similarly, action potential generation in cortical PV interneurons was severely reduced. Given that both potassium currents and action potential generation were found to be unaffected in excitatory pyramidal cells in the cortex, the authors propose that loss of inhibition leads to hyperexcitability and seizure susceptibility in a mechanism similar to that of Dravet Syndrome.

      Strengths:

      This manuscript establishes a new rodent model of KCNC1-developmental and epileptic encephalopathy. The manuscript provides strong evidence that parvalbumin interneurons are impaired by the Kcnc1-A421V mutation and that cortical excitatory neurons are not impaired. Together, these findings support the conclusion that seizure phenotypes associated with Kcnc1-A421V are caused by impaired cortical inhibition.

      Weaknesses:

      The manuscript identifies a partial mechanism of disease that leaves several aspects unresolved, including the possible role of subcortical regions in the seizure mechanism. Similarly, while the authors identify a reduction in potassium currents and a reduction in PV cell surface expression of Kv3.1, why the A421V missense mutation leads to a more severe phenotype than previously reported loss-of-function mutations in Kv3.1 is not clear.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):           

      Summary:

      The authors have created a new model of KCNC1-related DEE in which a pathogenic patient variant (A421V) is knocked into a mouse in order to better understand the mechanisms through which KCNC1 variants lead to DEE.  

      Strengths:

      (1)  The creation of a new DEE model of KCNC1 dysfunction. 

      (2)  In vivo phenotyping demonstrates key features of the model such as early lethality and several types of electrographic seizures. 

      (3)  The ex vivo cellular electrophysiology is very strong and comprehensive including isolated patches to accurately measure K+ currents, paired recording to measure evoked synaptic transmission, and the measurement of membrane excitability at different time points and in two cell types.

      We thank Reviewer 1 for these positive comments related to strengths of the study.   

      Weaknesses:

      (1) The assertion that membrane trafficking is impaired by this variant could be bolstered by additional data.

      We agree with this comment. However, given the technical challenges of standard biochemical experiments for investigating voltage-gated potassium channels (e.g., antibody quality), the lack of a Kv3.1-A421V specific antibody, and the fact that Kv3.1 is expressed in only a small subset of cells, we did not undertake this approach. However, we did perform additional experiments and analysis to improve the rigor of the experiments supporting our conclusion that membrane trafficking is impaired in the Kcnc1-A421V/+ mouse. 

      Such experiments support a highly significant and robust difference in our (albeit imperfect) measurement of the membrane:cytosol ratio of Kv3.1 immunofluorescence between WT and Kcnc1-A421V/+ mice, which is consistent with lack of membrane trafficking (Figure 3). In the revised manuscript, we have added additional data points to this plot and updated the representative example images using improved imaging techniques to better showcase how Kcnc1-A421V/+ PV-INs differ from age-matched WT littermate controls. We think the result is quite clear. Future biochemical experiments perhaps best performed in a culture system in vitro could provide additional support for this conclusion.

      (2) In some experiments details such as the age of the mice or cortical layer are emphasized, but in others, these details are omitted.

      We apologize for this omission. We have now clarified the age of the mice and cortical layer for each experiment in the Methods and Results sections as well as figure legends.   

      (3) The impairments in PV neuron AP firing are quite large. This could be expected to lead to changes in PV neuron activity outside of the hypersynchronous discharges that could be detected in the 2-photon imaging experiments, however, a lack of an effect on PV neuron activity is only loosely alluded to in the text. A more formal analysis is lacking. An important question in trying to understand mechanisms underlying channelopathies like KCNC1 is how changes in membrane excitability recorded at the whole cell level manifest during ongoing activity in vivo. Thus, the significance of this work would be greatly improved if it could address this question.

      Yes, the impairments in the neocortical PV-IN excitability are notably severe relative to other PV interneuronopathies that we and others have directly investigated (e.g., Kv3.1 or Kv3.2-/- knockout mice; Scn1a+/- mice). In the revised version of the manuscript, we have now added a more thorough investigation and analysis of our in vivo 2P calcium imaging data of PV-IN (and presumptive excitatory cell) neural activity (Figure 8 and Supplementary Figure 9; Methods, lines 230-271; Results, lines 630-657; and Discussion, lines 795-814).

      Because of the prominent recruitment of neuropil during presumptive myoclonic seizures, further investigation of individual neuronal excitability in vivo required a slightly different labeling strategy now using a soma-tagged GCaMP8m as well as a separate AAV containing tdTomato driven by the PV-IN-specific S5E2 enhancer. Our new results reveal an increase in the baseline calcium transient frequency in non-PV-INs, and reduced mean transient amplitudes in both non-PV cells and PV-INs. These interesting findings, which are consistent with attenuated PV-IN-mediated perisomatic inhibition leading to disinhibited excitatory cells in the Kcnc1-A421V/+ mice, link our in vivo results to the slice electrophysiology experiments. Of course, there are residual issues with the application of this technique to interneurons and the ability to resolve individual or small numbers of spikes, which likely explains the lack of genotype difference in calcium transient frequency in PV-INs.

      (4) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice, but there is no mention of littermate control analyzed by EEG. 

      We performed additional experiments as requested and did not observe myoclonic jerks or any other epileptic activity in WT control mice. We have included this data in the revised manuscript (Figure 9C).   

      Reviewer #2 (Public review):           

      Summary:

      Wengert et al. generated and thoroughly characterized the developmental epileptic encephalopathy phenotype of Kcnc1-A421V/+ knock-in mice. The Kcnc1 gene encodes the Kv3.1 channel subunit. Analogous to the role of BK channels in excitatory neurons, Kv3 channels are important for the recurrent high-frequency discharge in interneurons by accelerating the downward hyperpolarization of the individual action potential. Various Kcnc1 mutations are associated with developmental epileptic encephalopathy, but the effect of a recurrent A421V mutation was somewhat controversial and its influence on neuronal excitability has not been fully established. In order to determine the neurological deficits and underlying disease mechanisms, the authors generated cre-dependent KI mice and characterized them using neonatal neurological examination, high-quality in vitro electrophysiology, and in vivo imaging/electrophysiology analyses. These analyses revealed excitability defects in the PV+ inhibitory neurons associated with the emergence of epilepsy and premature death. Overall, the experimental data convincingly support the conclusion.

      Strengths:

      The study is well-designed and conducted at high quality. The use of the Cre-dependent KI mouse is effective for maintaining the mutant mouse line with premature death phenotype, and may also minimize the drift of phenotypes which can occur due to the use of mutant mice with minor phenotype for breeding. The neonatal behavior analysis is thoroughly conducted, and the in vitro electrophysiology studies are of high quality.

      We appreciate these positive comments from Reviewer 2. 

      Weaknesses:

      While not critically influencing the conclusion of the study, there are several concerns.

      In some experiments, the age of the animal in each experiment is not clearly stated. For example, the experiments in Figure 2 demonstrate impaired K+ conductance and membrane localization, but it is not clear whether they correlated with the excitability and synaptic defects shown in subsequent figures. Similarly, it is unclear at what age the authors conducted EEG recordings, and whether non-epileptic mice are younger than those with seizures. 

      We have now updated the manuscript to include a clear report of age for all experiments, including the impaired K+ conductance (now Figure 3) and EEG (now Figure 9). There was no intention to omit this information. The recordings of K+ conductance impairments in PV-INs from Kcnc1-A421V/+ mice were completed at P16-21. Thus, we interpret the loss of potassium current density to be causally linked with the impairments in intrinsic physiological function at that same time period in neocortical layer II-IV PV-INs and more subtly in PV-positive cells in the RTN and neocortical layer V PV-INs.

      Mice used in the EEG experiments were P24-48, an age range which roughly corresponded with the midpoint on the survival curve for Kcnc1-A421V/+ mice. Although we saw significant mouse-to-mouse variability in seizure phenotype, no Kcnc1-A421V/+ mice completely lacked epilepsy or marked epileptiform abnormalities, neither of which were seen in WT mice. We did not detect a clear relationship between seizure frequency/type and mouse age. 

      The trafficking defect of mutant Kv3.1 proposed in this study is based only on the fluorescence density analysis, which showed a minor change in membrane/cytosol ratio. It is not very clear how the membrane component was determined (any control staining?). In addition to fluorescence imaging, the addition of a biochemical analysis would make the conclusion more convincing (while it might be challenging if Kv3.1 is expressed only in PV+ cells).

      This relates to comment 3 of Reviewer 1. We agree that, in the initial submission of the manuscript, the evidence from IHC for Kv3.1 trafficking deficits was somewhat subtle. In the revised version of the paper, we have gathered additional replicates of this original experiment with improved imaging quality and clarify how the membrane component was specified, to now show a robust and highly significant (***P<0.001) decrease in membrane:cytosol Kv3.1 ratio. We have also now provided new example images better showcasing the deficits observed in the Kcnc1-A421V/+ mice (Figure 3). The membrane compartment was defined as the outermost 1 micron of the parvalbumin-defined cell soma (drawn blind to the Kv3.1b signal), and, importantly, all analysis was conducted blinded to mouse genotype. These measures help to ensure that the result is robust and unbiased. Nonetheless, we have added a paragraph in the Discussion section highlighting the limitations of our IHC evidence for trafficking impairment (Lines 868-883). 

      While the study focused on the superficial layer because Kv3.1 is the major channel subunit, the PV+ cells in the deeper cortical layer also express Kv3.1 (Chow et al., 1999) and they may also contribute to the hyperexcitable phenotype via negative effect on Kv3.2; the mutant Kv3.1 may also block membrane trafficking of Kv3.1/Kv3.2 heteromers in the deeper layer PV cells and reduce their excitability. Such an additional effect on Kv3.2, if present, may explain why the heterozygous A421V KI mouse shows a more severe phenotype than the Kv3.1 KO mouse (and why they are more similar to Kv3.2 KO). Analyzing the membrane excitability differences in the deep-layer PV cells may address this possibility.

      We appreciate this thoughtful suggestion. We have now provided data from neocortical layer V PV interneurons in the revised manuscript (Supplementary Figure 5). Abnormalities in intrinsic excitability from neocortical layer V PV-INs in Kcnc1-A421V/+ mice were present, but less pronounced than in PV-INs from more superficial cortical layers. These results are consistent with the view that greater relative expression of Kv3.2 “dilutes” the impact of the Kv3.1 A421V/+ variant. Whether the A421V/+ variant impairs membrane trafficking and/or gating of Kv3.2 remains unclear. 

      We attempted to assess how the mutant Kv3.1 affects Kv3.2 localization, but were unsuccessful due to the lack of reliable antibodies. After immunostaining mouse brain sections with two different anti-Kv3.2 antibodies, only one produced a somewhat promising signal (see below). However, even in this case, Kv3.2 staining was successful only once (out of five independent staining experiments) and the signal varied across cortical regions, showing widespread cellular Kv3.2 signal in some areas (b, top panel), and barely detectable signal in others, regardless of Kv3.1 expression. In the remaining four attempts, we detected only ‘fiber-like’ immunostaining signal, further diminishing our confidence in the anti-Kv3.2 antibody, although results could be improved with further testing and refinement, which we will attempt. Consequently, this important question remains unresolved in this study. 

      Author response image 1.

      Immunostaining of Kv3.1 and Kv3.2 in sagittal mouse brain sections. a) An example of intracellular Kv3.2 immunostaining signal, variable across the cortex of a WT mouse independent of Kv3.1 expression. b) Kv3.2 is detectable intracellularly in most of the cells in the top panel but barely detectable in the lowest panel. c) Representative image of Kv3.2 immunostaining signal in other sagittal mouse brain sections.

      We have discussed these important implications and limitations of our results in the Discussion (Lines 868-883). We agree with the Reviewer’s interpretation that an impact on Kv3.1/Kv3.2 heteromultimers across the neocortex may explain why the Kcnc1-A421V/+ mouse exhibits a more severe phenotype than Kv3.1-/- or Kv3.2-/- mice (see below), a view which we have attempted to further clarify in the Conclusion.    

      In Table 1, the A421V PV+ cells show a resting membrane potential that is depolarized relative to WT by ~5 mV, which seems a robust change and would influence the circuit excitability. The authors measured firing frequency after adjusting the membrane voltage to -65 mV, but are the excitability differences less significant if the resting potential is not adjusted? It is also interesting that such a membrane potential difference is not detected in young adult mice (Table 2). This loss of potential compensation may be important for developmental changes in the circuit excitability. These issues can be more explicitly discussed.

      We do not entirely understand this finding and its apparent developmental component. It could be compensatory, as suggested by the Reviewer; however, it is transient and seems to be an isolated finding (i.e., it is not accompanied by compensation in other properties). It is also possible that this change in Kcnc1-A421V/+ PV-INs may reflect impaired/delayed development. We cannot test excitability at a meaningfully later time point as the mice are deceased.

      The revised version of the manuscript contains additional data (Supplementary Figure 4) showing that major deficits in intrinsic excitability are still observed even when the resting membrane potential is left unadjusted. These results are further discussed in the Results section (lines 522-523) and the Discussion section (lines 727-731).   

      Reviewer #3 (Public review):           

      Summary:

      Here, Wengert et al. establish a rodent model of KCNC1 (Kv3.1) epilepsy by introducing the A421V mutation. The authors perform video-EEG, slice electrophysiology, and in vivo 2P imaging of calcium activity to establish disease mechanisms involving impairment in the excitability of fast-spiking parvalbumin (PV) interneurons in the cortex and thalamic PV cells.

      Outside-out nucleated patch recordings were used to evaluate the biophysical consequence of the A421V mutation on potassium currents and showed a clear reduction in potassium currents. Similarly, action potential generation in cortical PV interneurons was severely reduced. Given that both potassium currents and action potential generation were found to be unaffected in excitatory pyramidal cells in the cortex the authors propose that loss of inhibition leads to hyperexcitability and seizure susceptibility in a mechanism similar to that of Dravet Syndrome.  

      Strengths: 

      This manuscript establishes a new rodent model of KCNC1-developmental and epileptic encephalopathy. The manuscript provides strong evidence that parvalbumin-type interneurons are impaired by the A421V Kv3.1 mutation and that cortical excitatory neurons are not impaired. Together these findings support the conclusion that seizure phenotypes are caused by reduced cortical inhibition.

      We thank Reviewer 3 for their view of the strengths of the study.

      Weaknesses:

      The manuscript identifies a partial mechanism of disease that leaves several aspects unresolved, including the possible role of the observed impairments in thalamic neurons in the seizure mechanism. Similarly, while the authors identify a reduction in potassium currents and a reduction in PV cell surface expression of Kv3.1, it is not clear why these impairments would lead to a more severe disease phenotype than other loss-of-function mutations which have been characterized previously. Lastly, additional analysis of video-EEG data would be helpful for interpreting the extent of the seizure burden and the nature of the seizure types caused by the mutation.

      We agree with these comments from Reviewer 3. We studied neurons in the reticular thalamus and layer V neocortical PV-INs since they are also linked to epilepsy pathogenesis and are known to express Kv3.1. However, for most of the study, we focused on neocortical layer II-IV PV-INs, because these cells exhibited the most robust impairments in intrinsic excitability. Crossing our novel Kcnc1-Flox(A421V)/+ mice to a cerebral cortex interneuron-specific driver that would avoid recombination in the thalamus, such as Ppp1r2-Cre (RRID:IMSR_JAX:012686), could assist in determining the relative contribution of thalamic reticular nucleus dysfunction to the overall phenotype, as used by Makinson et al. (2017) to address a similar question; however, we have been unable to obtain this mouse despite extensive effort. There are of course other Kv3.1-expressing neurons in the brain, including in the hippocampus, amygdala, and cerebellum, and we have provided additional discussion (Lines 731-736) of this issue.

      We further agree with the Reviewer that a major question in the field of KCNC1-related neurological disorders is the mechanistic underpinning of why the KCNC1-A421V variant leads to a more severe disease phenotype than other loss of function KCNC1 variants, and, further, why the mouse phenotype is more severe than the Kcnc1 knockout. Previous results and our own recordings in heterologous systems suggest that the A421V variant is more profoundly loss of function than the R320H variant (Oliver et al., 2017; Cameron et al., 2019; Park et al., 2019), which is consistent with A421V having a more severe disease phenotype. Relative to knockout of Kv3.1, our results are consistent with the view that the A421V exhibits dominant negative activity by reducing surface expression of Kv3.1 and/or Kv3.2 (an effect that would not occur in knockout mice), with a possible additional contribution of impairing gating of those Kv3.1-A421V variant containing Kv3.1/Kv3.2 heteromultimers by inclusion of A421V subunits into the heterotetramer. Our finding that the magnitude of total potassium current was reduced in PV-INs by ~50% is consistent with a combination of these various mechanisms but does not distinguish between them.

      In the revised version of the manuscript, we have provided a more complete discussion of these important remaining questions regarding our interpretation of how the severity of KCNC1 disorders relates to the biophysical features of the ion channel variant (lines 868-883).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):          

      Major

      (1) The authors suggest that the reduced K+ current density in Kcnc1-A421V/+ neurons is due in part to impaired trafficking and cell surface expression of Kv3.1 in these neurons. The data supporting this claim aren't completely convincing. First, it's difficult to visualize a difference in Kv3.1 localization in the images shown in panel H, and importantly, it seems problematic that the method to assess Kv3.1 levels in membrane vs. cytosol relied on using PV co-staining to define the membrane compartment as the outermost 1 um of the PV-defined cell soma. This doesn't seem to be the best method to define the membrane compartment, as the PV signal should be largely cytosolic.

      As noted above, we have completed additional data collection to confirm our results, and have performed additional imaging and updated our example images to be more representative of the observed deficits in membrane Kv3.1 expression in the Kcnc1-A421V/+ mice. We attempted to identify a marker to more clearly label the membrane to combine with PV immunocytochemistry but were unable to do so despite some effort. 

      Is it possible that in control neurons, the cytosolic PV signal localizes within the membrane-bound Kv3.1 signal, with less colocalization, whereas in Kcnc1-A421V/+ neurons, there would be more colocalization of the cytosolic PV and improperly trafficked Kv3.1? Could the data be presented in this way, showing altered colocalization of Kv3.1 with PV?

      We do not entirely understand the nature of this concern. In our experiments, we utilized the PV signal to determine the cell membrane and cytosolic compartments in an unbiased manner using a 1-micron shell traced around/outside the edge of the PV signal to define the membrane compartment, with the remainder of the area (minus the nuclear signal defined by DAPI) defined as the cytosol (see Methods 176-186). Because we did not identify any alterations in PV signal or correlation between PV immunohistochemistry and tdTomato expression in Cre reporter strains between WT and Kcnc1-A421V/+ mice, we believe that our strategy for determining membrane:cytosol ratio of Kv3.1 in an unbiased manner is acceptable (albeit of course imperfect). 
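
      The compartment definition described above can be sketched in code. The following is a minimal, hypothetical illustration and not the authors' analysis pipeline: it assumes the PV and DAPI signals have already been segmented into sets of (row, column) pixel coordinates, that the 1-micron shell lies entirely outside the PV mask, and that the pixel size (`um_per_px`) is known. All function and parameter names are invented for illustration.

```python
# Hypothetical sketch of a shell-based membrane:cytosol ratio.
# Masks are sets of (row, col) pixels; the image is a list of rows.
from math import ceil


def dilate(mask, radius_px):
    """Return every pixel within radius_px (Euclidean) of a pixel in mask."""
    r = ceil(radius_px)
    offsets = [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dy * dy + dx * dx <= radius_px * radius_px
    ]
    return {(y + dy, x + dx) for (y, x) in mask for (dy, dx) in offsets}


def mean_intensity(image, pixels):
    """Mean image value over the given pixels, ignoring out-of-bounds ones."""
    values = [
        image[y][x]
        for (y, x) in pixels
        if 0 <= y < len(image) and 0 <= x < len(image[0])
    ]
    return sum(values) / len(values)


def membrane_cytosol_ratio(kv31, pv_mask, nucleus_mask,
                           shell_um=1.0, um_per_px=0.5):
    """Ratio of mean Kv3.1 signal in the shell to that in the cytosol.

    Shell   = pixels within shell_um outside the PV mask (assumed membrane).
    Cytosol = PV mask minus the DAPI-defined nucleus.
    """
    shell = dilate(pv_mask, shell_um / um_per_px) - pv_mask
    cytosol = pv_mask - nucleus_mask
    return mean_intensity(kv31, shell) / mean_intensity(kv31, cytosol)
```

      In this sketch, a ratio above 1 would indicate relatively more Kv3.1 signal in the shell than in the cytosol. Because the shell is derived from the PV mask alone, any genotype difference in the PV signal itself would bias the ratio, which is the caveat the response addresses.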

      Alternatively, membrane fractionation could be performed on WT vs Kcnc1-A421V/+ neurons, followed by Western blotting with a Kv3.1 antibody to show altered proportions in the cytosolic vs. membrane protein fractions. It's important that these results are convincing, as the findings are mentioned in the Abstract, the Results section, and multiple times in the Discussion, although it is still unclear how much the potential altered trafficking contributes to the decrease in K+ currents versus changes in channel gating.

      Multiple technical barriers made it difficult for us to gain direct biochemical evidence for altered trafficking of the A421V/+ Kv3.1 variant (see above). It is not clear how membrane fractionation techniques could be easily applied in this case (at least by us) when PV-INs constitute 3-5% of all neocortical neurons. We further agree (as noted above) that it is difficult to properly disentangle the relative roles of impaired membrane trafficking vs. gating deficits to the observed effect; however, we think that both phenomena are likely occurring. In the revised version of the manuscript, we have more explicitly discussed these limitations in the Discussion section (Lines 868-883).   

      (2) More information is needed regarding the age of mice used for experiments for the following results (added to the Results section as well as figure legends):

      PV density (Supplementary Figure 1) 

      K+ current data (Figure 2A-G)       

      Kv3.1 localization (Figure 2H and I)        

      RTN electrophysiology (Supplementary Figure 3)

      Excitatory neuron electrophysiology (Figure 4)             

      In vivo 2P calcium imaging (Figure 7) 

      Video-EEG (Figure 8)

      We apologize for omitting this critical information. In the revised manuscript, we have provided the age of mice for each of our experiments in the results section, in the figure legend, and in the methods section.   

      (3) It's unclear why developmental milestones/behavioral assessments were only done at P5-P10. In the previous publication of another Kcnc1 LOF variant (Feng et al. 2024), no differences were found at P5-P10, and it was suggested in the discussion that this finding was "consistent with the known developmental expression pattern of Kv3.1 in mouse, where Kv3.1 protein does not appear until P10 or later". In that paper, they did find behavioral deficits at 2-4 months. Even though this model is more severe than the previous model, it would be interesting to determine if there are any behavioral deficits at a later time point (especially as they find more neurophysiological impairments at P32-P42).

      As in our previous study, the lack of clear behavioral deficits in developmental milestones from P5-15 is potentially expected considering the developmental expression of Kv3.1, and we performed these experiments primarily to showcase that the Kcnc1-A421V/+ mice exhibit otherwise normal overall early development (although this could be an artifact of the sensitivity of our testing methods).

      For the revised manuscript, we have conducted additional experiments to investigate behavioral deficits in adult Kcnc1-A421V/+ mice. We found cognitive/learning deficits in Kcnc1-A421V/+ mice relative to WT in both the Barnes maze (Figure 2A-C) and the Y-maze (Figure 2D-F). Other aspects of animal behavior, including cerebellar-related motor function, are likely also impaired at post-weaning timepoints and will be included in a forthcoming research study focusing on motor function in these mice.

      (4) In the Results section, it should be more clearly stated which cortical layer/layers are being studied. In some cases, it mentions layers 2-4, and in some, only layer 4, and in others, it doesn't mention layers at all. Toward the beginning of the Results section, the rationale for focusing on layers 2-4 to assess the effects of this variant should be well described and then, for each experiment, it should be stated which cortical layers were assessed. Related to this point, it seems electrophysiology was only done in layer 4; the rationale for this should also be included.

      We have now clarified which neocortical layers were under investigation in the study. All PV-INs were targeted in somatosensory layers II-IV, while excitatory neurons were either cortical layer IV spiny stellate cells or pyramidal cells. Paired recordings were also completed in layer IV. We have also more explicitly articulated our rationale for looking at PV-INs in layers II-IV to examine the cellular/circuit-level impact of Kv3.1 in a model of developmental and epileptic encephalopathy (Lines 487-491).

      (5) Kcnc1-A421V/+ PV neurons showed more robust impairments in AP shape and firing at P32-42 than at P16-21 (Figure 3), and only showed synaptic neurotransmission alterations at P32-42 (Figure 6). Thus, it's unclear why Kcnc1-A421V/+ excitatory neurons were only assessed at P16-21 (Figure 4 and Supplementary Figure 4 related to Figure 5), particularly if only secondary or indirect effects on this population would be expected.

      We appreciate this excellent point raised by the Reviewer and we have taken the suggestion to examine excitatory neurons at P32-42 in addition to the earlier juvenile timepoint. Our new results from the later timepoint are similar to our results at P16-21: Excitatory neurons show no statistically significant impairments in intrinsic excitability at either of the two timepoints examined (Supplementary Figure 7). This adds support to our original conclusion that PV-INs represent the major driver of disease pathology across development.   

      (6) The 2P calcium imaging experiments are potentially interesting, however, a relationship between these results and the electrophysiology results for PV neurons is lacking. Was there an attempt to assess the frequency and/or amplitude of calcium events specifically in PV neurons, outside of the hypersynchronous discharges, to determine whether there are differences between WT and Kcnc1-A421V/+, as was seen in the electrophysiological analyses? It does seem there are some key differences between the two experiments (age: later timepoint for 2P vs. P16-21 and P32-42, layer: 2/3 vs. 4, and PV marking method: virus vs. mouse line), but the electrophysiological differences reported were quite strong. Thus, it would be surprising if there were no alterations in calcium activity among the Kcnc1-A421V/+ PV neurons.

      In our initial experiments, the prominent neuropil GCaMP signal in Kcnc1-A421V/+ mice rendered it difficult to distinguish and accurately describe baseline neuronal excitability in PV-INs and non-PV cells. In our revised manuscript, we utilized a soma-tagged GCaMP8m and separately labeled PV-INs through S5E2-tdTomato. This strategy made it possible to assess the amplitude and frequency of calcium transients in both PV-positive and PV-negative cells in vivo. We have updated the description of our methods (lines 230-271) and our results (lines 630-657) in the revised manuscript.

      As noted above, our more detailed analysis of somatic calcium transients in PV-IN and non-PV cells during quiet rest (Figure 8 and Supplementary Figure 9) shows that PV-INs from Kcnc1-A421V/+ mice are abnormally excitable, having reduced transient amplitude relative to WT controls. Interestingly, non-PV cells also exhibited an increased calcium transient frequency and reduced amplitude, which is potentially consistent with reduced perisomatic inhibition causing disinhibition in cortical microcircuits. We again highlight that the slow kinetics of GCaMP, combined with the calcium buffering and brief spikes of PV-INs, render quantification of action potential frequency and comparisons between groups difficult.

      (7) As mentioned above, it would be helpful to state the time points or age ranges of these experiments to better understand the results and relate them to each other. For example, the 2P imaging showed apparent myoclonic seizures in 7/7 Kcnc1-A421V/+ mice (recorded for a total of 30-50 minutes/mouse), but the video-EEG showed myoclonic seizures in only 3/11 Kcnc1-A421V/+ mice (recorded for 48-72 hours/mouse). Were these experiments done at very different age ranges, so this difference could be due to some sort of progression of seizure types and events as the mice age? Is it possible these are not the same seizure types (even though they are similarly described)? This discrepancy should be discussed.

      Mice in the EEG experiments were between the ages of P24 and 48, slightly younger than the age in which we carried out the in vivo calcium imaging experiments (>P50). Therefore, an age-related exacerbation in myoclonic jerks is possible. 

      As is highlighted by the Reviewer, it is interesting that the myoclonic seizures were only detected in a portion of the Kcnc1-A421V/+ mice during EEG monitoring (4/12). We believe that the difference is most likely driven by more sensitive detection of the myoclonic jerk activity and behavior in the 2P imaging of neuropil cellular activity compared to our video-EEG monitoring and 2P imaging of soma-tagged GCaMP. We have occasionally observed repetitive myoclonic jerking in mice that appears highly localized (i.e. one forepaw only), suggesting that the myoclonic seizures exist on a spectrum of severity from focal to diffuse. It is therefore possible that myoclonic events and electrographic activity are slightly underestimated in our video-EEG experiments.

      We have now added a few lines discussing this discrepancy in the Discussion (lines 809-814).

      (8) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice. Was video-EEG performed on control mice? These data should be added to Figure 8.

      We have added recordings in control WT mice (N=4). We did not detect myoclonic jerks or other epileptiform activity in the control mice (Figure 9).  

      Minor

      (1) In the first Results section, Line 365, the P value (P<0.001) is different from that in the legend for Figure 1, line 743 (P<0.0001).

      We have fixed this discrepancy. 

      (2) For Supplementary Figure 1, it would be helpful to show images that span the cortical layers (1-6), as PV and Kv3.1 are both expressed across the cortical layers.

      We have updated Supplementary Figure 1 with better example images that span the cortical layers.    

      (3) Error bars should be added to the line graphs in Supplementary Figure 2, particularly panels B and C. Some of the differences appear small considering the highly significant p-values (i.e. body weight at P7 and brain weight at P21).

      The values shown in Supplementary Figure 2D-E are percentages of mice displaying a particular characteristic, so there is no variance for the data.

      Supplementary Figure 2B-C do contain error bars plotted as SEM; however, because of the large N and small degree of variance in the measurements, the error bars are not apparent in the graphs. This has been noted in the Supplementary Figure 2 legend for clarity.
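
      The point about error bars shrinking with sample size follows from the definition of the standard error of the mean, SEM = SD / sqrt(N). The short sketch below illustrates this scaling; the function names are invented for illustration and no values from the study are used.

```python
# Illustrative sketch: SEM shrinks with the square root of the sample size.
from math import sqrt
from statistics import stdev


def sem(values):
    """Standard error of the mean from raw measurements (sample SD / sqrt(N))."""
    return stdev(values) / sqrt(len(values))


def sem_from_sd(sd, n):
    """Standard error of the mean from a known SD and sample size."""
    return sd / sqrt(n)
```

      For a fixed SD, increasing N from 4 to 100 reduces the SEM five-fold, so SEM bars on a large cohort can be smaller than the plotted symbols even when individual measurements vary.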

      (4) In Figure 3, although the Kcnc1-A421V/+ neurons have elevated AP amplitudes relative to WT, the representative traces for P16-21 and P32-42 groups appear strikingly opposite (traces in B and G appear to have much higher amplitudes than those in C and H). As this is one of the three AP phenotypes described, it would be nice to have it reflected in the traces.

      We have updated our example traces to better represent our main findings including AP amplitude for both P16-21 and P32-42 timepoints.  

      (5) Were any effects on the AHP assessed in the electrophysiology experiments? As other studies have reported the effects of altered Kv3 channel activity on AHP, this parameter could be interesting to report as well.

      We have now provided data on the afterhyperpolarization for each condition displayed in the Supplementary data tables. Interestingly, we failed to detect significant differences in AHP between WT and Kcnc1-A421V/+ PV-INs, RTN neurons, or pyramidal cells, although we did identify differences in the dV/dt of the repolarization phase of the AP.   

      (6) The figure legend for Figure 7 has errors in the panel labeling (D instead of C, and two Fs).

      This error has been corrected in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Specific comments and questions for the authors:         

      (1) Do the authors provide a reason for why the juvenile animals are unaffected by the A421V mutation? Is it that PV cells have not fully integrated at this early time point or that Kv3.1 expression is low? Is the developmental expression profile of Kv3.1 in PV cells known and if so could the authors update the discussion with this information?

      We interpret the normal early developmental milestones (P5-P15) to reflect that Kcnc1-A421V/+ mice exhibit the onset of their neurological impairment at the same time that PV-INs upregulate Kv3.1, develop a fast-spiking physiological phenotype, and integrate into functional circuits in the third and fourth postnatal weeks. We have updated the discussion (Line 780-782) with this information and more clearly describe our interpretation of these early-life behavioral experiments.   

      (2) I would like to see a more complete analysis of the Video-EEG data that is included in Figure 8. What was the seizure duration and frequency? Were there spike-wave seizure types observed? Were EEG events that involve thalamocortical circuitry affected such as spindles? Was sleep architecture impaired in the model? Were littermate control animals recorded?

      Although classical convulsive seizures represent only part of the overall epilepsy phenotype that this mouse exhibits, we agree that reporting seizure duration and frequency is important. We have now included this in our revised manuscript (line 624-626). We have also now added WT control mice to our dataset, and, as expected, we failed to observe any epileptic features in our WT recordings.

      In our EEG experiments, we did not record EMG activity in the mouse to allow for unambiguous determination of sleep vs. quiet wakefulness. For that reason, and because we believe it beyond the scope of this particular study, we did not examine sleep-related EEG phenomena such as spindles or sleep architecture. We have, however, added a line in the discussion (line 771-774) suggesting that future studies focus on a more thorough investigation of the EEG activity in these animals. 

      (3) The in vivo calcium imaging data shows synchronous bursts in A421V animals which is in agreement with the synchronous bursts observed in the EEG. Overall the analysis of the in vivo calcium imaging data appears to be rudimentary and perhaps this is a missed opportunity. What additional insights were gained from this technically demanding experiment that were not obtained from the EEG recordings?

      As noted above, in the revised version of the manuscript, we have conducted additional experiments which allowed us to separately examine PV-IN and non-PV neuron excitability via 2P in vivo calcium imaging. This required an alternative strategy to label individual neuronal somata without contamination by the robust neuropil signal that we observed in the approach undertaken in the original submission. We’ve described the details of this new approach in methods (Lines 230-271) and results section (lines 630-657).

      Our new results (Figure 8 and Supplementary Figure 9) reveal that, during quiet rest, neocortical PV-INs from Kcnc1-A421V/+ mice exhibit a reduction in calcium transient amplitude and that non-PV cells exhibit altered transient frequency and amplitude. Overall, we believe that these results are consistent with the view that PV-IN-mediated perisomatic inhibition is compromised in Kcnc1-A421V/+ mice, which leads to downstream hyperexcitability in excitatory neurons within cortical microcircuits.

      (4) The increased severity of seizure phenotypes observed in the A421V model relative to knockout mice is interesting but also confusing given what is known about this mutation. As the authors point out, a possible explanation is that the mutation is acting in a dominant negative manner, where mutant Kv3.1 channels compete with other Kvs that would otherwise be able to partially compensate for the loss of Kv function. Alternatively, the A421V mutation might act by affecting the trafficking of heterotetrameric Kv3 channels to the membrane. Can the authors clarify why a trafficking deficit would produce a different effect than a loss of function mutation? Are the authors proposing that a hypomorphic mutation involving both a partial trafficking deficit and a dominant negative effect of those channels that are properly localized is more severe than a "clean" loss of function? The roughly 50% loss of potassium current absent a change in gating would be expected to behave like a loss-of-function mutation. This might be addressed by comparing the surface expression of the other Kv channels and/or through the use of Kv3.1-selective pharmacology.

      These are excellent points raised by the Reviewer. As noted above, we have endeavored to clarify our hypothesis as to the basis of this phenomenon, although the mechanistic basis for the more severe phenotype in the Kcnc1-A421V/+ mouse relative to the Kv3.1 knockout is not entirely clear. Our physiology results and the evidence presented supporting a trafficking impairment are consistent with dominant negative action of the Kv3.1 A421V variant at the level of channel gating and/or trafficking. To restate, we think the Kcnc1-A421V/+ heterozygous variant is more severe than a Kv3.1 knockout for (at least) three reasons: variant Kv3.1 is incorporated into Kv3.1/Kv3.2 heterotetramers to (1) impair trafficking to the membrane as well as (2) alter the electrophysiological function of those channels that do successfully traffic to the membrane (while Kv3.1 knockout affects Kv3.1 only), and (3) the heterozygous variant may escape the compensatory upregulation of Kv3.2 that is known to occur in Kv3.1 knockout mice.

      For example, our data are consistent with the view that heterotetramers of WT Kv3.1 and Kv3.2 potentially come together with the A421V Kv3.1 subunit in the endoplasmic reticulum and then fail to traffic to the membrane due to the presence of one or more A421V subunit(s), as evidenced by increased Kv3.1 staining in the cytosol in the Kcnc1-A421V/+ mouse relative to WT. This is in contrast to what would occur in the Kv3.1 knockout mice, as there is no subunit produced from the null allele to prevent WT Kv3.2 subunits from forming fully functional Kv3.2 homotetramers that reach the cell surface and function properly. This is one specific possible mechanism for dominant negative activity.

      A non-mutually-exclusive mechanism is that inclusion of one or more Kv3.1 A421V subunits into Kv3 heterotetramers impairs gating and prevents potassium flux such that, even if the tetramer does reach the membrane, that entire tetramer fails to contribute to the total potassium current. This is another possible mechanism for dominant negative function of the A421V subunit.

      Experimental elucidation of the precise mechanism of the dominant negative activity of the A421V Kcnc1 variant is beyond the scope of this study; yet, our lab is continuing to work on this. It will likely require dose-response experiments in which various ratios of WT and Kv3.1 A421V subunits are co-expressed in heterologous cells and then recorded for an overall effect on potassium current similar to (Clatot et al., 2017).
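
      The dose-response logic described above can be illustrated with a simple binomial model of tetramer assembly. This is a textbook-style sketch under strong assumptions that the manuscript does not test: subunits are assumed to assemble randomly in proportion to their abundance, and a tetramer is assumed to be non-functional once it contains more than a chosen number of variant subunits. The function and parameter names are hypothetical.

```python
# Hypothetical binomial model of dominant negative tetramer assembly.
from math import comb


def tetramer_distribution(mutant_fraction, subunits=4):
    """P(k variant subunits) for k = 0..subunits under random assembly."""
    f = mutant_fraction
    return [
        comb(subunits, k) * f**k * (1 - f) ** (subunits - k)
        for k in range(subunits + 1)
    ]


def functional_fraction(mutant_fraction, tolerated_mutant_subunits=0, subunits=4):
    """Fraction of tetramers with at most the tolerated number of variant subunits."""
    dist = tetramer_distribution(mutant_fraction, subunits)
    return sum(dist[: tolerated_mutant_subunits + 1])


# If half of the subunit pool is variant and a single variant subunit is
# enough to disable a tetramer, only 1/16 of tetramers remain functional.
print(functional_fraction(0.5))  # 0.0625
```

      In such a model, varying `mutant_fraction` mimics co-expressing different WT:variant ratios, and varying `tolerated_mutant_subunits` distinguishes a strict dominant negative effect from a graded one. In a neuron that also expresses Kv3.2, the variant fraction of the total Kv3 subunit pool would be below 0.5, so the predicted loss would be smaller than this extreme case.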

      In the revised manuscript, we have updated our discussion of these mechanistic considerations for KCNC1-related epilepsy syndromes in lines 868-883 in the Discussion. 

      References

      Cameron JM et al. (2019) Encephalopathies with KCNC1 variants: genotype-phenotype-functional correlations. Annals of Clinical and Translational Neurology 6:1263–1272.

      Clatot J, Hoshi M, Wan X, Liu H, Jain A, Shinlapawittayatorn K, Marionneau C, Ficker E, Ha T, Deschênes I (2017) Voltage-gated sodium channels assemble and gate as dimers. Nature Communications 8.

      Makinson CD, Tanaka BS, Sorokin JM, Wong JC, Christian CA, Goldin AL, Escayg A, Huguenard JR (2017) Regulation of Thalamic and Cortical Network Synchrony by Scn8a. Neuron 93:1165-1179.e6.

      Oliver KL et al. (2017) Myoclonus epilepsy and ataxia due to KCNC1 mutation: Analysis of 20 cases and K+ channel properties. Annals of Neurology 81.

      Park J et al. (2019) KCNC1-related disorders: new de novo variants expand the phenotypic spectrum. Annals of Clinical and Translational Neurology 6:1319–1326.

    1. eLife Assessment

      This valuable study provides solid evidence that TANGO2 homologs, including HRG-9 and HRG-10, can play a role in cellular bioenergetics and oxidative stress homeostasis. It also challenges the previously reported role of TANGO2 in heme transport and paves the way for future mechanistic studies addressing how TANGO2 regulates oxidative stress homeostasis. The strengths include the use of different model systems, genetic tools, and behavioral assays, as well as the authors' efforts to use the same reagents to reproduce the results of other groups.

    2. Reviewer #1 (Public review):

      Sandkuhler et al. re-evaluated the biological functions of TANGO2 homologs in C. elegans, yeast, and zebrafish. In contrast to the previously reported role of TANGO2 homologs in transporting heme, Sandkuhler et al. express a different view of the biological functions of TANGO2 homologs. With the support of some results from their tests, they conclude that 'there is insufficient evidence to support heme transport as the primary function of TANGO2', in addition to presenting evidence that C. elegans TANGO2 helps counteract oxidative stress. While the differences are reported in this study, more work is needed to elucidate the intuitive biological function of TANGO2.

      Strengths:

      (1) This work revisits a set of key experiments, including the toxic heme analog GaPP survival assay, the fluorescent ZnMP accumulation assay, and the multi-organismal investigations documented by Sun et al. in Nature (2022), which are critical for comparing the two works. Meanwhile, the authors also highlight the differences in reagents and methods between the two studies, demonstrating significant academic merit.

      (2) This work reported additional phenotypes for the C. elegans mutant of the TANGO2 homologs, including lawn avoidance, reduced pharyngeal pumping, smaller brood size, faster exhaustion under swimming test, and a shorter lifespan. These phenotypes are important for understanding the biological function of TANGO2 homologs, while they were missing from the report by Sun et al.

      (3) Investigating the 'reduced GaPP consumption' as a cause of increased resistance against the toxic GaPP for the TANGO2 homologs, hrg-9 hrg-10 double null mutant provides a valuable perspective for studying the biological function of TANGO2 homologs.

      (4) The induction of hrg-9 gene expression by paraquat indicates a strong link between TANGO2 and mitochondrial function.

      (5) This work thoroughly evaluated the role of TANGO2 homologs in supporting yeast growth using multiple yeast strains and also pointed out the mitochondrial genome instability feature of the yeast strain used by Sun et al.

      Weakness:

      It is always a challenge to replicate someone else's work, but it is worthwhile to take on the challenge, provide evidence, and raise concerns about it. These authors attempted to replicate the experiment using the same biological material as that used by Sun et al. in Nature (2022), despite some experimental differences between the two studies. This study does not have many technical weaknesses, but it can become a much better project by focusing on the new phenotypes discovered here.

    3. Reviewer #2 (Public review):

      This work offers a valuable re-evaluation of earlier claims from other groups about TANGO2 functions and proposes that energy-related and stress-related pathways may be more important to the disorder than previously thought. A key strength of this work is the use of multiple model systems. The authors provide solid data showing that TANGO2 is probably only indirectly involved in heme transport and provide support for alternative mechanisms in which TANGO2 is directly involved. These findings provide valuable information for researchers seeking more accurate therapeutic targets.

      Strengths:

      The study refutes earlier claims about TANGO2's involvement in heme transport and extends previous findings by implicating TANGO2 in metabolism and oxidative stress, thereby highlighting new aspects of its role in cell physiology. The use of different model systems (Saccharomyces cerevisiae, Caenorhabditis elegans, Danio rerio) to address the main research questions is useful and demonstrates evolutionary conservation of the studied processes. Finally, the results suggest a broader impact than previously described, somewhat supporting the novelty of the study.

      Weaknesses:

      Although the phenotypic analyses are broad and generally well executed, a key limitation is that the main conclusions mainly rely on these readouts. While informative, sole phenotypic analyses cannot directly demonstrate the underlying molecular mechanisms proposed by the authors. The study includes limited functional or biochemical assays connecting TANGO2 orthologs to the proposed energy and stress pathways. Some observations would benefit from additional orthogonal validation to strengthen the overall interpretation. As a result, the evidence supporting the central mechanistic interpretation remains indirect, although compelling.

      Overall, the authors have achieved their stated aims, and their results mainly support their main conclusion (i.e., TANGO2 is unlikely to function in heme transport and is probably linked to energy and stress pathways). However, much of the evidence comes from phenotypic analyses, which limits the strength of the mechanistic claims, leaving the proposed pathways somewhat indirect.

      This work is likely to have a valuable impact on the subfield by clarifying that TANGO2 is not involved (at least directly) in heme transport and by clarifying its actual role in energy and stress-related processes. By rigorously reassessing and confuting earlier claims from other studies across multiple model systems, the current work will help to guide future research and therapeutic exploration in the context of TANGO2 deficiencies. This study will provide a solid foundation for more mechanistic insights into TANGO2 function.

    4. Reviewer #3 (Public review):

      In this paper, Sandkuhler et al. reassessed the role of TANGO2 as a heme chaperone proposed by Sun et al in a recently published paper (https://doi.org/10.1038/s41586-022-05347-z). Overall, Sandkuhler et al. conclude that the heme-related roles of TANGO2 had been overemphasized by Sun et al. especially because the hrg9 gene does not exclusively respond to different regimens of heme synthesis/uptake but is susceptible to a greater extent to, for example, oxidative stress. Impaired heme trafficking is then interpreted as due to general mitochondrial dysfunction. In recent years, the discussion around the heme-related roles of TANGO2 has been tantalizing but is still far from a definitive consensus. Discrepancies between results and their interpretation are testament to how ambitious the understanding of TANGO2 and the phenotypes associated with TANGO2 defects are.

      The work presented by Sandkuhler et al. is methodologically sound, and the authors have appropriately addressed my concerns in the first round of review. Overall, this paper challenges the recent developments in the field in relation to heme trafficking and provides a wider perspective on the biological roles of TANGO2.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) A detailed comparison between this work and the work of Sun et al. on experimental protocols and reagents in the main text will be beneficial for readers to assess critically.

      We have added a Key Reagents Table outlining the key reagents used in our study. In terms of experimental protocols, we replicated those described by Sun et al. in most instances and described any differences when present. With this resubmission, we included additional ZnMP accumulation experiments in liquid media (see point 3 below).

      (2) The GaPP used by Sun et al. (purchased from Frontier Scientific) is more effective in killing the worm than the one used in this study (purchased from Santa Cruz). Is the different outcome due to the differences in reagents? Moreover, Sun et al. examined the lethality after 3-4 days, while this work examined the lethality after 72 hours. Would the extra 24 hours make any difference in the result?

      We now cite product vendor differences as a possible reason for the observed difference in worm death, as the reviewer suggests, on page 8 (see text below) and include these differences in the Key Reagents Table. We also now stress the fact that our experiments included different doses of GaPP and the use of eat-2 mutants as an additional control, which we believe adds rigor and demonstrates the potency of GaPP in our experiments. We decided on assessment at 72 hours, as we deemed it a less nebulous time point as compared to 3-4 days. Most of the observed worm death occurred earlier in this interval, so we believe it is unlikely that large group differences would emerge after an additional 24 hours.

      “Exposing worms to GaPP, a toxic heme analog, we observed that nematodes deficient in HRG-9 and HRG-10 displayed increased survival compared to WT worms, consistent with prior work,[13] though the between-group difference was markedly smaller in our study. We required higher GaPP concentrations to induce lethality, potentially due to product vendor differences, but did observe a clear dose-dependent effect across strains. Although it was previously proposed that the survival benefit seen in worms lacking HRG-9 and HRG-10 resulted from reduced transfer from intestinal cells after GaPP ingestion, our data suggest the reduced lethality is more likely due to decreased environmental GaPP uptake. Supporting this notion, DKO worms exhibited lawn avoidance, reduced pharyngeal pumping, and modestly lower intestinal ZnMP accumulation when exposed to this fluorescent heme analog on agar plates. In liquid media, DKO worms demonstrated higher fluorescence, but only in ZnMP-free conditions, suggesting the presence of gut granule autofluorescence. Furthermore, survival following exposure to GaPP was highest in eat-2 mutants, despite heme trafficking being unaffected in this strain.”

      (3) This work reported the opposite result of Sun et al. for the fluorescent ZnMP accumulation assay. However, the experimental protocols used by the two studies are massively different. Sun et al. did the ZnMP staining by incubating the L4-stage worms in an axenic mCeHR2 medium containing 40 μM ZnMP (purchased from Frontier Scientific) and 4 μM heme at 20 ℃ for 16 h, while this work placed the L4-stage worms on the OP50 E. coli seeded NGM plates treated with 40 μM ZnMP (purchased from Santa Cruz) for 16 h. The liquid axenic mCeHR2 medium is bacteria-free, heme-free, and consistent for ZnMP uptake by worms. This work has mentioned that the hrg-9 hrg-10 double null mutant has bacterial lawn avoidance and reduced pharyngeal pumping phenotypes. Therefore, the ZnMP staining protocol used in this work faces challenges in the environmental control for the wild type vs. the mutant. The authors should adopt the ZnMP staining protocol used by Sun et al. for a proper evaluation of fluorescent ZnMP accumulation.

      We agree with this comment. As such, we performed the ZnMP assay in liquid media conditions, as now described on page 13:

      “For liquid media experiments, three generations of worms were cultured in regular heme (20 µM) axenic media, with the first two generations receiving antibiotic-supplemented media (10 mg/ml tetracycline) and the 3rd generation cultivated without antibiotic. L4 worms from the 3rd generation were placed in media containing 40 µM ZnMP for 16 hours before being prepared and mounted for imaging as above. Worms were imaged on Zeiss Axio Imager 2 at 40x magnification, with image settings kept uniform across all images. Fluorescent intensity was measured within the proximal region of the intestine using ImageJ.”

      In heme-free media, both WT and DKO worms invariably entered L1 arrest, thus we were not able to replicate the results reported by Sun et al. Using media containing heme, we did see an increase in fluorescence, but this was only in the ZnMP-free condition, indicating that the increased signal was attributable to autofluorescence. This is a known phenomenon associated with gut granules in C. elegans in the setting of oxidative stress. The results of these experiments are now summarized on page 6:

      “DKO nematodes at the L4 larval stage were previously shown to accumulate the fluorescent heme analog zinc mesoporphyrin IX (ZnMP) in intestinal cells in low-heme (4 µM) liquid media. While attempting to replicate this experiment, we observed that both wildtype and DKO nematodes entered L1 arrest under these conditions. Therefore, to allow for developmental progression, we grew worms on standard OP50 E. coli plates and in media containing physiological levels of heme (20 µM). We then examined whether differences in ZnMP uptake persisted under these basal conditions. DKO worms grown on ZnMP-treated E. coli plates displayed significantly reduced intestinal ZnMP fluorescence compared to N2 (Figure 1B and C). Using basal heme media with ZnMP, there was no significant difference in ZnMP fluorescence between DKO and wildtype nematodes, although DKO worms grown in media without ZnMP exhibited significantly higher autofluorescence (Figure 1D and E). To test whether autofluorescence may have contributed to the higher fluorescent intensities previously reported in heme-deficient DKO worms, we repeated this experiment on agar plates under starved conditions but did not observe a difference between groups (Figure 1B).”

      (4) A striking difference between the two studies is that Sun et al. emphasize the biochemical function of TANGO2 homologs in heme transporting with evidence from some biochemical tests. In contrast, this work emphasizes the physiological function of TANGO2 homologs with evidence from multiple phenotypical observations. In the discussion part, the authors should address whether these observed phenotypes in this study can be due to the loss of heme transporting activities upon eliminating TANGO2 homologs. This action can improve the merit of academic debate and collaboration.

      Thank you for this suggestion. The following text has been added to the Discussion section (page 9):

      “In addition to altered pharyngeal pumping, DKO worms displayed multiple previously unreported phenotypic features, suggesting a broader metabolic impairment and reminiscent of some clinical manifestations observed in patients with TDD. Elucidating the mechanisms underlying this phenotype, and whether they reflect a core bioenergetic defect, is an active area of investigation in our lab. Several C. elegans heme-responsive genes have been characterized, revealing relatively specific defects in heme uptake or utilization rather than broad organismal dysfunction. For example, hrg-1 and hrg-4 mutants exhibit impaired growth only under heme-limited conditions,[23] and hrg-3 loss affects brood size and embryonic viability specifically when maternal heme is scarce.[24] By contrast, hrg-9 and hrg-10 mutants exhibit the most severe organismal phenotypes of the hrg family, to date, including reduced pharyngeal pumping, decreased motility, shortened lifespan, and smaller broods, even when fed a heme-replete diet.”

      Reviewer #2 (Public review):

      (1) The manuscript is written mainly as a criticism of a previously published paper. Although reproducibility in science is an issue that needs to be acknowledged, a manuscript should focus on the new data and the experiments that can better prove and strengthen the new claims.

      Thank you for this suggestion. While the primary intent of this study was to replicate key findings from the 2022 publication by Sun et al., the revised manuscript now emphasizes underlying mechanisms more broadly rather than focusing narrowly on that prior publication.

      (2) The current presentation of the logic of the study and its results does not help the authors deliver their message, although they possess great potential.

      We have attempted to rectify this through substantial revision of the Discussion section and other places throughout the manuscript.

      (3) The study is missing experiments to link hrg-9 and hrg-10 more directly to bioenergetic and oxidative stress pathways.

      The reviewer is correct in this assertion, but it was not our intent to definitively prove this link or, indeed, the primary mechanism of TANGO2 in the present manuscript. This said, we are actively engaged in this endeavor in our lab and anticipate these data will be published in a separate, forthcoming publication.

      We have added additional references pertaining to hrg-9 enrichment as part of the mitochondrial unfolded protein response (page 10) and a comparison of the phenotype observed in hrg-9 and hrg-10 deficient worms versus those lacking other proteins in the hrg family (page 9).

      Reviewer #3 (Public review):

      (1) The authors stress - with evidence provided in this paper or indicated in the literature - that the primary role of TANGO2 and its homologues is unlikely to be related to heme trafficking, arguing that observed effects on heme transport are instead downstream consequences of aberrant cellular metabolism. But in light of a mounting body of evidence (referenced by the authors) connecting more or less directly TANGO2 to heme trafficking and mobilization, it is recommended that the authors comment on how they think TANGO2 could relate to and be essential for heme trafficking, albeit in a secondary, moonlighting capacity. This would highlight a seemingly common theme in emerging key players in intracellular heme trafficking, as it appears to be the case for GAPDH - with accumulating evidence of this glycolytic enzyme being critical for heme delivery to several downstream proteins.

      TANGO2 is essential for mitochondrial health, albeit in a yet unknown capacity. In the absence of TANGO2, defects in heme trafficking may be secondary sequelae of mitochondrial dysfunction. We would point out that prior studies that attempted to show that TANGO2 and its homologs are involved in heme trafficking proposed very different mechanisms (direct binding vs. membrane protein interaction) and relied on artificially low or high heme conditions to produce these effects. We have attempted to address these more clearly in the Discussion section and have added a fifth figure to summarize our current unifying theory for how heme levels and mitochondrial stress may be linked.

      (2) The observation - using eat-2 mutants and lawn avoidance behaviour - that survival patterns can be partially explained by reduced consumption, is fascinating. It would be interesting to quantify the two relative contributions.

      We have completed additional ZnMP experiments in liquid media at the reviewers’ request. This experimental condition eliminates lawn avoidance as a factor in consumption. Fluorescent intensity was significantly higher in the DKO worms in media lacking ZnMP, indicating increased autofluorescence in DKO worms, while signal was not significantly different in media with ZnMP.

      (3) In the legend to Figure 1A it's a bit unclear what the differently coloured dots represent for each condition. Repeated measurements, worms, independent experiments? The authors should clarify this.

      The following sentence has been added to the legend for Figure 1:

      “Each dot represents the number of offspring laid by one adult worm on one GaPP-treated plate after 24 hours.”

      (4) It would help if the entire fluorescence images (raw and processed) for the ZnMP treatments were provided. Fluorescence images would also benefit Figure 1B.

      Fluorescent intensity values pertaining to the ZnMP experiments are included in our Extended Data supplement, and we have added representative images to Figure 1, per the reviewer’s request. We thank the reviewer for this helpful suggestion. We would be happy to upload raw images to an open-access repository if deemed necessary by the editorial team.

      (5) Increasingly, the understanding of heme-dependent roles relies on transient or indirect binding to unsuspected partners, not necessarily relying on a tight affinity and outdating the notion of heme as a static cofactor. Despite impressive recent advancements in the detection of these interactions (for example https://doi.org/10.1021/jacs.2c06104; cited by the authors), a full characterisation of the hemome is still elusive. Sandkuhler et al. deemed it possible but seem to question that heme binding to TANGO2 occurs. However, Sun et al. convincingly showed and characterised TANGO2 binding to heme. It is recommended that the authors comment on this.

      We believe it is plausible that TANGO2 binds heme (as do hundreds of other proteins), especially as it has been shown to bind other hydrophobic molecules. However, we also note that a separate paper examining the role of TANGO2 in heme transport posited that GAPDH is the sole heme binding partner for cytoplasmic transport (https://doi.org/10.1038/s41467-025-62819-2), contradicting the originally posited theory of how TANGO2 functions. This is described in the Discussion section and, as noted above, we have added an additional figure to demonstrate our unifying hypothesis for why TANGO2 may be important in the low-heme state, irrespective of any direct effect on heme trafficking.

      Additional comments and revisions:

      (1) It was suggested that a triple mutant (eat-2; hrg-9; hrg-10) be tested to determine the primary driver of GaPP toxicity. We appreciate this suggestion, but we offer the following rationale for why these experiments were not pursued. The eat-2 mutant, which lacks a nicotinic acetylcholine receptor subunit in pharyngeal muscles, was included solely as a dietary restriction control to illustrate that reduced GaPP toxicity in the hrg-9/10 double mutant could arise from poor feeding rather than defective heme transport. Both eat-2 and hrg-9/10 mutants exhibit markedly reduced feeding but via different mechanisms. In our assays, GaPP survival was inversely correlated with ingestion rate: eat-2 animals, which feed the least, showed the highest survival, while hrg-9/10 mutants showed intermediate feeding and intermediate survival. Consistent with this, eat-2 worms also displayed the lowest ZnMP accumulation.

      (2) GaPP solution was added to NGM plates after seeding with OP50. This is now expressly stated in the Methods section (page 15). We would note that Sun et al. mixed GaPP in with NGM in the liquid phase. We would expect that if there were a difference in GaPP exposure due to these different protocols, worms in our experiment would have received higher GaPP concentrations.

      “Standard NGM plates were treated with 1, 2, 5, or 10 µM gallium protoporphyrin IX (GaPP; Santa Cruz) after seeding with OP50. Plates were swirled to ensure an even distribution of GaPP and allowed to dry completely.”

      (3) The manuscript has been reworked to read as more of an independent study rather than a rebuttal of prior work, though the primary objective of validating prior work remains unchanged.

      (4) Several technical details of experiments have been moved from the main text to the materials and methods section.

      (5) One reviewer noted that the figure numbering should be adjusted. Numbering does not progress sequentially (i.e., 1A…1B…2A…2B) early in the text, because we have opted to consolidate data pertaining to heme analog experiments in Figure 1 and behavioral data in Figure 2.

      (6) “Kingdoms” has been changed to “domains” (page 4).

      (7) Example images are now included for Figure 1B, as noted above.

    1. eLife Assessment

      This work significantly advances our understanding of chromatin organization within regions of repetitive sequences in the parasitic protozoan Trypanosoma brucei. Using cutting-edge interdisciplinary tools, the authors provide compelling evidence for two discrete types of repetitive DNA element-associated proteins: one set involved in essential centromere function, and the other involved in glycoprotein antigenic variation via homologous recombination. Thus, these fundamental findings have implications for this parasite's biology, and for therapeutic targeting in kinetoplastid diseases. This work will be exciting to those in the centromere/mitosis and parasite immunity fields.

    2. Reviewer #1 (Public review):

      Summary:

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Strengths and Weaknesses:

      The manuscript was previously reviewed through Review Commons. As noted there, the experiments are well controlled, the claims are well supported, and the methods are clearly described. The conclusions are convincing. All concerns I raised have been addressed except one (minor point #8):

      "The way the authors mapped the ChIP-seq data is potentially problematic when analyzing the same repeat type in different genomic regions. Reads with multiple equally good mapping positions were assigned randomly. This is fine when analyzing repeats by type, independent of genomic position, which is what the authors do to reach their main conclusions. However, several figures (Fig. 3B, Fig. 4B, Fig. 5B, Fig. 7) show the same repeat type at specific genomic locations." Due to the random assignment, all of these regions merely show the average signal for the given repeat. I find it misleading that this average is plotted out at "specific" genomic regions.

      Initially, I suggested a workaround, but the authors clarified why the workaround was not feasible, and their explanation is reasonable to me. That said, the figures still show a signal at positions where they can't be sure it actually exists. If this cannot be corrected analytically, it should at least be noted in the figure legends, Results, or Discussion.

      Importantly, the authors' conclusions do not hinge on this point; they are appropriately cautious, and their interpretations remain valid regardless.
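
      The averaging effect described above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' mapping pipeline: the locus names, the read count, and the assumption that the protein binds only one repeat copy are all hypothetical. It shows that when reads matching several identical repeat copies are assigned at random, each copy ends up with roughly the same share of signal, regardless of where the reads originated.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical setup: three identical copies of one repeat type.
REPEAT_LOCI = ["locus_A", "locus_B", "locus_C"]
TRUE_SOURCE = "locus_A"  # assumption: every read truly comes from this copy
N_READS = 30_000

# Each read aligns equally well to all copies, so its true origin is
# invisible to the aligner and it is placed on a randomly chosen copy.
assigned = Counter(rng.choice(REPEAT_LOCI) for _ in range(N_READS))

# Fraction of the total signal displayed at each genomic position.
fractions = {locus: assigned[locus] / N_READS for locus in REPEAT_LOCI}

for locus in REPEAT_LOCI:
    marker = " (true source)" if locus == TRUE_SOURCE else ""
    print(f"{locus}: {fractions[locus]:.3f}{marker}")
```

      Under these assumptions, the per-locus tracks carry no information about which copy is actually bound, which is why a note in the figure legends would be appropriate.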

      Significance:

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes taking note of and building on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How, and at which stages of the process, 70bp repeats are involved in antigenic variation is an unresolved and very actively studied question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Comments on revised version:

      All my recommendations have been addressed.

    3. Reviewer #2 (Public review):

      The Trypanosoma brucei genome, like that of other eukaryotes, contains diverse repetitive elements. Yet, the chromatin-associated proteome of these regions remains largely unexplored. This study represents a very important conceptual and technical advancement by employing synthetic TALE DNA-binding proteins fused to YFP to selectively capture proteins associated with specific repetitive sequences in T. brucei chromatin. The data presented here are convincing, supported by appropriate controls and a well-validated methodology, aligned with current state-of-the-art approaches.

      The authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the T. brucei genome, including centromere- and telomere-associated repeats and those of a transposable element, in order to identify proteins that bind to these repetitive sequences in T. brucei chromatin. The approach was validated using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and a protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      This study represents a significant conceptual and technical advancement. To the best of our knowledge, it is the first report of employing TALE-YFP for affinity-based detection of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of the organization of these important regions of the trypanosomal chromatin and provides the foundation for investigating the functional roles of associated proteins in parasite biology. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as to scientists investigating the roles of repetitive genomic elements in chromatin structure and their functional role in higher eukaryotes.

      Importantly, any essential or unique interacting partners identified using the approach employed here could serve as potential targets for therapeutic intervention in severe tropical diseases caused by kinetoplastids.

    1. eLife Assessment

      This important study presents an impressive large-scale effort to assess the reproducibility of published findings in the field of Drosophila immunity. The authors analyse 400 papers published between 1959 and 2011, and assess how many of the claims in these papers have been tested in subsequent publications. In a companion article they report the results of experiments to test a subset of the claims that, according to the literature, have not been tested. The present article also explores if various factors related to authors, institutions and journals influence reproducibility in this field. The evidence supporting the claims is solid, but there is considerable scope for strengthening and extending the analysis. The limitations inherent to evaluating reproducibility based on the published literature should also be acknowledged.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out on the ambitious task of establishing the reproducibility of claims from the Drosophila immunity literature. Starting out from a corpus of 400 articles published between 1959 and 2011, the authors sought to determine whether their claims were confirmed or contradicted by previous or subsequent publications. Additionally, they actively sought to replicate a subset of the claims for which no previous replications were available (although this set was not representative of the whole sample, as the authors focused on suspicious and/or easily testable claims). The focus of the article is on inferential reproducibility; thus, methods don't necessarily map exactly to the original ones.

      The authors present a large-scale analysis of the individual replication findings, which are presented in a companion article (Westlake et al., 2025. DOI 10.1101/2025.07.07.663442). In their retrospective analysis of reproducibility, the authors find that 61% of the original claims were verified by the literature, 7.5% were partially verified, and only 6.8% were challenged, with 23.8% having no replication available. This is in stark contrast with the result of their prospective replications, in which only 16% of claims were successfully reproduced.

      The authors proceed to investigate correlates of replicability, with the most consistent finding being that findings stemming from higher-ranked universities (and possibly from very high impact journals) were more likely to be challenged.

      Strengths:

      (1) The work presents a large-scale, in-depth analysis of a particular field of science that includes authors with deep domain expertise of the field. This is a rare endeavour to establish the reproducibility of a particular subfield of science, and I'd argue that we need many more of these in different areas.

      (2) The project was built on a collaborative basis (https://ReproSci.epfl.ch/), using an online database (https://ReproSci.epfl.ch/), which was used to organize the annotations and comments of the community about the claims. The website remains online and can be a valuable resource to the Drosophila immunity community.

      (3) Data and code are shared in the authors' GitHub repository, with a Jupyter notebook available to reproduce the results.

      Main concerns:

      (1) Although the authors claim that "Drosophila immunity claims are mostly replicable", this conclusion is strictly based on the retrospective analysis, in which around 84% of the claims for which a published verification attempt was found were verified. This is in very stark contrast with the findings that the authors replicate prospectively, of which only 16% are verified.

      Although this large discrepancy may be explained by the fact that the authors focused on unchallenged and suspicious claims (which seems to be their preferred explanation), an alternative hypothesis is that there is a large amount of confirmation bias in the Drosophila immunity literature, either because attempts to replicate previous findings tend to reach similar results due to researcher bias, or because results that validate previous findings are more likely to be published.

      Both explanations are plausible (and, not being an expert in the field, I'd have a hard time estimating their relative probability), and in the absence of prospective replication of a systematic sample of claims - which could determine whether the replication rate for a random sample of claims is as high as that observed in the literature -, both should be considered in the manuscript.

      (2) The fact that the analysis of factors correlating with reproducibility includes both prospective and retrospective replications also leads to the possibility of confounding in this analysis. If most of the challenged claims come from the authors' prospective replications, while most of the verified ones come from those that were replicated by the literature, it becomes unclear whether the identified factors are correlated with actual reproducibility of the claims or with the likelihood that a given claim will be tested by other authors and that this replication will be published.

      (3) The methods are very brief for a project of this size, and many of the aspects in determining whether claims were conceptually replicated and how replications were set up are missing.

      Some of these - such as the PubMed search string for the publications and a better description of the annotation process - are described in the companion article, but this could be more explicitly stated. Others, however, remain obscure. Statements such as "Claims were cross-checked with evidence from previous, contemporary and subsequent publications and assigned a verification category" summarize a very complex process for which more detail should be given - in particular because what constitutes inferential reproducibility is not a self-evident concept. And although I appreciate that what constitutes a replication is ultimately a case-by-case decision, a general description of the guidelines used by the authors to determine this should be provided. As these processes were done by one author and reviewed by another, it would also be useful to know the agreement rates between them to have a general sense of how reproducible the annotation process might be.

      The same gap in methods descriptions holds for the prospective replications. How were labs selected, how were experimental protocols developed, and how was the validity of the experiments as a conceptual replication assessed? I understand that providing the methods for each individual replication is beyond the scope of the article, but a general description of how they were developed would be important.

      (4) As far as I could tell, the large-scale analysis of the replication results was not preregistered, and many decisions seem somewhat ad hoc. In particular, the categorization of journals (e.g. low impact, high impact, "trophy") and universities (e.g. top 50, 51-100, 101+) relies on arbitrary thresholds, and it is unclear how much the results are dependent on these decisions, as no sensitivity analyses are provided.

      Particularly, for analyses that correlate reproducibility with continuous variables (such as year of publication, impact factor or university ranking), I'd strongly favor using these variables as continuous variables in the analysis (e.g. using logistic regression) rather than performing pairwise comparisons between categories determined by arbitrary cutoffs. This would not only reduce the impact of arbitrary thresholds in the analysis, but would also increase statistical power in the univariate analyses (as the whole sample can be used at once) and reduce the number of parameters in the multivariate model (as they will be included as a single variable rather than multiple dummy variables when there are more than two categories).

      (5) The multivariate model used to investigate predictors of replicability includes unchallenged claims along with verified ones in the outcome, which seems like an odd decision. If the intention is to analyze which factors are correlated with reproducibility, it would make more sense to remove the unchallenged findings, as these are likely uninformative in this sense. In fact, based on the authors' own replications of unchallenged findings, they may be more likely to belong to the "challenged" category than to the "unchallenged" one if they were to be verified.
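
      To make the suggestion concrete, the following is a minimal sketch of a logistic regression with one continuous predictor, written in plain Python. It is an illustration only: the data are synthetic, the predictor stands in for a hypothetical standardized variable such as impact factor, and the coefficients `TRUE_B0` and `TRUE_B1` are arbitrary assumptions rather than estimates from the authors' dataset.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, steps=2000, lr=0.5):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # residual on the probability scale
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# SYNTHETIC data, for illustration only (not from the study).
# x: hypothetical standardized continuous predictor (e.g. impact factor).
# y: 1 if a claim was challenged, 0 otherwise.
rng = random.Random(0)
TRUE_B0, TRUE_B1 = -1.0, 1.0  # arbitrary assumed coefficients
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

      Here the whole sample contributes to a single slope estimate, whereas a three-category version of the same predictor would need two dummy variables and a choice of cutoffs. In practice an established statistics package would be preferable to this hand-written fit.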

    3. Reviewer #2 (Public review):

      Summary:

      Lemaitre et al. conducted an analysis of 400 publications in the Drosophila immunity field (1959-2011), performing both univariable and multivariable analyses to identify factors that correlate with or influence the irreproducibility of scientific claims. Some of the findings are unexpected, for instance, neither the career stage of the PI nor that of the first author appears to matter that much, while others, such as the influence of institutional prestige or publication in "trophy journals," are more predictable. The results provide valuable insight into patterns of irreproducibility in academia and may help inform policies to improve research reproducibility in the field.

      Strengths:

      This study is based on a large, manually curated dataset, complemented by a companion paper (Westlake et al., 2025. DOI 10.1101/2025.07.07.663442) that provides additional details on experimentally documented cases. The statistical methods are appropriate, and the findings are both important and informative. The results are clearly presented and supported by accessible documentation through the ReproSci project.

      Weaknesses:

      The analysis is limited to a specific field (immunity) and model system (Drosophila). Since biological context may influence reproducibility -- for example, depending on whether mechanisms are more hardwired or variable -- and the model system itself may contribute to these effects (as the authors note), it remains unclear to what extent these findings generalize to other fields or organisms. The authors could expand the discussion to address the potential scope and limitations of the study's generalizability.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this paper were trying to identify how reproducible, or not, their subfield (Drosophila immunity) has been since its inception over 50 years ago. This required identifying not only the papers, but the specific claims made in the paper, assessing if these claims were followed up in the literature, and if so whether the subsequent papers supported or refuted the original claim. In addition to this large manually curated effort, the authors further investigated some claims that were left unchallenged in the literature by conducting replications themselves. This provided a rich corpus of the subfield that could be investigated to determine which characteristics influence reproducibility.

      Strengths:

      A major strength of this study is the focus on a subfield, the detailing of identifying the main, major, and minor claims - which is a very challenging manual task - and then cataloging not only their assessment of if these claims were followed up in the literature, but also what characteristics might be contributing to reproducibility, which also included more manual effort to supplement the data that they were able to extract from the published papers. While this provides a rich dataset for analysis, there is a major weakness with this approach, which is not unique to this study.

      Weaknesses:

      The main weakness is relying heavily on the published literature as the source for whether a claim was determined to be verified or not. There are many documented issues with this stemming from every field of research - such as publication bias, selective reporting, all the way to fraud. It's understandable why the authors took this approach - it is the only way to get at a breadth of the literature - however the flaw with this approach is that it takes the literature as a solid ground truth, which it is not. At the same time, it is not reasonable to expect the authors to have conducted independent replications for all of the 400 papers they identified. However, there is a big difference between trying to assess the reproducibility of the literature by using the literature as the 'ground truth' and doing this independently like other large-scale replication projects have attempted to do. This means the interpretation of the data is a bit challenging.

      Below are suggestions for the authors and readers to consider:

      (1) I understand why the authors prefer to mention claims as their primary means of reporting what they found, but claims are nested within papers, and that makes it very hard to understand how to interpret these results at times. I also cannot understand at a high level the relationship between claims and papers. The methods suggest there are 3-4 major claims per paper, but at 400 papers and 1,006 claims, this averages to ~2.5 claims per paper. Can the authors consider describing this relationship better (e.g., distribution of claims and papers) and/or consider presenting the data two ways (primary figures as claims and complementary supplementary figures with papers as the unit)? This will help the reader interpret the data both ways without confusion. I am also curious how the results look when presented both ways (e.g., does shifting to the paper as the unit of analysis shift the figures and interpretation?). This is especially true since the first and last author analysis shows there is varying distribution of papers and claims by authors (and thus the relationship between these is important for the reader).
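      As a rough illustration of why the unit of analysis matters, the sketch below recomputes the reported average (1,006 claims over 400 papers) and contrasts a claim-level rate with a paper-level rate on a toy dataset. The records are invented, and the rule that a paper counts as "challenged" if any of its claims is challenged is an assumption made here for illustration, not the authors' definition.

```python
from collections import defaultdict

# Average implied by the numbers quoted in the review.
reported_mean = 1006 / 400
print(f"claims per paper implied by the manuscript: {reported_mean:.3f}")

# Hypothetical toy records: (paper_id, claim_status).
claims = [
    ("paper_A", "verified"), ("paper_A", "verified"), ("paper_A", "challenged"),
    ("paper_B", "verified"),
    ("paper_C", "challenged"), ("paper_C", "unchallenged"),
]

# Group claims by the paper they are nested in.
by_paper = defaultdict(list)
for paper_id, status in claims:
    by_paper[paper_id].append(status)

claims_per_paper = len(claims) / len(by_paper)

# Claim as the unit: fraction of all claims that are challenged.
claim_level_rate = sum(status == "challenged" for _, status in claims) / len(claims)

# Paper as the unit (assumed rule): a paper is challenged if any claim is.
paper_level_rate = sum("challenged" in statuses for statuses in by_paper.values()) / len(by_paper)

print(f"toy claims per paper: {claims_per_paper:.2f}")
print(f"challenged rate, claim as unit: {claim_level_rate:.2f}")
print(f"challenged rate, paper as unit: {paper_level_rate:.2f}")
```

      In this toy example the same records give a challenged rate of one third at the claim level and two thirds at the paper level, which is why reporting both views, together with the distribution of claims per paper, would help readers.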

      (2) As mentioned above, I think the biggest weakness is that the authors are taking the literature at face value when assigning if a claim was validated or challenged vs gathering new independent evidence. This means the paper leans more on papers, making it more like a citation analysis vs an independent effort like other large-scale replication projects. I highly recommend the authors state this in their limitations section.

      On top of that, I have questions that I could not figure out (though I acknowledge I did not dig super deep into the data to try). The main question I have is: how was "verified" (and "challenged") determined? It seems from the methods that it was determined by "Claims were cross-checked with evidence from previous, contemporary and subsequent publications and assigned a verification category". If this is true, and all claims were done this way - are verified claims double counted then? (e.g., an original claim is found by a future claim to be verified - and thus that future claim is also considered to be verified because of the original claim).

      Related, did the authors look at the strength of validated or challenged claims? That is, if there is a relationship mapping the authors did for original claims and follow-up claims, I would imagine some claims have deeper (i.e., more) claims that followed up on them vs others. This might be interesting to look at as well.

      (3) I recommend the authors add sample sizes when not present (e.g., Fig 4C). I also find that the sample sizes are a bit confusing, and I recommend the authors check them and add more explanation when not complete, like they did for Fig 4A. For example, Fig 7B adds up to 178 labs (how did more than 156 labs get determined here?), and yet the total number of claims is 996 (as opposed to 1,006). Another example: why does Fig 8B not have all 156 labs accounted for? (Related to Fig 8B, I caution against reporting a p value and drawing strong conclusions from this very small sample size - 22 authors.) As a last example, Fig 8C has all 156 labs and 1,006 claims - is that expected? I guess it means authors who published before 1995 (as shown in Figure 8A) continued to publish after 1995? In that case, is it all authors? But the text says when they 'set up their lab' after 1995, so how can that be?

      (4) Finally, I think it would help if the authors expanded on the limitations generally and potential alternative explanations and/or driving factors. For example, where the phrase "though likely underestimated" appears in the discussion about the low rate of challenged claims, it might be useful to call out how publication bias is likely the driver here and thus needs to be carefully considered in the interpretation. Related, I caution the authors against overinterpreting their suggestive evidence. The abstract, for example, states claims of what was found in their analysis, when these are suggestive at best, which the authors acknowledge in the paper. But since most people start with the abstract, I worry this is indicating stronger evidence than what the authors actually have.

      The authors should be applauded for the monumental effort they put into this project, which does a wonderful job of having experts within a subfield engage their community to understand the connectiveness of the literature and attempt to understand how reliable specific results are and what factors might contribute to them. This project provides a nice blueprint for others to build from as well as leverage the data generated from this subfield, and thus should have an impact in the broader discussion on reproducibility and reliability of research evidence.

    1. eLife Assessment

      This study introduces an important approach using selection linked integration (SLI) to generate Plasmodium falciparum lines expressing single, specific surface adhesins PfEMP1 variants, enabling precise study of PfEMP1 trafficking, receptor binding, and cytoadhesion. By moving the system to different parasite strains and introducing an advanced SLI2 system for additional genomic edits, this work provides compelling evidence for an innovative and rigorous platform to explore PfEMP1 biology and identify novel proteins essential for malaria pathogenesis including immune evasion.

    2. Reviewer #1 (Public review):

      One of the roadblocks in PfEMP1 research has been the challenges in manipulating var genes to incorporate markers to allow the transport of this protein to be tracked and to investigate the interactions taking place within the infected erythrocyte. In addition, the ability of Plasmodium falciparum to switch to different PfEMP1 variants during in vitro culture has complicated studies due to parasite populations drifting from the original (manipulated) var gene expression. Cronshagen et al have provided a useful system with which they demonstrate the ability to integrate a selectable drug marker into several different var genes that allows the PfEMP1 variant expression to be 'fixed'. This on its own represents a useful addition to the molecular toolbox and the range of var genes that have been modified suggests that the system will have broad application. As well as incorporating a selectable marker, the authors have also used selection-linked integration (SLI) to introduce markers to track the transport of PfEMP1, investigate the route of transport and probe interactions with PfEMP1 proteins in the infected host cell.

      One of the major strengths of this paper is that the authors have not only put together a robust system for further functional studies, but they have used it to produce a range of interesting findings including:

      Co-activation of rif and var genes when in a head-to-head orientation.

      The reduced control of expression of var genes in the 3D7-MEED parasite line.

      More support for the PTEX transport route for PfEMP1.

      Identification of new proteins involved in PfEMP1 interactions in the infected erythrocyte, including some required for cytoadherence.

      In most cases the experimental evidence is straightforward, and the data support the conclusions strongly. The authors have been very careful in the depth of their investigation, and where unexpected results have been obtained, they have looked carefully at why these have occurred.

      A weakness of the paper is, as mentioned above, that the results are sometimes not as clear as might have been expected, for example, in the requirement for panning modified parasites to produce binding to EPCR. Where this has happened, the authors take a robust and thoughtful approach, and acknowledge that (as in most research) there are more questions to address. Being able to select specific var gene switches using drug markers will provide some useful starting points to understand how switching happens in P. falciparum. However, our trypanosome colleagues might remind us that forcing switches may show us some mechanisms, but perhaps not all.

      Despite these sometimes complicated findings, the authors have achieved their aim as stated in the title of the paper, and in doing so have provided an excellent resource to themselves and other researchers in the field to answer some important questions.

      Overall, the authors have produced a useful and robust system to support functional studies on PfEMP1, which provides a platform for future studies manipulating the domain content in var genes. They have used this system to produce a range of interesting findings and to support its use by the research community.

      Comments on revisions:

      I have no further recommendations for changes by the authors. They have addressed my concerns, and the paper reads very well.

    3. Reviewer #2 (Public review):

      Summary

      Cronshagen et al develop a range of tools based on selection-linked integration (SLI) to study PfEMP1 function in P. falciparum. PfEMP1 is encoded by a family of ~60 var genes subject to mutually exclusive expression. Switching expression between different family members can modify the binding properties of the infected erythrocyte while avoiding the adaptive immune response. Although critical to parasite survival and malaria disease pathology, PfEMP1 proteins are difficult to study owing to their large size and variable expression between parasites within the same population. The SLI approach previously developed by this group for genetic modification of P. falciparum is employed here to selectively and stably activate expression of target var genes at the population level. Using this strategy, the binding properties of specific PfEMP1 variants were measured for several distinct var genes with a novel semi-automated pipeline to increase throughput and reduce bias. Activation of similar var genes in both the common lab strain 3D7 and the cytoadhesion competent FCR3/IT4 strain revealed higher binding for several PfEMP1 IT4 variants with distinct receptors, indicating this strain provides a superior background for studying PfEMP1 binding. SLI also enables modifications to target var gene products to study PfEMP1 trafficking and identify interacting partners by proximity-labeling proteomics, revealing two novel exported proteins required for cytoadherence. Overall, the data demonstrate a range of SLI-based approaches for studying PfEMP1 that will be broadly useful for understanding the basis for cytoadhesion and parasite virulence.

      Comments:

      While the capability of SLI to activate selected var gene expression was initially reported by Omelianczyk et al., the present study greatly expands the utility of this approach. Several distinct var genes are activated in two different P. falciparum strains and shown to modify the binding properties of infected RBCs to distinct endothelial receptors; development of SLI2 enables multiple SLI modifications in the same parasite line; SLI is used to modify target var genes to study PfEMP1 trafficking and determine PfEMP1 interactomes with BioID. Along the way, the authors also demonstrate a new selection marker for P. falciparum transfection (a mutant FNT lactate transporter that provides resistance to the compound BH267.meta). Curiously, Omelianczyk et al activated a single var (Pf3D7_0421300) and observed elevated expression of an adjacent var arranged in a head to tail manner, possibly resulting from local chromatin modifications enabling expression of the neighboring gene. In contrast, the present study observed activation of neighboring genes with head to head but not head to tail arrangement, which may be the result of shared promoter regions. The reason for these differing results is unclear although it should be noted that the two studies examined different var loci.

      The IT4var19 panned line that became binding-competent showed increased expression of both paralogs of ptp3 (as well as a phista and gbp), suggesting that overexpression of PTP3 may improve PfEMP1 display and binding. Interestingly, IT4 appears to be the only known P. falciparum strain (only available in PlasmoDB) that encodes more than one ptp3 gene (PfIT_140083100 and PfIT_140084700). PfIT_140084700 is almost identical to the 3D7 PTP3 (except for a ~120 residue insertion in 3D7 beginning at residue 400). In contrast, while the C-terminal region of PfIT_140083100 shows near perfect conservation with 3D7 PTP3 beginning at residue 450, the N-terminal regions between the PEXEL and residue 450 are quite different. This may indicate the generally stronger receptor binding observed in IT4 relative to 3D7 results from increased PTP3 activity due to multiple isoforms or that specialized trafficking machinery exists for some PfEMP1 proteins.

      Revisions:

      The authors thoughtfully addressed all the reviewer comments.

    4. Reviewer #3 (Public review):

      Summary:

      The submission from Cronshagen and colleagues describes the application of a previously described method (selection linked integration) to the systematic study of PfEMP1 trafficking in the human malaria parasite Plasmodium falciparum. PfEMP1 is the primary virulence factor and surface antigen of infected red blood cells and is therefore a major focus of research into malaria pathogenesis. Since the discovery of the var gene family that encodes PfEMP1 in the late 1990s, there have been multiple hypotheses for how the protein is trafficked to the infected cell surface, crossing multiple membranes along the way. One difficulty in studying this process is the large size of the var gene family and the propensity of the parasites to switch which var gene is expressed, thus preventing straightforward gene modification-based strategies for tagging the expressed PfEMP1. Here the authors solve this problem by forcing expression of a targeted var gene by fusing the PfEMP1 coding region with a drug selectable marker separated by a skip peptide. This enabled them to generate relatively homogenous populations of parasites all expressing tagged (or otherwise modified) forms of PfEMP1 suitable for study. They then applied this method to study various aspects of PfEMP1 trafficking.

      Strengths:

      The study is very thorough, and the data are well presented. The authors used SLI to target multiple var genes, thus demonstrating the robustness of their strategy. They then perform experiments to investigate possible trafficking through PTEX, they knock out proteins thought to be involved in PfEMP1 trafficking and observe defects in cytoadherence, and they perform proximity labeling to further identify proteins potentially involved in PfEMP1 export. These are independent and complementary approaches that together tell a very compelling story.

      Weaknesses:

      (1) When the authors targeted IT4var19, they were successful in transcriptionally activating the gene, however they did not initially obtain cytoadherent parasites. To observe binding to ICAM-1 and EPCR, they had to perform selection using panning. This is an interesting observation and potentially provides insights into PfEMP1 surface display, folding, etc. However, it also raises questions about other instances in which cytoadherence was not observed. Would panning of these other lines have successfully selected for cytoadherent infected cells? Did the authors attempt panning of their 3D7 lines? Given that these parasites do export PfEMP1 to the infected cell surface (Figure 1D), it is possible that panning would similarly rescue binding. Likewise, the authors knocked out PTP1, TryThrA and EMPIC3 and detected a loss of cytoadhesion, but they did not attempt panning to see if this could rescue binding. The strong selection that panning exerts on parasite populations could result in selection of compensatory changes that enable cytoadherence, which could be very informative, although the analysis could potentially be quite complicated and beyond the scope of the current paper. Nonetheless, these are important concepts to consider when assessing these phenotypes.

      (2) The authors perform a series of trafficking experiments to help discern whether PfEMP1 is trafficked through PTEX. While the results were not entirely definitive, they make a strong case for PTEX in PfEMP1 export. The authors then used BioID to obtain a proxiome for PfEMP1 and identified proteins they suggest are involved in PfEMP1 trafficking. However, it seemed that components of PTEX were missing from the list of interacting proteins. Is this surprising and does this observation shed any additional light on the possibility of PfEMP1 trafficking through PTEX? This warrants a comment or discussion.

      Comments on revisions:

      The authors have responded thoroughly and constructively to suggestions and comments in the initial review. I have no additional comments. This is a great contribution to the literature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study introduces an important approach using selection linked integration (SLI) to generate Plasmodium falciparum lines expressing single, specific surface adhesins PfEMP1 variants, enabling precise study of PfEMP1 trafficking, receptor binding, and cytoadhesion. By moving the system to different parasite strains and introducing an advanced SLI2 system for additional genomic edits, this work provides compelling evidence for an innovative and rigorous platform to explore PfEMP1 biology and identify novel proteins essential for malaria pathogenesis including immune evasion.

      Reviewer #1 (Public review):

      One of the roadblocks in PfEMP1 research has been the challenges in manipulating var genes to incorporate markers to allow the transport of this protein to be tracked and to investigate the interactions taking place within the infected erythrocyte. In addition, the ability of Plasmodium falciparum to switch to different PfEMP1 variants during in vitro culture has complicated studies due to parasite populations drifting from the original (manipulated) var gene expression. Cronshagen et al have provided a useful system with which they demonstrate the ability to integrate a selectable drug marker into several different var genes that allows the PfEMP1 variant expression to be 'fixed'. This on its own represents a useful addition to the molecular toolbox and the range of var genes that have been modified suggests that the system will have broad application. As well as incorporating a selectable marker, the authors have also used selection-linked integration (SLI) to introduce markers to track the transport of PfEMP1, investigate the route of transport, and probe interactions with PfEMP1 proteins in the infected host cell.

      What I particularly like about this paper is that the authors have not only put together what appears to be a largely robust system for further functional studies, but they have used it to produce a range of interesting findings including:

      Co-activation of rif and var genes when in a head-to-head orientation.

      The reduced control of expression of var genes in the 3D7-MEED parasite line.

      More support for the PTEX transport route for PfEMP1.

      Identification of new proteins involved in PfEMP1 interactions in the infected erythrocyte, including some required for cytoadherence.

      In most cases the experimental evidence is straightforward, and the data support the conclusions strongly. The authors have been very careful in the depth of their investigation, and where unexpected results have been obtained, they have looked carefully at why these have occurred.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      (1) In terms of incorporating a drug marker to drive mono-variant expression, the authors show that they can manipulate a range of var genes in two parasite lines (3D7 and IT4), producing around 90% expression of the targeted PfEMP1. Removal of drug selection produces the expected 'drift' in variant types being expressed. The exceptions to this are the 3D7-MEED line, which looks to be an interesting starting point to understand why this variant appears to have impaired mutually exclusive var gene expression and the EPCR-binding IT4var19 line. This latter finding was unexpected and the modified construct required several rounds of panning to produce parasites expressing the targeted PfEMP1 and bind to EPCR. The authors identified a PTP3 deficiency as the cause of the lack of PfEMP1 expression, which is an interesting finding in itself but potentially worrying for future studies. What was not clear was whether the selected IT4var19 line retained specific PfEMP1 expression once receptor panning was removed.

      We do not have systematic long-term data for the Var19 line but do have medium-term data. After panning the Var19 line, the binding assays were done within 3 months without additional panning. The first binding assay was 2 months after the panning and the last binding assays three weeks later, totaling about 3 months without panning. While there is inherent variation in these assays that precludes detection of smaller changes, the last assay showed the highest level of binding, giving no indication for rapid loss of the binding phenotype. Hence, we can say that the binding phenotype appears to be stable for many weeks without panning the cells again and there was no indication for a rapid loss of binding in these parasites.

      Systematic long-term experiments to assess how long the Var19 parasites retain binding would be interesting, but given that the binding-phenotype appears to remain stable over many weeks or even months, this would only make sense if done over a much longer time frame. Such data might arise if the line is used over extended times for a specific project in which case it might be advisable to monitor continued binding. We included a statement in the discussion that the binding phenotype was stable over many weeks but that if long-term work with this line is planned, monitoring the binding phenotype might be advisable: “In the course of this work the binding phenotype of the IT4var19 expressor line remained stable over many weeks without further panning. However, given that initial panning had been needed for this particular line, it might be advisable for future studies to monitor the binding phenotype if the line is used for experiments requiring extended periods of cultivation.”

      (2) The transport studies using the mDHFR constructs were quite complicated to understand but were explained very clearly in the text with good logical reasoning.

      We are aware of this being a complex issue and are glad this was nevertheless understandable.

      (3) By introducing a second SLI system, the authors have been able to alter other genes thought to be involved in PfEMP1 biology, particularly transport. An example of this is the inactivation of PTP1, which causes a loss of binding to CD36 and ICAM-1. It would have been helpful to have more insight into the interpretation of the IFAs as the anti-SBP1 staining in Figure 5D (PTP-TGD) looks similar to that shown in Figure 1C, which has PTP intact. The anti-EXP2 results are clearly different.

      We realize the description of the PTP1-TGD IFA data and that of the other TGDs (see also response to Recommendation to authors point 4 and reviewer 2, major points 6 and 7) was rather cursory. The previously reported PTP1 phenotype is a fragmentation of the Maurer’s clefts into what in IFA appear to be many smaller pieces (Rug et al 2014, referenced in the manuscript). The control in Fig. 5D has 13 Maurer’s cleft spots (previous work indicates an average of ~15 MC per parasite, see e.g. the originally co-submitted eLife preprint doi.org/10.7554/eLife.103633.1 and references therein). The control mentioned by the reviewer in Fig. 1C has about 22 Maurer’s clefts foci, at the upper end of the typical range, but not unusual. In contrast, the PTP1-TGD in Fig. 5D, has more than 30 foci with an additional cytoplasmic pool and additional smaller, difficult to count foci. This is consistent with the published phenotype in Rug et al 2014. The EXP1 stained cell has more than 40 Maurer’s cleft foci, again beyond what typically is observed in controls. Therefore, these cells show a difference to the control in Fig. 5 but also to Fig. 1C. Please note that we are looking at two different strains, in Fig. 1 it is 3D7 and in Fig. 5 IT4. While we did not systematically assess this, the Maurer’s clefts number per cell seemed to be largely comparable between these strains (Fig. 10C and D in the other eLife preprint doi.org/10.7554/eLife.103633.1). 

      Overall, as the PTP1 loss phenotype has already been reported, we did not go into more experimental detail. However, we now modified the text to more clearly describe how the phenotype in the PTP1-TGD parasites was different to control: “IFAs showed that in the PTP1-TGD parasites, SBP1 and PfEMP1 were found in many small foci in the host cell that exceeded the average number of ~ 15 Maurer’s clefts typically found per infected RBC [66] (Fig. 5D). This phenotype resembled the previously reported Maurer’s clefts phenotype of the PTP1 knock out in CS2 parasites [39].”

      (4) It is good to see the validation of PfEMP1 expression includes binding to several relevant receptors. The data presented use CHO-GFP as a negative control, which is relevant, but it would have been good to also see the use of receptor mAbs to indicate specific adhesion patterns. The CHO system is fine for expression validation studies, but due to the high levels of receptor expression on these cells, moving to the use of microvascular endothelial cells would be advisable. This may explain the unexpected ICAM-1 binding seen with the panned IT4var19 line.

      We agree with the reviewer that it is desirable to have better binding systems for studying individual binding interactions. As the main purpose of this paper was to introduce the system and provide proof of principle that the cells show binding, we did not move to more complicated binding systems. However, we would like to point out that the CSA binding was done on receptor alone in addition to the CSA-expressing HBEC-5i cells and was competed successfully with soluble CSA. In addition, apart from the additional ICAM1-binding of the Var19 line, all binding phenotypes conformed to expectations. We therefore hope the tools used for binding studies are acceptable at this stage of introducing the system while future work interested in specific PfEMP1 receptor interactions may use better systems, tailored to the specific question (e.g. endothelial organoid models and engineered human capillaries and inhibitory antibodies or relevant recombinant domains for competition).

      (5) The proxiome work is very interesting and has identified new leads for proteins interacting with PfEMP1, as well as suggesting that KAHRP is not one of these. The reduced expression seen with BirA* in position 3 is a little concerning but there appears to be sufficient expression to allow interactions to be identified with this construct. The quantitative impact of reduced expression for proxiome experiments will clearly require further work to define it.

      This is a valid point. Clearly there seems to be some impact on binding when BirA* is placed in the extracellular domain (either through reduced presentation or direct reduction of binding efficiency of the modified PfEMP1; please see also minor comment 10 reviewer 2). The exact quantitative impact on the proxiome is difficult to assess but we note that the relative enrichment of hits to each other is rather similar to the other two positions (Fig. 6H-J). We therefore believe the BioIDs with the 3 PfEMP1-BirA* constructs are sufficient to provide a general coverage of proteins proximal to PfEMP1 and hope this will aid in the identification of further proteins involved in PfEMP1 transport and surface display as illustrated with two of the hits targeted here.

      The impact of placing a domain on the extracellular region of PfEMP1 will have to be further evaluated if needed in other studies. But the finding that a large folded domain can be placed into this part at all, even if binding was reduced, in our opinion is a success (it was not foreseeable whether any such change would be tolerated at all).

      (6) The reduced receptor binding results from the TryThrA and EMPIC3 knockouts were very interesting, particularly as both still display PfEMP1 on the surface of the infected erythrocyte. While care needs to be taken in cross-referencing adhesion work in P. berghei and whether the machinery truly is functionally orthologous, it is a fair point to make in the discussion. The suggestion that interacting proteins may influence the "correct presentation of PfEMP1" is intriguing and I look forward to further work on this.

      We hope future work will be able to shed light on this.

      Overall, the authors have produced a useful and reasonably robust system to support functional studies on PfEMP1, which may provide a platform for future studies manipulating the domain content in the exon 1 portion of var genes. They have used this system to produce a range of interesting findings and to support its use by the research community. Finally, a small concern. Being able to select specific var gene switches using drug markers could provide some useful starting points to understand how switching happens in P. falciparum. However, our trypanosome colleagues might remind us that forcing switches may show us some mechanisms but perhaps not all.

      Point noted! From non-systematic data with the Var01 line that has been cultured for extended periods of time (several years), it seems other non-targeted vars remain silent in our SLI “activation” lines but how much SLI-based var-expression “fixing” tampers with the integrity of natural switching mechanisms is indeed very difficult to gauge at this stage. We now added a statement to the discussion that even if mutually exclusive expression is maintained, it is not certain the mechanisms controlling var expression all remain intact: “However, it should be noted that it is not known whether all mechanisms controlling mutually exclusive expression and switching remain intact in parasites with SLI-activated var genes.”

      Reviewer #2 (Public review):

      Summary

      Cronshagen et al develop a range of tools based on selection-linked integration (SLI) to study PfEMP1 function in P. falciparum. PfEMP1 is encoded by a family of ~60 var genes subject to mutually exclusive expression. Switching expression between different family members can modify the binding properties of the infected erythrocyte while avoiding the adaptive immune response. Although critical to parasite survival and malaria disease pathology, PfEMP1 proteins are difficult to study owing to their large size and variable expression between parasites within the same population. The SLI approach previously developed by this group for genetic modification of P. falciparum is employed here to selectively and stably activate the expression of target var genes at the population level. Using this strategy, the binding properties of specific PfEMP1 variants were measured for several distinct var genes with a novel semi-automated pipeline to increase throughput and reduce bias. Activation of similar var genes in both the common lab strain 3D7 and the cytoadhesion competent FCR3/IT4 strain revealed higher binding for several PfEMP1 IT4 variants with distinct receptors, indicating this strain provides a superior background for studying PfEMP1 binding. SLI also enables modifications to target var gene products to study PfEMP1 trafficking and identify interacting partners by proximity-labeling proteomics, revealing two novel exported proteins required for cytoadherence. Overall, the data demonstrate a range of SLI-based approaches for studying PfEMP1 that will be broadly useful for understanding the basis for cytoadhesion and parasite virulence.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      Comments

      (1) While the capability of SLI to actively select var gene expression was initially reported by Omelianczyk et al., the present study greatly expands the utility of this approach. Several distinct var genes are activated in two different P. falciparum strains and shown to modify the binding properties of infected RBCs to distinct endothelial receptors; development of SLI2 enables multiple SLI modifications in the same parasite line; SLI is used to modify target var genes to study PfEMP1 trafficking and determine PfEMP1 interactomes with BioID. Curiously, Omelianczyk et al activated a single var (Pf3D7_0421300) and observed elevated expression of an adjacent var arranged in a head-to-tail manner, possibly resulting from local chromatin modifications enabling expression of the neighboring gene. In contrast, the present study observed activation of neighboring genes with head-to-head but not head-to-tail arrangement, which may be the result of shared promoter regions. The reason for these differing results is unclear although it should be noted that the two studies examined different var loci.

      The point that we are looking at different loci is very valid and we realize this is not mentioned in the discussion. We now added to the discussion that it is unclear if our results and those cited may be generalized and that different var gene loci may respond differently:

      “However, it is unclear if this can be generalized and it is possible that different var loci respond differently.”

      (2) The IT4var19 panned line that became binding-competent showed increased expression of both paralogs of ptp3 (as well as a phista and gbp), suggesting that overexpression of PTP3 may improve PfEMP1 display and binding. Interestingly, IT4 appears to be the only known P. falciparum strain (only available in PlasmoDB) that encodes more than one ptp3 gene (PfIT_140083100 and PfIT_140084700). PfIT_140084700 is almost identical to the 3D7 PTP3 (except for a ~120 residue insertion in 3D7 beginning at residue 400). In contrast, while the C-terminal region of PfIT_140083100 shows near-perfect conservation with 3D7 PTP3 beginning at residue 450, the N-terminal regions between the PEXEL and residue 450 are quite different. This may indicate the generally stronger receptor binding observed in IT4 relative to 3D7 results from increased PTP3 activity due to multiple isoforms or that specialized trafficking machinery exists for some PfEMP1 proteins.

      We thank the reviewer for pointing this out. The exact differences between the two PTP3s of IT4 and that of other strains definitely should be closely examined if the function of these proteins in PfEMP1 binding is analysed in more detail.

      It is an interesting idea that the PTP3 duplication could be a reason for the superior binding of IT4. We always assumed that IT4 had better binding because it was less culture adapted but this does not preclude that PTP3(s) is(are) a reason for this. However, at least in our 3D7, PTP3 can’t be the reason for the poor binding, as our 3D7 still has PfEMP1 on the surface while in the unpanned IT4-Var19 line and in the ptp3 KO of Maier et al., Cell 2008 (PMID: 18614010), PfEMP1 is not on the surface anymore.

      Testing the impact of having two PTP3s would be interesting, but given the “mosaic” similarity of the two PTP3 isoforms, a simple add-on experiment might not be informative. Nevertheless, it will be interesting in future work to explore this in more detail.

      Reviewer #3 (Public review):

      Summary:

      The submission from Cronshagen and colleagues describes the application of a previously described method (selection linked integration) to the systematic study of PfEMP1 trafficking in the human malaria parasite Plasmodium falciparum. PfEMP1 is the primary virulence factor and surface antigen of infected red blood cells and is therefore a major focus of research into malaria pathogenesis. Since the discovery of the var gene family that encodes PfEMP1 in the late 1990s, there have been multiple hypotheses for how the protein is trafficked to the infected cell surface, crossing multiple membranes along the way. One difficulty in studying this process is the large size of the var gene family and the propensity of the parasites to switch which var gene is expressed, thus preventing straightforward gene modification-based strategies for tagging the expressed PfEMP1. Here the authors solve this problem by forcing the expression of a targeted var gene by fusing the PfEMP1 coding region with a drug-selectable marker separated by a skip peptide. This enabled them to generate relatively homogenous populations of parasites all expressing tagged (or otherwise modified) forms of PfEMP1 suitable for study. They then applied this method to study various aspects of PfEMP1 trafficking.

      Strengths:

      The study is very thorough, and the data are well presented. The authors used SLI to target multiple var genes, thus demonstrating the robustness of their strategy. They then perform experiments to investigate possible trafficking through PTEX, they knock out proteins thought to be involved in PfEMP1 trafficking and observe defects in cytoadherence, and they perform proximity labeling to further identify proteins potentially involved in PfEMP1 export. These are independent and complementary approaches that together tell a very compelling story.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      Weaknesses:

      (1)  When the authors targeted IT4var19, they were successful in transcriptionally activating the gene, however, they did not initially obtain cytoadherent parasites. To observe binding to ICAM-1 and EPCR, they had to perform selection using panning. This is an interesting observation and potentially provides insights into PfEMP1 surface display, folding, etc. However, it also raises questions about other instances in which cytoadherence was not observed. Would panning of these other lines have been successfully selected for cytoadherent infected cells? Did the authors attempt panning of their 3D7 lines? Given that these parasites do export PfEMP1 to the infected cell surface (Figure 1D), it is possible that panning would similarly rescue binding. Likewise, the authors knocked out PTP1, TryThrA, and EMPIC3 and detected a loss of cytoadhesion, but they did not attempt panning to see if this could rescue binding. To ensure that the lack of cytoadhesion in these cases is not serendipitous (as it was when they activated IT4var19), they should demonstrate that panning cannot rescue binding.

      These are very important considerations. Indeed, we had repeatedly attempted to pan 3D7 when we failed to get the SLI-generated 3D7 PfEMP1 expressor lines to bind, but this had not been successful. The lack of binding had been a major obstacle that had held up the project and was only solved when we moved to IT4 which readily bound (apart from Var19 which was created later in the project). After that we made no further efforts to understand why 3D7 does not bind but the fact that PfEMP1 is on the surface indicates this is not a PTP3 issue because loss of PTP3 also leads to loss of PfEMP1 surface display. Also, as the parent 3D7 could not be panned, we assumed this issue is not easily fixed in the SLI var lines we made in 3D7.

      Panning the TGD lines: we see the reasoning for conducting panning experiments with the TGD lines. However, on second thought, we are unsure this should be attempted. The outcome might not be easily interpretable as at least two forces will contribute to the selection in panning experiments with TGD lines that do not bind anymore:

      Firstly, panning would work against the SLI of the TGD, resulting in a tug of war between the TGD-SLI and binding. This is because a small number of parasites will loop out the TGD plasmid (revert) and would normally be eliminated during standard culturing due to the SLI drug used for the TGD. These revertant cells would bind and the panning would enrich them. Hence, panning and SLI are opposed forces in the case of a TGD abolishing binding. It is unclear how strong this effect would be, but this would for sure lead to mixed populations that complicate interpretations. 

      The second selecting force is possible compensatory changes to restore binding. These can be due to different causes: (i) reversal of potential independent changes that may have occurred in the TGD parasites and that are in reality causing the binding loss (such as ptp3 loss or similar, the concern of the reviewer) or (ii) new changes to compensate the loss of the TGD target (in this case the TGD is the cause of the binding loss but a different change ameliorates it, for instance by increasing PfEMP1 expression or surface display). As both TGDs show some residual binding and have VAR01 on the surface to at least some extent, it is possible that new compensatory changes might indeed occur that indirectly increase binding again.

      In summary, even if more binding occurs after panning of the lines, it is not clear whether this is due to a compensatory change ameliorating the TGD, reversal of an unrelated change, or counter-selection against the SLI. To determine the cause, the panned TGD lines would need to be subjected to a complex and time-consuming analysis (WGS, RNASeq, possibly Maurer’s clefts phenotype) to find out whether they were SLI-revertants, had an unrelated change that was reverted, or had a new compensatory change that helps binding. This might be further muddled if a mix of cells come out of the selection that have different changes of the options indicated above. In that case, it might even require scRNASeq to make sense of the panning experiment. Due to the envisaged difficulty in interpreting the outcome, we did not attempt this panning.

      To exclude loss of ptp3 expression as the reason for binding loss (something we would not have seen in the WGS if it is only due to a transcriptional change), we now carried out RNASeq with the TGD lines that have a binding phenotype. While we did not generate replicates to obtain quantitative data, the results show that both ptp3 copies were expressed in these TGDs at levels comparable to other parasite lines that do bind with the same SLI-activated var gene, indicating that the effect is not due to ptp3 (see response to point 4 on PTP3 expression in the Recommendations for the authors). While we can’t fully exclude other changes in the TGDs that might affect binding, the WGS did not show any obvious alterations that could be responsible for this.

      (2) The authors perform a series of trafficking experiments to help discern whether PfEMP1 is trafficked through PTEX. While the results were not entirely definitive, they make a strong case for PTEX in PfEMP1 export. The authors then used BioID to obtain a proxiome for PfEMP1 and identified proteins they suggest are involved in PfEMP1 trafficking. However, it seemed that components of PTEX were missing from the list of interacting proteins. Is this surprising and does this observation shed any additional light on the possibility of PfEMP1 trafficking through PTEX? This warrants a comment or discussion.

      This is an interesting point and we agree that it warrants discussion. A likely reason why PTEX components are not picked up as interactors is that BirA* is expected to be unfolded when it passes through the channel and in that state can’t biotinylate. Labelling likely would only be possible if PfEMP1 lingered at the PTEX translocation step before BirA* became unfolded to go through the channel, which we would not expect under physiological conditions. We added the following sentences to the discussion: “While our data indicates PfEMP1 uses PTEX to reach the host cell, this could be expected to have resulted in the identification of PTEX components in the PfEMP1 proxiomes, which was not the case. However, as BirA* must be unfolded to pass through PTEX, it likely is unable to biotinylate translocon components unless PfEMP1 is stalled during translocation. For this reason, a lack of PTEX components in the PfEMP1 proxiomes does not necessarily exclude passage through PTEX.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Most of my comments are in the public section. I would just highlight a few things:

      (1) In the binding studies section you talk about "human brain endothelial cells (HBEC-5i)". These cells do indeed express CSA but this is a property of their immortalisation rather than being brain endothelium, which does not express CSA. I think this could be confusing to readers so I think you might want to reword this sentence to focus on CSA expressing the cell line rather than other features.

      We thank the reviewer for pointing this out. We now modified the sentence to focus on the fact these are CSA expressing cells and provided a reference for it.

      (2) As I said in the public section, CHO cells are great for proof of concept studies, but they are not endothelium. Not a problem for this paper.

      Noted! Please also see our response to the public review.

      (3) I wonder whether your comment about how well tolerated the BirA* insertion is may be a bit too strong. I might say "Nonetheless, overall the BirA* modified PfEMP1 were functional."

      Changed as requested.

      (4) I'm not sure how you explain the IFA staining patterns to the uninitiated, but perhaps you could explain some of the key features you are looking for.

      We apologise for not giving an explanation of the IFA staining patterns in the first place. Please see detailed response to public review of this reviewer (point 3 on PTP1-TGD phenotype) and to reviewer 2 (Recommendations to the authors, points 6 and 7 on better explaining and quantifying the Maurer’s clefts phenotypes). For this we now also generated parasites that episomally express mCherry tagged SBP1 in the TGD parasites with the reduced binding phenotype. This resulted in amendments to Fig. S7, addition of a Fig. S8 and updated results to better explain the phenotypes. 

      This is a great paper - I just wish I'd had this system before.

      Thank you!

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Does the RNAseq analysis of 3D7var0425800 and 3D7MEEDvar0425800 (Figure 1G, H) reveal any differential gene expression that might suggest a basis for loss of mutually exclusive var expression in the MEED line?

      We now carried out a thorough analysis of these RNASeq experiments to look for an underlying cause for the phenotype. This was added as new Figure 1J and new Table S3. This analysis again illustrated the increased transcript levels of var genes. In addition, it showed that transcripts of a number of other exported proteins, including members of other gene families, were up in the MEED line. 

      One hit that might be causal of the phenotype was sip2, which was down by close to 8-fold (pAdj 0.025). While recent work in P. berghei found this ApiAP2 to be involved in the expression of merozoite genes (Nishi et al., Sci Advances 2025 (PMID: 40117352)), previous work in P. falciparum showed that it binds heterochromatic telomere regions and certain var upstream regions (Flück et al., PlosPath 2010 (PMID: 20195509), now cited in the manuscript). The other notable change was an upregulation of the non-coding RNA ruf6 which had been linked with impaired mono-allelic var expression (Guizetti et al., NAR 2016 (PMID: 27466391), now also cited in the manuscript). While it would go beyond this manuscript to follow this up, it is conceivable that alterations in chromosome end biology due to sip2 downregulation or upregulation of ruf6 are causes of the observed phenotype.

      We now added a paragraph on the more comprehensive analysis of the RNA Seq data of the MEED vs non-MEED lines at the end of the second results section.

      (2) Could the inability of the PfEMP1-mDHFR fusion to block translocation (Fig 2A) reflect unique features of PfEMP1 trafficking, such as the existence of a soluble, chaperoned trafficking state that is not fully folded? Was a PfEMP1-BPTI fusion ever tested as an alternative to mDHFR?

      This is an interesting suggestion. The PfEMP1-BPTI was never tested. However, a chaperoned trafficking state would likely also affect BPTI. Given that both domains (mDHFR and BPTI) in principle do the same when folded and would block when the construct is in the PV, it is not so likely that using a different blocking domain would make a difference. Therefore, the scenario where BPTI would block when mDHFR does not, is not that probable. The opposite would be possible (mDHFR blocking while BPTI does not, because only the latter depends on the redox state). However, this would only happen if the block  occurred before the construct reaches the PV.

      At present, we believe the lacking block to be due to the organization of the domains in the construct. In the PfEMP1-mDHFR construct in this manuscript the position of the blocking domain is further away from the TMD compared to all other previously tested mDHFR fusions. Increased distance to the TMD has previously been found to be a factor impairing the blocking function of mDHFR (Mesen-Ramirez et al., PlosPath 2016 (PMID: 27168322)). Hence, our suspicion that this is the reason for the lacking block with the PfEMP1-mDHFR rather than the type of blocking domain. However, the latter option can’t be fully excluded and we might test BPTI in future work.

      (3) The late promoter SBP1-mDHFR is 2A fused with the KAHRP reporter. Since 2A skipping efficiency varies between fusion contexts and significant amounts of unskipped protein can be present, it would be helpful to include a WB to determine the efficiency of skipping and provide confidence that the co-blocked KAHRP in the +WR condition (Fig 2D) is not actually fused to the C-terminus of SBP1-mDHFR-GFP.

      Fortunately, this T2A fusion (crt_SBP1-mDHFR-GFP-2A-KAHRP-mScarlet<sup>epi</sup>) was used before in work that included a Western blot showing its efficient skipping (S3 A Fig in Mesen-Ramirez et al., PlosPath 2016). In agreement with this Western blot result, fluorescence microscopy showed very limited overlap of SBP1-mDHFR-GFP and KAHRP-mCherry in absence of WR (Fig. 3B in Mesen-Ramirez et al., PlosPath 2016 and Fig. 2 in this manuscript) which would not be the case if these two constructs were fused together. Please note that KAHRP is known to transiently localize to the Maurer’s clefts before reaching the knobs (Wickham et al., EMBOJ 2001, PMID: 11598007), and therefore occasional overlap with SBP1 at the Maurer’s clefts is expected. However, we would expect much more overlap if a substantial proportion of the construct population were not skipped, and therefore the co-blocked KAHRP-mCherry in the +WR sample is unlikely to be due to inefficient skipping and attachment to SBP1-mDHFR-GFP.

      (4) Does comparison of RNAseq from the various 3D7 and IT4 lines in the study provide any insight into PTP3 expression levels between strains with different binding capacities? Was the expression level of ptp3a/b in the IT4var19 panned line similar to the expression in the parent or other activated IT4 lines? Could the expanded ptp3 gene number in IT4 indicate that specialized trafficking machinery exists for some PfEMP1 proteins (ie, IT4var19 requires the divergent PTP3 paralog for efficient trafficking)?

      PTP3 in the different IT4 lines that bind:

      In those parasite lines that did bind, the intrinsic variation in the binding assays, the different binding properties of different PfEMP1 variants and the variation in RNA Seq experiments to compare different parasite lines precludes a correlation of binding level vs ptp3 expression. For instance, if a PfEMP1 variant has lower binding capacity, ptp3 may still be higher but binding would be lower than if comparing to a parasite line with a better binding PfEMP1 variant. Studying the effect of PTP3 levels on binding could probably be done by overexpressing PTP3 in the same PfEMP1 SLI expressor line and assessing how this affects binding, but this would go beyond this manuscript.

      PTP3 in panned vs unpanned Var19:

      We did some comparisons between IT4 parent, and the IT4-Var19 panned and unpanned (see Author response table 1). This did not reveal any clear associations. While the parent had somewhat lower ptp3 transcript levels, they were still clearly higher than in the unpanned Var19 line and other lines had also ptp3 levels comparable to the panned IT4-Var19 (see Author response table 2).

      PTP3 in the TGDs and possible reason for binding phenotype:

      A key point is whether PTP3 could have influenced the lack of binding in the TGD lines (see also weakness section and point 1 of public review of reviewer 3: ptp3 may be an indirect cause resulting in lacking binding in TGD parasites). We now did RNA Seq to check for ptp3 expression in the relevant TGD lines although we did not do a systematic quantitative comparison (which would require 3 replicates of RNASeq), but we reasoned that loss of expression would also be evident in one replicate. There was no indication that the TGD lines had lost PTP3 expression (see Author response table 2) and this is unlikely to explain the binding loss in a similar fashion to the Var19 parasites. Generally, the IT4 lines showed expression of both ptp3 genes and only in the Var19 parasites before panning were the transcript levels considerably lower:

      Author response table 1.

      Parent vs IT4-Var19 panned and unpanned

      Author response table 2.

      TGD lines with binding phenotype vs parent

      The absence of an influence of PTP3 on the binding phenotype in the cell lines in this manuscript (besides Var19) is further supported by its role in PfEMP1 surface display. Previous work has shown that KO of ptp3 leads to a loss of VAR2CSA surface display (Maier et al., Cell 2008). The unpanned Var19 parasite also lacked PfEMP1 surface display and panning and the resulting appearance of the binding phenotype was accompanied by surface display of PfEMP1. As both the EMPIC3- and TryThrA-TGD lines still had at least some PfEMP1 on the surface, this also (in addition to the RNA Seq above) speaks against PTP3 being the cause of the binding phenotype. The same applies to 3D7 which despite the poor binding displays PfEMP1 on the host cell surface (Figure 1D). This indicates that the binding phenotype in 3D7 is also not due to PTP3 expression loss, as this would have abolished PfEMP1 surface display.

      The idea about PTP3 paralogs for specific PfEMP1s is intriguing. In the future it might be interesting to test the frequency of parasites with two PTP3 paralogs in endemic settings and correlate it with the PfEMP1 repertoire, variant expression and potentially disease severity. 

      (5) The IT4var01 line shows substantially lower binding in Figure 5F compared with the data shown in Figure 4E and 6F. Does this reflect changes in the binding capacity of the line over time or is this variability inherent to the assay?

      There is some inherent variability in these assays. While we did not systematically assess this, we had no indication that this was due to the parasite line changing. The Var01 line was cultured for months and was frozen down and thawed more than once without a clear gradual trend for more or less binding. While we can’t exclude some variation from the parasite side, we suspect it is more a factor of the expression of the receptor on the CHO cells the iRBCs bind to. 

      Specifically, the assays in Fig. 6F and 4E mentioned by the reviewer both had an average binding to CD36 of around 1000 iE/mm², only the experiments in Fig. 5F are different (~ 500 iE/mm²) but these were done with a different batch of CHO cells at a different time to the experiments in Fig. 6F and 4E.

      (6) In Figure S7A, TryThrA and EMPIC3 show distinct localization as circles around the PfEMP1 signal while PeMP2 appears to co-localize with PfEMP1 or as immediately adjacent spots (strong colocalization is less apparent than SBP1, and the various PfEMP1 IFAs throughout the study). Does this indicate that TryThrA and EMPIC3 are peripheral MC proteins? Does this have any implications for their function in PfEMP1 binding? Some discussion would help as these differences are not mentioned in the text. For the EMPIC3 TGD IFAs, localization of SBP1 and PfEMP1 is noted to be normal but REX1 is not mentioned (although this also appears normal).

      We apologise for the lacking description of the candidate localisations and cursory description of the Maurer’s clefts phenotypes (next point). Our original intent was to not distract too much from the main flow of the manuscript as almost every part of the manuscript could be followed up with more details. However, we fully agree that this is unsatisfactory and now provided more description (this point) and more data (next point).

      Localisation of TryThrA and EMPIC3 compared to PfEMP1 at the Maurer’s clefts: the circular pattern is reminiscent of the results with Maurer’s clefts proteins reported by McMillan et al using 3D-SIM in 3D7 parasites (McMillan et al., Cell Microbiology 2014 (PMID: 23421990)). In that work SBP1 and MAHRP1 (both integral TMD proteins) were found in foci but REX1 (no TMD) in circular structures around these foci similar to what we observed here for TryThrA and EMPIC3 which both also lack a TMD. The SIM data in McMillan et al indicated that also PfEMP1 is “more peripheral”, although it did only partially overlap with REX1. The conclusion from that work was that there are sub-compartments at the Maurer’s clefts. In our IFAs (Fig. S7A) PfEMP1 is also only partially overlapping with the TryThrA and EMPIC3 circles, potentially indicating similar subcompartments to those observed by 3D-SIM. We agree with the reviewer that this might be indicative of peripheral MC proteins, fitting with a lack of TMD in these candidates, but we did not further speculate on this in the manuscript.

      We now added enlargements of the ring-like structures to better illustrate this observation in Fig. S7A. In addition, we now specifically mention the localization data and the ring like signal with TryThrA and EMPIC3 in the results and state that this may be similar to the observations by McMillan et al., Cell Microbiology 2014.

      We also thank the reviewer for pointing out that we had forgotten to mention REX1 in the EMPIC3-TGD; this was amended.

      (7) The atypical localization in TryThrA TGD line claimed for PfEMP1 and SBP1 in Fig S7B is not obvious. While most REX1 is clustered into a few spots in the IFA staining for SBP1 and REX1, SBP1 is only partially located in these spots and appears normal in the above IFA staining for SBP1 and HA. The atypical localization of PfEMP1-HA is also not obvious to me. The authors should clarify what is meant by "atypical" localization and provide support with quantification given the difference between the two SBP1 images shown.

      We apologise for the inadequate description of these IFA phenotypes. The abnormal signal for SBP1, REX1 and PfEMP1 in the TryThrA-TGD included two phenotypes found with all 3 proteins: 

      (1) a dispersed signal for these proteins in the host cell in addition to foci (the control and the other TGD parasites have only dots in the host cell with no or very little detectable dispersed signal). 

      (2) foci of disproportionally high intensity and size, that we assumed might be aggregation or enlargement of the Maurer’s clefts or of the detected proteins.

      The reason for the difference between the REX1 (aggregation) phenotype and the PfEMP1 and SBP1 (dispersed signal, more smaller foci) phenotypes in the images in Fig. S7B is that both phenotypes were seen with all 3 proteins but we chose a REX1 stained cell to illustrate the aggregation phenotype (the SBP1 signal in the same cell is similar to the REX1 signal, illustrating that this phenotype is not REX1 specific; please note that this cell also has a dispersed pool of REX1 and SBP1). 

      Based on the IFAs 66% (n = 106 cells) of the cells in the TryThrA-TGD parasites had one or both of the observed phenotypes. We did not include this into the previous version of the manuscript because a description would have required detouring from the main focus of this results section. In addition, IFAs have some limitations for accurate quantifications, particularly for soluble pools (depending on fixing efficiency and agent, more or less of a soluble pool in the host cell can leak out). 

      To answer the request to better explain and quantify the phenotype and given the limitations of IFA, we now transfected the TryThrA-TGD parasites with a plasmid mediating episomal expression of SBP1-mCherry, permitting live cell imaging and a better classification of the Maurer’s clefts phenotype. Due to the two SLI modifications in these parasites (using up 4 resistance markers) we had to use a new selection marker (mutated lactate transporter PfFNT, providing resistance to BH267.meta (Walloch et al., J. Med. Chem. 2020 (PMID: 32816478))) to transfect these parasites with an additional plasmid. 

      These results are now provided as Fig. S8 and detailed in the last results section. The new data shows that the majority of the TryThrA-TGD parasites contain a dispersed pool of SBP1 in the host cell. About a third of the parasites also showed disproportionally strong SBP1 foci that may be aggregates of the Maurer’s clefts. We also transfected the EMPIC3-TGD parasites with the FNT plasmid mediating episomal SBP1-mCherry expression and observed only few cells with a cytoplasmic pool or aggregates (Fig. S8). Overall these findings agree with the previous IFA results. As the IFA suggests similar results also for REX1 and PfEMP1, this defect is likely not SBP1 specific but more general (Maurer’s clefts morphology; association or transport of multiple proteins to the Maurer’s clefts). This gives a likely explanation for the cytoadherence phenotype in the TryThrA-TGD parasites. The reason for the EMPIC3-TGD phenotype remains to be determined as we did not detect obvious changes of the Maurer’s clefts morphology or in the transport of proteins to these structures in these experiments. 

      Minor comments

      (1) Italicized numbers in parentheses are present in several places in the manuscript but it is not clear what these refer to (perhaps differently formatted citations from a previous version of the manuscript). Figure 1 legend: (121); Figure S3 legend: (110), (111); Figure S6 legend: (66); etc.

      We thank the reviewer for pointing out this issue with the references, this was amended.

      (2) Figure 5A and legend: "BSD-R: BSD-resistance gene". Blasticidin-S (BS) is the drug while Blasticidin-S deaminase (BSD) is the resistance gene.

      We thank the reviewer for pointing this out, the legend and figure were changed.

      (3) Figure 5E legend: µ-SBP1-N should be α-SBP1-N.

      This was amended.

      (4) Figure S5 legend: "(Full data in Table S1)" should be Table S3.

      This was amended.

      (5) Figure S1G: The pie chart shows PF3D7_0425700 accounts for 43% of rif expression in 3D7var0425800 but the text indicates 62%.

      We apologize for this mistake, the text was corrected. We also improved the citations to Fig. S1G and H in this section.

      (6) "most PfEMP1-trafficking proteins show a similar early expression..." The authors might consider including a table of proteins known to be required for EMP1 trafficking and a graph showing their expression timing. Are any with later expressions known?

      Most exported proteins are expressed early, which is nicely shown in Marti et al 2004 (cited for the statement) in a graph of the expression timing of all PEXEL proteins (Fig. 4B in that paper). PNEPs also have a similar profile (Grüring et al 2011, also cited for that statement), further illustrated by using early expression as a criterion to find more PNEPs (Heiber et al., 2013 (PMID: 23950716)). Together this includes most if not all of the known PfEMP1 trafficking proteins. The originally co-submitted paper (Blancke-Soares & Stäcker et al., eLife preprint doi.org/10.7554/eLife.103633.1) analysed several later expressed exported proteins (Pf332, MSRP6) but their disruption, while influencing Maurer’s clefts morphology and anchoring, did not influence PfEMP1 transport. However, there are some conflicting results for Pf332 (referenced in Blancke-Soares & Stäcker et al). This illustrates that it may not be so easy to decide which proteins are bona fide PfEMP1 trafficking proteins. We therefore did not add a table and hope it is acceptable for the reader to rely on the provided 3 references to back this statement.

      (7) Figure S1J: The predominant var in the IT4 WT parent is var66 (which appears to be syntenic with Pf3D7_0809100, the predominant var in the 3D7 WT parent). Is there something about this locus or parasite culture conditions that selects for these vars in culture? Is this observed in other labs as well?

      This is a very interesting point (although we are not certain these vars are indeed syntenic, they are on different chromosomes). As far as we know at least Pf3D7_0809100 is commonly a dominant var transcribed in other labs and was found expressed also in sporozoites (Zanghì et al. Cell Rep. 2018). However, it is unclear how uniform this really is. For IT4 we do not know in full but have also here commonly observed centromeric var genes to be dominating transcripts in unselected parasite cultures. It is possible that transcription drifts to centromeric var genes in cultured parasites. However, given the anecdotal evidence, it is unknown to which extent this is related to an inherent switching and regulation regimen or a consequence of faulty regulation following prolonged culturing.

      (8) Figure 4B, C: Presumably the asterisks on the DNA gels indicate non-specific bands but this is not described in the legend. Why are non-specific bands not consistent between parent and integrated lanes?

      We apologize for not mentioning this in the legend, this was amended.

      It is not clear why the non-specific bands differ between the lines but in part this might be due to different concentrations and quality of DNA preps. A PCR can also behave differently depending on whether the correct primer target is present or not. If present, the PCR will run efficiently and other spurious products will be outcompeted, but in absence of the correct target, they might become detectable.  

      Overall, we do not think the non-specific bands indicate anything untoward with the lines. For instance, in Fig. 4B the high band in the 5’ integration in the IT4 line (which does not occur anywhere else) cannot be due to a genomic change, as this is the parental line and does not contain the plasmid for integration. In the same gel, the ori locus band of incorrect size (likely due to cross-reaction of the primers with another var gene, which is not always fully avoidable because of the high similarity of the ATS region) is present in both the parent IT4 and the integrant line and is therefore also not of concern. In C there are a couple of bands of incorrect size in the integration line. One of these is very faint and both are too large; they are therefore likely other vars that are inefficiently picked up by these primers. The reason they are not seen in the parent line is that the correct primer binding site is present there and efficiently yields a product that outcompetes the products from non-optimally matching primer sites; these products hence only appear in the Int line, where the correct match is no longer present. For these reasons we believe these bands are not of any concern.

      (9) Figure 4C: Is there a reason KAHRP was used as a co-marker for the IFA detecting IT4var19 expression instead of SBP1 which was used throughout the rest of the study?

      This is a coincidence, as this line was tested when other lines were tested for KAHRP. As there were foci in the host cell, we were satisfied that the HA-tagged PfEMP1 is produced and deemed the localization plausible.

      (10) Figure 6: Streptavidin labeling for the IT4var01-BirA position 3 line is substantially less than the other two lines in both IFA and WB. Does the position 3 fusion reduce PfEMP1 protein levels or is this a result of the context or surface display of the fusion? Interestingly, the position 3 trypsin cleavage product appears consistently more robust compared with the other two configurations. Does this indicate that positioning BirA upstream of the TM increases RBC membrane insertion and/or makes the surface localized protein more accessible to trypsin?

      It is possible that RBC membrane insertion or trypsin accessibility is increased for the position 3 construct. But there could also be other explanations:

      The reason for the more robustly detected protected fragment for the position 3 construct in the WB might also be its smaller size (in contrast to the other two versions, it does not contain BirA*) which might permit more efficient transfer to the WB membrane. In that case the more robust band might not (only) be due to better membrane insertion or better trypsin accessibility.

      The lower biotinylation signal with the position 3 construct might also be explained by the farther distance of BirA* to the ATS (compared to position 1 and 2), the region where interactors are expected to bind. The position 1 and 2 constructs may therefore generally be more efficient (as closer) to biotinylate ATS proximal proteins. Further, in the final destination (PfEMP1 inserted into the RBC membrane) BirA* would be on the other side of the membrane in the position 3 construct while in the position 1 and 2 constructs BirA* would be on the side of the membrane where the ATS anchors PfEMP1 in the knob structure. In that case, labelling with position 3 would come from interactions/proximities during transport or at the Maurer’s clefts (if there indeed PfEMP1 is not membrane embedded) and might therefore be less.

      Hence, while alterations in trypsin accessibility and RBC membrane insertion are possible explanations, other explanations exist. At present, we do not know which of these explanations apply and therefore did not mention any of them in the manuscript. 

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract and on page 8, the authors mention that they generate cell lines binding to "all major endothelial receptors" and "all known major receptors". This is a pretty all-encompassing statement that might not be fully accepted by others who have reported binding to other receptors not considered in this paper (e.g. VCAM, TSP, hyaluronic acid, etc). It would be better to change this statement to something like "the most common endothelial receptors" or "the dominant endothelial receptors", or something similar.

      We agree with the reviewer that these statements are too all-encompassing and changed them to “the most common endothelial receptors” (introduction) and “the most common receptors” (results).

      (2) The authors targeted two rif genes for activation and in each case the gene became the most highly expressed member of the family. However, unlike var genes, there were other rif genes also expressed in these lines and the activated copy did not always make up the majority of rif mRNAs. The authors might wish to highlight that this is inconsistent with mutually exclusive expression of this gene family, something that has been discussed in the past but not definitively shown.

      We thank the reviewer for highlighting this, we now added the following statement to this section: “While SLI-activation of rif genes also led to the dominant expression of the targeted rif gene, other rif genes still took up a substantial proportion of all detected rif transcripts, speaking against a mutually exclusive expression in the manner seen with var genes.”

      (3) In Figure 6, H-J, the authors display volcano plots showing proteins that are thought to interact with PfEMP1. These are labeled with names from the literature, however, several are named simply "1, 2, 3, 4, 5, or 6". What do these numbers stand for?

      We apologize for not clarifying this and thank the reviewer for pointing this out. There is a legend for the numbered proteins in what is now Table S4 (previously Table S3). We now amended the legend of Figure 6 to explain the numbers and point the reader to Table S4 for the accessions.

    1. eLife Assessment

      This study resolves a cryo-EM structure of the GPCR, human GPR30, which responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. Understanding the ligand and the mechanism of activation is important to the field of receptor signaling and potentially facilitates drug development targeting this receptor. Structures and functional assays provide solid evidence for a potential bicarbonate binding site.

    2. Reviewer #1 (Public review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the author's lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work in Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation. In the current revision, based on the gating strategy, the surface expression of the HA-positive WT GPR30-expressing cells is only 10.6% of the total population, while the surface expression levels of the mutants range from 1.89% (P71A) to 64.4% (D111A). Combining this information with the functional readout in Figure 3F and G, as well as their previous work, the authors concluded that mutations at P71, E115, D125, Q138, C207, D210, and H307 would decrease bicarbonate responses. Among those sites, E115, Q138, and H307 were from their previous Nature Comm paper.

      Authors claim P71 and C207 make a structural-stability contribution, as their mutations result in a significant reduction in surface expression: P71A (1.89%) and C207A (2.71%). However, compared to 10.6% of the total population in the WT, (P71A is 17.8% of the WT, and C207A is 25.6% of the WT), this doesn't rule out the possibility that the mutated receptor is also dysfunctional: at 10 mM NaHCO3, RFU of WT is ~500, RFU of P71 and C207 are ~0.

      The authors also interpret "The D125ECL1A mutant has lost its activity but is located on the surface" and only mention "D125 is unlikely to be a bicarbonate binding site, and the mutational effect could be explained due to the decreased surface expression". Again, compared to 10.6% of the total population in the WT, D125A (3.94%) is 37.2% of the WT. At 10 mM NaHCO3, the RFU of the WT is ~500, the RFU of D125 is ~0. This doesn't rule out the possibility that the mutated receptor is also dysfunctional. It is not clear why D125A didn't make it to the surface.

      Other mutants that the authors didn't mention much in their text: D111A (64.4%, 607.5% of WT surface expression), E121A (50.4%, 475.5% of WT surface expression), R122 (41.0%, 386.8% of WT surface expression), N276A (38.9%, 367.0% of WT surface expression) and E218A (24.6%, 232.1% of WT surface expression) all have similar RFU as WT, although the surface expression is about 2-6 times more. On the other hand, Q215A (3.18%, 30% of WT surface expression) has similar RFU as WT, with only a third of the receptor on the surface.

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.
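      The percent-of-WT values quoted in the paragraphs above can be reproduced with a short calculation. The following is a minimal sketch, assuming that "percent of WT" simply means the mutant's HA-positive fraction divided by the WT fraction (10.6%); the variable and function names are illustrative and do not come from the manuscript, and mutant labels are copied as written in the review.

```python
# Surface expression (% HA-positive cells of the total population),
# copied from the values quoted in the review text.
WT_SURFACE = 10.6
MUTANT_SURFACE = {
    "P71A": 1.89, "C207A": 2.71, "Q215A": 3.18, "D125A": 3.94,
    "E218A": 24.6, "N276A": 38.9, "R122": 41.0, "E121A": 50.4, "D111A": 64.4,
}

def percent_of_wt(mutant_pct, wt_pct=WT_SURFACE):
    """Assumed normalisation: mutant surface expression relative to WT, in percent."""
    return 100.0 * mutant_pct / wt_pct

# Rounded to one decimal place, as in the review.
relative = {name: round(percent_of_wt(pct), 1) for name, pct in MUTANT_SURFACE.items()}

for name, value in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {value}% of WT surface expression")
```

      Listing the mutants in order of relative surface expression makes the reviewer's point easier to see: the functional readout has to be interpreted against roughly a 30-fold spread in surface expression.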

    4. Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain, leaving this aspect of the study incomplete.

    5. Author response:

      The following is the authors’ response to the original reviews.

      The parts of the text that have been changed. The major changes are as follows:

      We re-analyzed the dataset and improved the local resolution of the extracellular region (Author response image 1).

      We re-modeled the receptor based on the improved density and removed the bicarbonate model, following comments from all reviewers.

      We performed calcium assay using cell lines stably expressing the mutants, whose surface expression levels were analyzed by fluorescence-activated cell sorting (FACS) (Figure 3F, G and Figure 3–figure supplement 1-3).

      Thus, we significantly revised our discussion of the extracellular binding pocket and the result of the mutational study. In the revised manuscript, we speculate that H307 is a candidate for the bicarbonate binding site.

      Author response image 1.

      Comparison of local resolution between re-analyzed and previous maps. (A) Side and top view of the re-analyzed receptor-focused map of GPR30 colored by local resolution. (B) Side and top view of the previous receptor-focused map of GPR30 colored by local resolution.

      Reviewer #1 (Public Review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, which was recently identified as a bicarbonate receptor by the authors' lab. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. However, the main claim of the paper, the identification of the bicarbonate binding site, is only partly supported by the structural and functional data, leaving the study incomplete.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling seem solid. The authors perform fairly extensive unbiased mutagenesis to identify a host of positions that are important to G-protein signaling. To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study a particularly important contribution to the field.

      Weaknesses:

      Without higher resolution structures and/or additional experimental assessment of the binding pocket, the assignment of the bicarbonate remains highly speculative. The local resolution is especially poor in the ECL loop region where the ligand is proposed to bind (4.3-4.8 Å range). Of course, sometimes it is difficult to achieve high structural resolution, but in these cases, the assignment of ligands should be backed up by even more rigorous experimental validation. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. Thus, disruption of bicarbonate signaling by mutagenesis of the putative coordinating residues does not necessarily mean that bicarbonate binding has been disrupted. Moreover, the mutagenesis was apparently done prior to structure determination, meaning that residues proposed to directly surround bicarbonate binding, such as E218, were not experimentally validated. Targeted mutagenesis based on the structure would strengthen the story.

      Moreover, the proposed bicarbonate binding site is surprising in a chemical sense, as it is located within an acidic pocket. The authors cite several other structural studies to support the surprising observation of anionic bicarbonate surrounded by glutamate residues in an acidic pocket (references 31-34). However, it should be noted that in general, these other structures also possess a metal ion (sodium or calcium) and/or a basic sidechain (arginine or lysine) in the coordination sphere, forming a tight ion pair. Thus, the assigned bicarbonate binding site in GPR30 remains an anomaly in terms of the chemical properties of the proposed binding site.

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we reconstructed the receptor based on the improved density and removed the bicarbonate model. We performed calcium assays using cell lines stably expressing the variant based on the structure.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work (PMID: 38413581). In the current body of work, they solved the first cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.21 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 4 extracellular pockets created by extracellular loops (ECLs) (Pockets A-D). Based on the polarity, location, and charge of each pocket, the authors hypothesized that pocket D is a good candidate for the bicarbonate binding site. To verify their structural observation, on top of the 10 mutations they generated in the previous work, the authors introduced another 11 mutations to map out the essential residues for the bicarbonate response on hGPR30. In addition, the human GPR30-G-protein complex model also allowed the authors to untangle the G-protein coupling mechanism of this special class A GPCR that plays an important role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication (PMID: 38413581), this study was carefully designed, and the authors used mutagenesis and functional studies to confirm their structural observations. This work provided high-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 4 extracellular pockets created by ECLs (Pockets A-D). The authors were able to filter out 3 of them and identified that pocket D was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they carefully mapped out nine amino acids that are critical for receptor reactivity.

      Weaknesses:

      It is unclear how novel the aspects presented in the new paper are compared to the most recent Nature Communications publication (PMID: 38413581). Some areas of the manuscript appear to be mixed with the previous publication. The work is still impactful to the field. The new and novel aspects of this manuscript could be better highlighted.

      I also have some concerns about the TGFα shedding assay the authors used to verify their structural observation. I understand that this assay was also used in the authors' previous work published in Nature Communications. However, there are still several things in the current data that raised concerns:

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we highlighted the new and novel aspects of this manuscript. We performed calcium assays using cell lines stably expressing the variant based on the structure.

      (1) The authors confirmed the "similar expression levels of HA-tagged hGPR30" mutants by WB in Supplemental Figure 1A and B. However, compared to the hGPR30-HA (~6.5 when normalized to the housekeeping gene, Na-K-ATPase), several mutants of the key amino acids had much lower surface expression: S134A, D210A, C207A had ~50% reduction, D125A had ~30% reduction, and Q215A and P71A had ~20% reduction. This weakens the receptor reactivity measured by the TGFα shedding assay.

      Since the calcium assay data is included in the main figure, the TGFα shedding assay and WB expression quantification data are in Figure 3–figure supplement 1-4, and we included an explanation of the expression levels in the figure caption.

      (2) In the previous work, the authors demonstrated that hGPR30 signals through the Gq signaling pathway and can trigger calcium mobilization. Given that calcium mobilization is a more direct measurement for the downstream signaling of hGPR30 than the TGFα shedding assay, pairing the mutagenesis study with the calcium assay will be a better functional validation to confirm the disruption of bicarbonate signaling.

      According to the suggestion, we performed calcium assay using cell lines stably expressing the mutants (Figure 3F, G and Figure 3–figure supplement 1-3).

      (3) It was quite confusing for Figure 4B that all statistical analyses were done by comparing to the mock group. It would be clearer to compare the activity of the mutants to the wild-type cell line.

      Thank you for your comment. As you mentioned, the comparisons are made between wild-type GPR30 and mutants in the revised manuscript (Figure 3G, Figure 3–figure supplement 4B).

      Additional concerns about the structural data include

      (1) E218 was in close contact with bicarbonate in Figure 4D. However, there is no functional validation for this observation. Including the mutagenesis study of this site in the cell-based functional assay will strengthen this structural observation.

      We removed the bicarbonate model and performed mutational analysis targeting all residues facing the binding pocket using cell lines that stably express variants, including E218A.

      (2) For the flow chart of the cryo-EM data processing in Supplemental data 2, the authors started with 10,148,422 particles after template picking, then had 441,348 Particles left after 2D classification/heterogenous refinement, and finally ended with 148,600 particles for the local refinement for the final map. There seems to be a lot of heterogeneity in this purified sample. GPCRs usually have flexible and dynamic loop regions, which explains the poor resolution of the ECLs in this case. Thus, a solid cell-based functional validation is a must to assign the bicarbonate binding pocket to support their hypothesis.

      We re-analyzed the dataset, improved the local resolution of the extracellular region (Author response image 1), and removed the bicarbonate model. As suggested by the reviewer, solid cell-based functional validation is an effective way to analyze the receptor's functional response to bicarbonate. Thus, we performed mutational analysis targeting all residues facing the binding pocket using cell lines stably expressing the mutants, whose surface expression levels were analyzed by FACS (Figure 3F, G and Figure 3–figure supplement 1-3).
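
      For orientation, the particle attrition the reviewer describes in point (2) can be expressed as retention percentages. This is only an illustrative sketch using the three particle counts quoted in the review; the stage labels paraphrase the review text, the function name is hypothetical, and nothing here reflects the authors' actual processing scripts.

```python
# Particle counts quoted in the review for the cryo-EM processing flow chart.
STAGES = [
    ("template picking", 10_148_422),
    ("2D classification / heterogeneous refinement", 441_348),
    ("local refinement (final map)", 148_600),
]

def retention(stages):
    """Return (stage, count, % of initial, % of previous stage) for each stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for name, count in stages:
        rows.append((name, count, 100.0 * count / initial, 100.0 * count / previous))
        previous = count
    return rows

rows = retention(STAGES)
for name, count, of_initial, of_previous in rows:
    print(f"{name}: {count:,} particles "
          f"({of_initial:.2f}% of initial, {of_previous:.2f}% of previous stage)")
```

      Retention figures of this kind do not by themselves separate discarded false-positive picks from genuine conformational heterogeneity, so they are best read alongside the local resolution maps.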

      Reviewer #3 (Public Review):

      Summary:

      GPR30 responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. However, it remains unclear how GPR30 recognizes bicarbonate ions. This paper presents the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate. The structure together with functional studies aims to provide mechanistic insights into bicarbonate recognition and G protein coupling.

      Strengths:

      The authors performed comprehensive mutagenesis studies to map the possible binding site of bicarbonate.

      Weaknesses:

      Owing to the poor resolution of the structure, some structural findings may be overclaimed.

      Based on the EM maps shown in Figure 1a and Figure Supplement 2, densities for side chains in the receptor, particularly in the ECLs (around 4 Å), are poorly defined. At this resolution, a disulfide bond (C130ECL1-C207ECL2) and bicarbonate ions are unlikely to be observable. Moreover, a disulfide between ECL1 and ECL2 has not been observed in other GPCRs or in the published structure of GPR30 (PMID: 38744981). The density attributed to this disulfide bond could be noise.

      The authors observed a weak density in pocket D, which is accounted for by the bicarbonate ions. This ion is mainly coordinated by Q215 and Q138. However, the Q215A mutation only reduced but did not completely abolish the bicarbonate response, and the authors did not present data for the Q138A mutation. Therefore, Q215 and Q138 could not be bicarbonate binding sites. While H307A completely abolished the bicarbonate response, the authors proposed that this residue plays a structural role. Nevertheless, based on the structure, H307 is exposed and may be involved in binding bicarbonate. The assignment of bicarbonate in the structure is not supported by the data.

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we reconstructed the receptor based on the improved density and removed the bicarbonate model. We performed calcium assays using cell lines stably expressing the variant based on the structure.

      Reviewer #1 (Recommendations For The Authors):

      (1) The experimental validation of the bicarbonate binding could be strengthened by developing an assay that directly monitors bicarbonate binding (rather than GPCR signaling)

      We agree that a direct binding assay for bicarbonate would be highly attractive (e.g., a filter binding assay using 14C-HCO₃⁻). However, the weak affinity of bicarbonate ions (in the mM range) would make reliable radioisotope-based detection impossible, owing to minimal specific receptor occupancy and high non-specific background. Such an assay is therefore highly challenging, and there are limits to what can be done in this structural paper.

      and determining a structure at comparable resolution in the absence of bicarbonate. In addition, all residues that are proposed to be located adjacent to the bicarbonate should be mutated and functionally validated.

      We re-modeled the receptor based on the improved density and removed the bicarbonate model. We performed calcium assays using cell lines stably expressing the mutants (Figure 3F, G and Figure 3–figure supplement 1-3).

      (2) What are the maps contoured in Figure 4D? The legend should describe this. Is 218 within the map region shown, or is there no density for its sidechain?

      We removed the corresponding figure and the bicarbonate model.

      (3) The contour level of the maps in Figure 1 - Figure Supplement 2 should also be indicated. Are these all contoured at the same level?

      Thank you for your comment. We re-analyzed the same data set and obtained new density maps and models. We reworked Figure 1 and Figure 1–figure supplement 2; the map in Figure 1 and the composite map in Figure 1–figure supplement 2 are contoured at the same level, 7.65.

      (4) Regarding the cited structures of bicarbonate-binding proteins, for three of the four cited structures, the bicarbonate is actually coordinated by positive ligands, with the Asp/Glu playing a more peripheral role:

      Capper et al: Overall basic cavity with tight bidentate coordination by Arg. The Glu is 5-6 Å away.

      Koropatkin et al: Two structures. The first, solved at pH 5, is proposed to have carbonic acid bound. The second, solved at pH 8, shows carbonate in a complex with calcium, with the calcium coordinated by carboxylates.

      Wang et al: The bicarbonate is coordinated by a lysine and a sodium ion. The sodium is coordinated by carboxylates.

      The authors should more thoughtfully discuss the unusual properties of this binding site with regard to the previous literature. Is it possible that bicarbonate binds in complex with a metal ion? Could this possibility be experimentally tested?

      We removed the bicarbonate model.

      (5) As a structure of GPR30 has been recently published by another group (PMID: 38744981), it would be valuable to discuss structural similarities and differences and discuss how bicarbonate activation and activation by the chloroquine ligand identified by the other group might both be accommodated by this structure.

      Thank you for your valuable comment. We compared the structure presented by another group and added our discussion, as “During the revision of this manuscript, the structures of apo-GPR30-G<sub>q</sub> (PDB 8XOG) and the exogenous ligand Lys05-bound GPR30-G<sub>q</sub> (PDB 8XOF) were reported [42]. We compared our structure of GPR30 in the presence of bicarbonate with these structures. In the extracellular region, the position of TM5 in GPR30 in the presence of bicarbonate is similar to that in apo-GPR30. In contrast, the position of TM6 is shifted outward relative to that of apo-GPR30, resembling the conformation observed in Lys05-bound GPR30 (Figure 6A, B). Additionally, the position of ECL1 is also shifted outward compared to that of apo-GPR30 (Figure 6B). In the GPR30 structure in the presence of bicarbonate, ECL2 was modeled, suggesting differences in structural flexibility. These findings indicate that the structure of GPR30 in the presence of bicarbonate is different from both the apo structure and the Lys05-bound structure, demonstrating that the structure and the flexibility of the extracellular domain of GPR30 change depending on the type of ligand. Furthermore, focusing on the interaction with G<sub>q</sub>, the αN helix of G<sub>q</sub> is not rotated in the structure bound to Lys05, in contrast to the characteristic bending of the αN helix in our structure (Figure 6C, D). Although it is necessary to consider variations in experimental conditions, such as salt concentration, the differences in the G<sub>q</sub> binding modes suggest that the downstream signals may change in a ligand-dependent manner.” (lines 249-266).

      Reviewer #2 (Recommendations For The Authors):

      (1) It is highly recommended that the authors carefully go through the "insights into bicarbonate binding" section. The results of the new findings in this paper were blended in with the results from the previous work: the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in the Nature Communication paper but the authors didn't make it clear, which added a little confusion.

      We emphasized this fact in the main text (lines 130-132).

      (2) It would be nice for the authors to add some content about the physiological concentration of HCO3 or refer more to their previous work about the rationale for selecting the bicarbonate dose in their functional assay.

      Thank you for your comment. The physiological concentration of bicarbonate is 22-26 mM in the extracellular fluid, including interstitial fluid and blood, and 10-12 mM in the intracellular fluid. The bicarbonate concentration changes under various physiological and pathological conditions: metabolic acidosis in chronic kidney disease causes a drop to 2-3 mM, and metabolic alkalosis induced by severe vomiting increases HCO<sub>3</sub><sup>-</sup> concentrations to more than 30 mM. Thus, our present and previous works clearly show that GPR30 is activated by physiological concentrations of bicarbonate, whether it is localized intracellularly or on the membrane, and that GPR30 can be deactivated or reactivated in various pathophysiological conditions. We added this in the discussion section (lines 267-278).

      (3) In Figure 3A, in the legend, the authors mentioned: "black dashed lines indicate hydrogen bonds". No hydrogen bond was noted in the figure.

      We completely revised Figure 3.

      (4) Figure 3B, it would be helpful for the authors to denote the meaning of the blue-white-red color coding in the legend.

      We removed the figure.

      (5) Supplemental Figure 3: since AF3 was released on May 3rd, it would be awesome in the revision version if the authors would update this to the AF3 model.

      The AF2 model has been replaced with the AF3 model (Figure 2–figure supplement 2A-C). The AF2 and AF3 models are almost identical, and they form incorrect disulfide bonds. This confirms the usefulness of the experimental structural determination in this study.

      (6) Supplemental Figure 4: it wasn't clear to me if the expression experiments were repeated multiple times or if any statistical analysis of the expression level was done in this study.

      We performed the western blot expression experiment once and did not perform statistical analyses. During revision, we performed repeated FACS analyses of HEK cells stably expressing N-terminally HA-tagged wild-type or mutant GPR30 to analyze membrane and whole-cell expression (Figure 3–figure supplement 1-3). We also performed calcium assays using these stable cell lines (Figure 3F, G and Figure 3–figure supplement 1-3).

      (7) Supplemental Figure 4: Also, is there a reason for the authors to compare the expression level of hGPR30 to the housekeeping gene NA-K-ATPase rather than the total loaded protein? Traditionally housekeeping genes have been used as loading controls to semiquantitatively compare the expression of target proteins in western blots. However, numerous recent studies show that housekeeping proteins can be altered due to experimental conditions, biological variability across tissues, or pathologies. A consensus has developed for using total protein as the internal control for loading. An editorial from the Journal of Biological Chemistry reporting on "Principles and Guidelines for Reporting Preclinical Research" from the workshop held in June 2014 by the NIH Director's Office, Nature Publishing Group, and Science stated, "It is typically better to normalize Western blots using total protein loading as the denominator".

      Thank you for your instructive comment. We performed western blotting with the same amount of total protein loaded per lane: 20 µg for whole-cell lysate and 1.5 µg for cell surface protein (Figure 3–figure supplement 3C-F).

      Reviewer #3 (Recommendations For The Authors):

      The claim about this disulfide should be removed unless the authors can provide mass spec evidence.

      Thank you for your crucial comments. Firstly, C130 is a residue of TM3, not ECL1, so our misprint has been corrected to C130<sup>3.25</sup>. C207<sup>ECL2</sup>, located at position 45.50, is the most conserved residue in ECL2, and it forms a disulfide bond with the cysteine at position 3.25 (PMID: 35113559). We now additionally cite this paper regarding the conservation of the C130<sup>3.25</sup>-C207<sup>ECL2</sup> bond (line 103). Indeed, disruption of this disulfide bond by the C207<sup>ECL2</sup>A mutation resulted in a marked reduction in receptor activity. In addition, the data set was re-analyzed to improve the local resolution of the extracellular region, and it was shown that the density of ECL2 is not noise (Figure 2–figure supplement 2). We are confident about the presence of the disulfide bond, based on the structural analysis data and the conservation.

      The highly flexible extracellular region is greatly affected by experimental conditions and ligands, so we speculate that this is why ECL2 and the disulfide bond were not observed in other reported structures of GPR30. We have added the following content to the discussion: “In the GPR30 in the presence of bicarbonate, ECL2 was modelled, suggesting differences in structural flexibility.” (lines 256-257).

      The authors should remove the assignment of bicarbonate in the structure, and tone down the binding site of bicarbonate.

      We removed the bicarbonate model.

      Minor:

      (1) The potency of bicarbonate for GPR30 is in the mM range. Although the concentration of bicarbonate in the serum can reach mM range, how about its concentration in the tissues? Given its low potency, it may be not appropriate to claim GPR30 is a bicarbonate receptor at this point, but the authors can claim that GPR30 can be activated by or responds to bicarbonate.

      The physiological concentration of bicarbonate is 22-26 mM in the extracellular fluid, including interstitial fluid and blood, and 10-12 mM in the intracellular fluid. Therefore, GPR30 is activated by physiological concentrations of bicarbonate in the tissues. Also, the bicarbonate concentration changes under various physiological and pathological conditions: metabolic acidosis in chronic kidney disease causes a drop to 2-3 mM, and metabolic alkalosis induced by severe vomiting increases HCO<sub>3</sub><sup>-</sup> concentrations to more than 30 mM. Thus, our work clearly shows that GPR30 is activated by physiological concentrations of bicarbonate, whether it is localized intracellularly or on the membrane, and that GPR30 can be deactivated or reactivated in various pathophysiological conditions. For these reasons, we claim that GPR30 is a bicarbonate receptor (lines 267-278).

      (2) The description that there is no consensus on a drug that targets GPR30 is not accurate, since lys05 has been reported as an agonist of GPR30 and their structure is published (PMID: 38744981). The published structures of GPR30 should be introduced in the paper.

      We added a discussion of the structural comparison with the Lys05-bound structure (Figure 6, lines 249-266).

      (3) BW numbers in Figure 4A should be shown.

      We added BW numbers in the figures of the mutational studies.

    1. eLife Assessment

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model. This study may be of interest to researchers in public health and infectious diseases beyond malaria.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing with distances larger than 100 km, and no spatial correlation for distances between 10 and 100 km. But again, strong spatial correlation in distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows smaller importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. Also, no association with importation was found for occupation, sex, and other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

    3. Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      Comments on revisions:

      All my questions and concerns were satisfactorily addressed.

    4. Reviewer #3 (Public review):

      This work provides a novel statistical model to identify imported malaria cases, which are an important challenge for elimination, particularly in low-transmission areas. This tool was applied in Plasmodium falciparum populations in Mozambique and determined differences in importation rates in 2 low-transmission districts in the South.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to support efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations, although the authors have plans to address them. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for the risk factor analysis. Additionally, the authors used a proxy for transmission intensity and assumed some other conditions to calculate the importation probability for specific scenarios. They plan to conduct a new sample collection and include monthly malaria incidence estimates in the future.

      Comments on revisions:

      - Delete "We added this text to the discussion" in line 302 (Discussion)
      - I recommend adding the plans to address limitations indicated in the Response to Reviewers document in the Discussion. This would really strengthen the limitation section.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing with distances larger than 100 km, and no spatial correlation for distances between 10 and 100 km. But again, strong spatial correlation in distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows smaller importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. Also, no association with importation was found for occupation, sex, and other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      We appreciate the review and comment about the manuscript.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data. The work is well-written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      The method presented here allows for classifying individual cases according to whether the infection occurred locally or was imported during a trip. By definition, it does not look to secondary infections after an importation event. Our next step is to conduct outbreak investigation to quantify the impact of importation events on the overall transmission, but this activity goes beyond the scope of this manuscript. We clarify this in the discussion section.

      The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material.

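      Thank you for pointing out the inconsistency in the final formula. The final formula indeed corresponds to P(I_A | G) rather than P(I_A), so it should read:

      P(I_A | G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      instead of

      P(I_A) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      We have now corrected this error in the new version of the manuscript.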

      Reviewer #3 (Public review):

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model; although some recognized limitations can be improved. This study will be of interest to researchers in public health and infectious diseases.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to help efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for risk factor analysis, which can also be affected by travel recall bias. Additionally, the authors used a proxy for transmission intensity and assumed some conditions for the genetic variable when calculating the importation probability for specific scenarios. The weaknesses were assessed by the authors.

      We acknowledge the limitations noted by the reviewer and plan to address them as follows. We will repeat the study with our data collected in 2023, which contain a good representation of all the provinces of Mozambique; completeness of the metadata collection was ensured by implementing a new protocol in January 2023. Regarding the proxy for transmission intensity, we will refine the model by integrating monthly estimates of malaria incidence (previously calibrated to address testing and reporting rates) from the DHIS2 data, also taking into account the date of the reported cases in the analysis.

      Reviewing Editor Comments:

      The reviewers have made specific suggestions that could improve the clarity and accuracy of this report.

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract, lines 36, 37 and 38: "Spatial genetic structure and connectivity were assessed using microhaplotype-based genetic relatedness (identity-by-descent) from 1605 P. falciparum samples collected (...)", but only 540 samples were successfully sequenced, therefore used in spatial genetic structure and connectivity analysis.

      The 540 samples are those from Maputo province and are described in Fig. 1. The spatial and connectivity analyses also included the samples from the rest of the provinces, collected under the multi-cluster sampling scheme. Sample sizes from these provinces are described in Suppl. Table 2; together with the 540 samples from Maputo, they make up the 1605 samples mentioned in the abstract. We specify this number in the caption of Sup. Fig. 4, and have now added it to Fig. 3.

      (2) In the Introduction, some epidemiological context about Magude and Matutuine could be added. It is only mentioned in the Discussion section (lines 265-269).

      We have added some context about both districts in the introduction now.

      (3) In the Discussion, lines 241-244, could the lack of structure mean no barriers for gene flow due to high mobility in short distances? Maybe it could only be resolved with a large number of samples.

      This could be an explanation (we mention it in the new version), although it is not something we can prove, at least not in this study.

      Reviewer #2 (Recommendations for the authors):

      The work is well written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Based on detailed datasets from Mozambique, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies. My review focuses on the Bayesian approach as well as on a few aspects of the presentation of results.

      The authors combine travel history, parasite genetic relatedness, and transmission intensity from different areas to compute the probability of infection occurring in the study area, given the P. falciparum genome. The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material. According to Bayes' Rule:

      P(I_A |G) = (P(I_A) ∙ P(G|I_A)) / (P(G)),

      with

      P(I_A) = K ∙ T_A ∙ PR_A,

      P(G│I_A) = R'_A,

      and assuming

      P(I_A│G) + P(I_B│G) = 1,

      the expression,

      (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      appears to refer to P(I_A│G), not to P(I_A) (as indicated in the main text and Supplementary Material).

      P(I_A│G) + P(I_B│G) = (P(I_A) ∙ P(G|I_A) + P(I_B) ∙ P(G|I_B)) / P(G) = 1

      ⇒P(G) = P(I_A) ∙ P(G|I_A) + P(I_B) ∙ P(G|I_B)

      ⇒P(G) = K ∙ T_A ∙ PR_A ∙ R'_A + K ∙ T_B ∙ PR_B ∙ R'_B

      ⇒P(I_A│G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      Please clarify this.

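      As mentioned in a previous comment, we acknowledge this point from the reviewer. The final formula indeed corresponds to P(I_A | G) rather than P(I_A), so it should read:

      P(I_A | G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      instead of

      P(I_A) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      We have now corrected this error in the new version of the manuscript and in the supplementary information.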
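      The posterior derived in the reviewer's comment can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name `prob_infected_in_a` and all input values are hypothetical, and the symbols T, PR and R' are used only as they appear in the derivation above. Because the constant K multiplies both terms, it cancels and does not appear in the code.

```python
def prob_infected_in_a(t_a, pr_a, r_a, t_b, pr_b, r_b):
    """Two-location posterior P(I_A | G) from the reviewer's derivation.

    Each location contributes a weight T * PR * R' (prior times likelihood,
    up to the shared constant K); the posterior is A's share of the total.
    """
    w_a = t_a * pr_a * r_a
    w_b = t_b * pr_b * r_b
    total = w_a + w_b
    if total <= 0:
        raise ValueError("at least one location must have a positive weight")
    return w_a / total


# Hypothetical inputs, chosen only to illustrate the calculation.
p_local = prob_infected_in_a(t_a=0.9, pr_a=0.01, r_a=0.02,
                             t_b=0.1, pr_b=0.20, r_b=0.08)
p_imported = 1.0 - p_local  # follows from P(I_A | G) + P(I_B | G) = 1
print(f"P(I_A | G) = {p_local:.3f}, P(I_B | G) = {p_imported:.3f}")
```

      Swapping the roles of A and B returns the complementary probability, which is a quick check that the normalisation assumed in the derivation holds.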

      Additional comments:

      (1) Figure 3A has a scale that includes negative values, which is not reasonable for R.

      We agree that R estimates are not compatible with negative values. The intention of this scale was to show the overall mean R in the centre, in white, so that blue colours represented values below the average and red values above the average. However, we proceeded to update the figures according to your recommendations.

      (2) I suggest using a common scale from 0 to 0.12 (maximum values among panels) across panels A, C, and D, as well as in Sup Fig 3, to facilitate comparison.

      We updated the figures according to the recommendations.

      (3) The x-axis labels in Figure 3A and Supplementary Figure 2A are not aligned with the x-axis ticks.

      We updated the figures so that the alignment in the x-axis is clear.

      (4) Supplementary Figure 5 would be better presented if the data were divided into four separate panels.

      We have divided the figure into four separate panels.

      (6) Figure 5D is not referenced in the main text.

      We missed the mention, which is now fixed in the new version.

      (7) The authors state: "No significant differences in R were found comparing parasite samples from Magude and the rest of the districts." However, Supplementary Figure 3 shows statistically significant relatedness between parasites from Magude and Matutuine. Please clarify this.

      We clarified this sentence, which was indeed confusing.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: More background info about malaria in Mozambique would be appreciated.

      We included some contextualisation about malaria in Mozambique and our study districts.

      (2) Why were most of the samples collected from children? Is malaria most prevalent in that group? Information could be added in the introduction.

      Children are usually considered an appropriate sentinel group for malaria surveillance for several reasons. First, most malaria cases reported from symptomatic outpatient visits are in children, especially in areas with moderate to high burden. Second (and probably the cause of the first), their lower immunity, due to shorter exposure time and an immature immune system, provides a cleaner picture of the effects of malaria, since the body's response is less shaped by past exposures. Finally, as a vulnerable population, they deserve a stronger focus in surveillance systems. We added a comment in the introduction referring to them as a common sentinel group for surveillance.

      (3) Minor: Check spaces in the text (for example, line 333 and the start of the Discussion).

      Thank you for noticing; we fixed it in the new version.

      (4) Minor: In my case, the micro (u) symbol can be observed in Word, but not in PDF.

      One of the symbols produced an error; we hope it displays correctly in the new version.

      (5) Were COI calculations with MOIRE performed across provinces and regions, or taking all samples as one population?

      We took all samples as one population. However, we confirmed that the same results (equivalent numbers and the same conclusions) were obtained when running the analysis across different populations (regions or provinces). We now mention this in the manuscript.

      (6) Have you tested lower values than 0.04 for PR in Maputo?

      This would not have had any impact on the classification. Only two individuals reported a trip to Maputo city (where we assumed PR=0.04), and neither was classified as imported. If lower values of PR had been assumed, their probabilities of importation would have been even lower, so we would still have obtained no imported cases.

      (7) Map (Supplementary Figure 1): Please, improve the resolution (like in the zoom in) and add a scale and a compass rose.

      We improved the resolution of the map. We did not add a scale and a compass rose, but labelled the coordinates as longitude and latitude to clarify the scale and orientation of the map. We added this in the rest of the maps of the manuscript as well.

      (8) In this work, Pimp values were bimodal to 0 or 1, making the classification easy. I wonder in other scenarios, where Pimp values are more intermediate (0.4-0.6), is the threshold at 0.5 still useful? Is there another way, like having a confidence interval of Pimp, to ensure the final classification? A discussion on this topic may be appreciated.

      In this case, we would recommend doing probabilistic analyses, keeping the probability of being imported as the final outcome, and quantifying the importation rates from the weighted sum of probabilities across individuals. We added this clarification in the Methods section: “In case of obtaining a higher fraction of intermediate values (0.4-0.6), weighted sums of individual probabilities would be more appropriate to better quantify importation rates.”
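
      To illustrate the distinction drawn in this response, the following minimal Python sketch contrasts a hard 0.5 cutoff with a probability-based estimate. It assumes that the "weighted sum" amounts to averaging the individual importation probabilities; the function names and toy values are hypothetical and are not taken from the manuscript.

```python
def importation_rate_threshold(p_imp, threshold=0.5):
    """Fraction of cases classified as imported using a hard cutoff."""
    return sum(p > threshold for p in p_imp) / len(p_imp)


def importation_rate_weighted(p_imp):
    """Expected fraction of imported cases: the mean of individual probabilities."""
    return sum(p_imp) / len(p_imp)


# Toy inputs (hypothetical): a bimodal case and a case with intermediate values.
bimodal = [0.0, 0.0, 1.0, 1.0]
intermediate = [0.4, 0.4, 0.4, 0.6]

for label, probs in (("bimodal", bimodal), ("intermediate", intermediate)):
    print(label,
          importation_rate_threshold(probs),
          round(importation_rate_weighted(probs), 3))
```

      In this toy example the two estimates agree for the bimodal values, while for the intermediate values the hard cutoff gives 0.25 and the probability-based estimate gives 0.45.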

      (9) Results: More details per panel, not as the whole figure (Figure 2B, Figure 3A, etc) in the manuscript would be appreciated.

      We appreciate the comment and added more details.

      (10) Figure 3: Please, add a color legend in panel B (not only in the caption, but in the panel, such as in A, C, D).

      We added a color legend in panel B.

      (11) Do the authors recommend routine surveillance to detect importation in Mozambique, or are these results solid enough to propose strategies? How possible is it that importation rates vary in the future in the south? If so, how feasible is it to implement all this process (including the amplicon sequencing) routinely?

      We added the following text in the discussion: “While these results propose programmatic strategies for the two study districts, routine surveillance to detect importation in Mozambique would allow for identifying new strategies in other districts aiming for elimination, as well as monitoring changes in importation rates in Magude and Matutuine in the future. If scaling molecular surveillance is not feasible, travel reports could be integrated in the routine surveillance to extrapolate the case classification based on the results of this study.”

      (12) Which other proxies of transmission intensity could have been used?

      Better proxies of transmission intensity could be malaria incidence at the monthly level from national surveillance systems, or estimates of force of infection, for example from the use of molecular longitudinal data if available. We added this text in the discussion.

      (13) Can this strategy be applied to P. vivax-endemic areas outside Africa?

      This new method can also be applied to P. vivax-endemic areas outside Africa. Symptomatic P. vivax cases do not necessarily reflect recent infections, so travel reports might need to cover longer time periods; this does not require any essential adaptation of the method. We added this text to the discussion.

    1. eLife Assessment

      This study presents an important finding that has identified 27 differentially methylated regions as a signature for non-invasive early cancer detection and predicting prognosis for colorectal cancer. The findings demonstrate promising clinical potential, particularly for improving cancer screening and patient monitoring. In general, the evidence supporting the claims of the authors is solid. A larger sample size will be key to further improving this work in the future. The work will be of interest to researchers interested in cancer diagnosis or colorectal cancer monitoring.

    2. Reviewer #1 (Public review):

      Summary:

      Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths. Colonoscopy and fecal immunochemical testing are among the early diagnostic tools that have significantly enhanced patient survival rates in CRC. Methylation dysregulation has been identified in the earliest stages of CRC, offering a promising avenue for screening, prediction, and diagnosis. The manuscript entitled "Early Diagnosis and Prognostic Prediction of Colorectal Cancer through Plasma Methylation Regions" by Zhu et al. presents a panel of genes with methylation patterns derived from cfDNA (27 DMRs) that serves as a noninvasive detection method for CRC early diagnosis and prognosis.

      Strengths:

      The authors provided evidence that the 27 DMRs pattern worked well in predicting CRC distant metastasis, and the methylation score remarkably increased in stages III-IV. Additionally, compared with the traditional tumor marker CEA, 27 DMRs prediction showed a superior sensitivity, highlighting the potential clinical application.

      Weaknesses:

      The major concerns are the design of DMR screening, the relatively low sensitivity of this DMR pattern in detecting early-stage CRC, the limited size of the cohorts, and the lack of comparison with the traditional diagnosis test.

      Comments on revisions:

      All my concerns have been cleared, and I have no further questions.

    3. Reviewer #2 (Public review):

      In this study, the authors aimed to develop cfDNA markers for comprehensive diagnosis, metastatic assessment, and prognostic prediction of colorectal cancer (CRC). Through integrative analysis of public 450K DNA methylation datasets and in-house targeted bisulfite sequencing (BS-seq) data from CRC and paired normal tissues, as well as plasma samples, they identified a signature comprising 27 differentially methylated regions (DMRs). This signature was subsequently validated for three clinical applications: cancer detection, metastasis prediction, and prognosis assessment.

      Strengths:

      The 27-DMR signature demonstrates value for both diagnosis and prognosis of CRC. Additionally, the datasets generated in this study serve as a valuable resource for the research community.

      Weaknesses:

      The validation cohorts for cancer detection and metastasis prediction were relatively small, which may limit the generalizability of the findings. The cancer detection model's performance does not surpass some published methods or commercial products.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths. Colonoscopy and fecal immunochemical testing are among the early diagnostic tools that have significantly enhanced patient survival rates in CRC. Methylation dysregulation has been identified in the earliest stages of CRC, offering a promising avenue for screening, prediction, and diagnosis. The manuscript entitled "Early Diagnosis and Prognostic Prediction of Colorectal Cancer through Plasma Methylation Regions" by Zhu et al. presents a panel of genes with methylation patterns derived from cfDNA (27 DMRs) that serves as a noninvasive detection method for CRC early diagnosis and prognosis.

      Strengths:

      The authors provided evidence that the 27 DMRs pattern worked well in predicting CRC distant metastasis, and the methylation score remarkably increased in stage III-IV.

      Weaknesses:

      The major concerns are the design of DMR screening, the relatively low sensitivity of this DMR pattern in detecting early-stage CRC, the limited size of the cohorts, and the lack of comparison with the traditional diagnosis test.

      We sincerely thank the reviewer for their thorough evaluation and constructive feedback on our manuscript. We are encouraged that the reviewer found our 27-DMR panel promising for predicting distant metastasis and for its performance in late-stage CRC. We have carefully considered the weaknesses pointed out and have made revisions to address these concerns, which we believe have significantly strengthened our paper.

      We agree with the reviewer that achieving high sensitivity for early-stage disease is the ultimate goal for any noninvasive screening test. Detecting the minute quantities of cfDNA shed from early-stage tumors is a well-recognized challenge in the field. Although the sensitivity of our current panel for early-stage CRC is modest, its core strengths lie in its capability to also detect advanced adenomas and its excellent performance in assessing CRC metastasis and prognosis. Furthermore, we have now added a direct comparative analysis of our 27-DMR panel against the most widely used clinical serum biomarker for CRC, carcinoembryonic antigen (CEA), using samples from the same patient cohorts. Our results demonstrate that the 27-DMR methylation score significantly outperforms CEA in diagnostic accuracy for early-stage CRC (64% vs. 18%) (Table s7). In the Discussion section, we have also acknowledged our limitations and suggested that future studies are warranted to combine the cfDNA methylation model with commonly used clinical markers, such as CEA and CA19-9, with the aim of improving the sensitivity for early diagnosis.

      We acknowledge the reviewer's concern regarding the cohort size and agree that validation in larger, prospective, multi-center cohorts is essential before this panel can be considered for clinical application. We have explicitly stated this as a limitation of our study in the Discussion section and have highlighted the need for future large-scale validation studies (Page 18, Lines 367-373). We once again thank the reviewer for their insightful comments, which have allowed us to substantially improve our manuscript. We hope that the revised version is now suitable for publication.

      Reviewer #2 (Public review):

      This work presents a 27-region DMR model for early diagnosis and prognostic prediction of colorectal cancer using plasma methylation markers. While this non-invasive diagnostic and prognostic tool could interest a broad readership, several critical issues require attention.

      Major Concerns:

      (1) Inconsistencies and clarity issues in data presentation

      (a) Sample size discrepancies

      The abstract mentions screening 119 CRC tissue samples, while Figure 1 shows 136 tissues. Please clarify if this represents 119 CRC and 17 normal samples.

      We sincerely thank the reviewer for this careful observation and for pointing out the inconsistency. We apologize for the error and the confusion it caused. Regarding Figure 1: The reviewer is correct. The number 136 in the original Figure 1 was an error. This was due to an inadvertent double-counting of the tumor samples that were used in the differential analysis against adjacent normal tissues. The actual number of tissue samples used in this analysis is 89. We have now corrected this value in the revised Figure 1.

      Regarding the Abstract: The 119 CRC tissue samples mentioned in the abstract represent the total number of unique tumor samples analyzed across all stages of our study. This number is composed of two cohorts: the initial 15 pairs of tissues used for preliminary screening, and the subsequent 89 tissue samples used for validation, totaling 119 samples. We have ensured all sample numbers are now consistent throughout the revised manuscript.

      The plasma sample numbers vary across sections: the abstract cites 161 samples, Figure 1 shows 116 samples, and the Supplementary Methods mentions 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC).

      We sincerely thank the reviewer for their meticulous review and for identifying these inconsistencies in the plasma sample numbers. We apologize for this oversight and the lack of clarity.

      Figure 1 & Supplementary Methods (77 samples): The number 116 in the original Figure 1 was a clerical error. The correct number is 77, which is the cohort used for our differential methylation analysis. This number is now consistent with the Supplementary Methods. This cohort is composed of 13 Normal, 15 NAA, 12 AA, and 37 CRC samples. The figure has been revised accordingly.

      Abstract (161 samples): The total of 161 plasma samples mentioned in the abstract is the sum of two distinct sample sets used for different stages of our analysis: (i) the 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC) used for the differential analysis, and (ii) an additional 84 samples (33 Normal, 51 CRC), which served as the training set for the LASSO regression model. We have now clarified these distinctions in the text and ensured consistency across the abstract, figures, and methods sections.
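
      The plasma sample reconciliation can be checked with simple arithmetic. The short Python sketch below only restates the counts given in this response; the variable names are illustrative.

```python
# Plasma cohort used for the differential methylation analysis.
differential = {"Normal": 13, "NAA": 15, "AA": 12, "CRC": 37}
# Plasma cohort used as the training set for the LASSO regression model.
training = {"Normal": 33, "CRC": 51}

n_differential = sum(differential.values())
n_training = sum(training.values())
n_total = n_differential + n_training

print(n_differential, n_training, n_total)
```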

      (b) Methodological inconsistencies

      The Supplementary Material reports 477 hypermethylated sites from TCGA data analysis (Δβ>0.20, FDR<0.05), but Figure 1 indicates 499 sites.

      The manuscript states that analyzing TCGA data across six cancer types identified 499 CRC-specific methylation sites, yet Figure 1 shows 477. Please also explain the rationale for selecting these specific cancer types from TCGA.

      We sincerely thank the reviewer for their sharp observation and for highlighting these inconsistencies. We apologize for this clerical error, which occurred when labeling the figure. The numbers 477 and 499 in Figure 1 were inadvertently swapped and the text in Supplementary Material is correct. We have now corrected this error throughout the manuscript to ensure clarity and consistency. We deeply regret the confusion this has caused.

      Regarding the rationale for selecting the cancer types:

      The selection of colorectal, esophageal, gastric, lung, liver, and breast cancers was based on the following strategic criteria to ensure the stringent identification of CRC-specific markers. Firstly, esophageal, gastric, liver, and colorectal cancers all originate from the gastrointestinal tract and share developmental and functional similarities. Comparing CRC against these closely related cancers allowed us to filter out general GI-tract-related methylation patterns and isolate those that are truly unique to colorectal tissue. Secondly, we included lung and breast cancer as they are two of the most common non-GI malignancies worldwide with distinct tissue origins. This helps ensure our identified markers are not just pan-cancer methylation events but are specific to CRC, even when compared against highly prevalent cancers from different lineages. Finally, these six cancer types have some of the largest and most complete datasets available in the TCGA database, including high-quality methylation data. This provided a robust statistical foundation for a reliable cross-cancer comparison. We hope this explanation clarifies our methodology. Thank you again for your valuable feedback.

      "404 CRC-specific DMRs" mentioned in the main text while "404 MCBs" in Figure 1, the authors need to clarify if these terms are interchangeable or how MCBs are defined.

      We sincerely thank the reviewer for pointing out this important inconsistency in terminology. We apologize for the confusion this has caused and for the error in Figure 1. The two terms are closely related in our study. The final 404 markers are technically DMRs that were identified through an analysis of MCBs. To avoid confusion, we have decided to unify the terminology. The manuscript has now been revised to consistently use "DMRs", which is the most accurate final descriptor. The label in Figure 1 has been corrected accordingly.

      (2) Methodological documentation

      The Results section requires a more detailed description of marker identification procedures and justification of methodological choices.

      Figure 3 panels need reordering for sequential citation.

      We thank the reviewer for this valuable suggestion. We agree that the original Results section lacked sufficient detail regarding the marker identification procedures and the justification for our methodological choices. To address this, we have substantially rewritten the "Methylation markers selection" subsection. This revised section provides a clear, step-by-step narrative of our marker discovery. The revised text now integrates the specific methodological details and statistical criteria. For instance, we now explicitly describe the three-pronged approach for the initial TCGA data mining and the specific criteria (Δβ, FDR, log2FC) for each, as well as the analysis methodology, such as the Wilcoxon test and LASSO regression analysis. We believe this detailed narrative now provides the necessary description and justification for our methodological choices directly within the results, significantly improving the clarity and logical flow of our manuscript. This revision can be found on Pages 9-11 (Lines 180-195, 202-213). We hope these changes fully address the reviewer's concerns.
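
      As a rough illustration of how a Δβ and FDR filter of this kind operates, the following Python sketch applies the thresholds quoted by the reviewer (Δβ>0.20, FDR<0.05) to a few made-up sites. It assumes Benjamini-Hochberg adjustment, which the response does not specify; the site identifiers, beta values, and p-values are hypothetical, and this is not the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_hypermethylated(sites, delta_beta_min=0.20, fdr_max=0.05):
    """Keep sites with tumor-minus-normal beta above the cutoff and adjusted p below fdr_max."""
    adjusted = benjamini_hochberg([s["p"] for s in sites])
    return [
        s["id"]
        for s, q in zip(sites, adjusted)
        if s["beta_tumor"] - s["beta_normal"] > delta_beta_min and q < fdr_max
    ]


# Hypothetical example sites (not real data).
sites = [
    {"id": "cg_a", "beta_tumor": 0.70, "beta_normal": 0.20, "p": 0.001},
    {"id": "cg_b", "beta_tumor": 0.35, "beta_normal": 0.25, "p": 0.002},
    {"id": "cg_c", "beta_tumor": 0.80, "beta_normal": 0.30, "p": 0.40},
    {"id": "cg_d", "beta_tumor": 0.60, "beta_normal": 0.30, "p": 0.01},
]
selected = select_hypermethylated(sites)
print(selected)
```

      Here cg_b fails the Δβ cutoff and cg_c fails the FDR cutoff, so only cg_a and cg_d are retained.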

      We thank the reviewer for pointing out the citation order of the panels in Figure 3. This was a helpful suggestion for improving the clarity of our manuscript. We have now reordered the panels in Figure 3 to ensure they are cited sequentially within the text. These adjustments have been made in the "Development and validation of the CRC diagnosis model" subsection of the Results (Page 11, lines 224-230). We appreciate the reviewer's attention to detail.

      (3) Quality control and data transparency

      No quality control metrics are presented for the in-house sequencing data (e.g., sequencing quality, alignment rate, BS conversion rate, coverage, PCA plots for each cohort).

      The analysis code should be publicly available through GitHub or Zenodo.

      At a minimum, processed data should be made publicly accessible to ensure reproducibility.

      We sincerely thank the reviewer for their valuable and constructive feedback regarding quality control and data transparency. We fully agree that these elements are crucial for ensuring the robustness and reproducibility of our research. As the reviewer suggested, we have made all processed data and the key quality control metrics for each sample including sequencing quality scores, bisulfite (BS) conversion rates, and sequencing coverage publicly available to ensure the reproducibility of our findings. The analysis was performed using standard algorithms as detailed in the Methods section. While we are unable to host the code in a public repository at this time, all analysis scripts are available from the corresponding author upon reasonable request. The data has been deposited in the National Genomics Data Center (NGDC) and is accessible under the accession number OMIX009128. This information is now clearly stated in the "Data and Code Availability" section of the manuscript. We thank the reviewer again for pushing us to improve our manuscript in this critical aspect.

      Reviewer #3 (Public review):

      Summary:

      This article provides a model for early diagnosis and prognostic prediction of Colorectal Cancer and demonstrates its accuracy and usability. However, there are still some minor issues that need to be revised and paid attention to.

      Strengths:

      A large number of external datasets were used for verification, thus demonstrating robustness and accuracy. Meanwhile, various influencing factors of multiple samples were taken into account, providing usability.

      Weaknesses:

      There are notable language issues that hinder readability, and some key conclusions are missing.

      We are very grateful to the reviewer for their positive assessment of our study and for the constructive feedback provided. We are particularly encouraged that the reviewer recognized the strengths of our work, especially the robustness demonstrated through extensive external validation and the practical usability of our model. Regarding the weaknesses, we have taken the comments very seriously and have thoroughly revised the manuscript. We sincerely apologize for the language issues that hindered readability in our initial submission. To address this, the entire manuscript has undergone a comprehensive round of professional language polishing and editing. We have carefully reviewed and revised the text to improve clarity, flow, and grammatical accuracy. Besides, we agree that the conclusions could be stated more explicitly. To rectify this, we have substantially revised the final paragraph of the Discussion and the Conclusion section (Page 14-18, lines 279-305, 319-334, 346-348, 358-360, 367-379). We now more clearly summarize the main findings of our study, emphasize the clinical significance and potential applications of our model, and provide clear take-home messages. We thank you again for your time and insightful comments, which have been invaluable in improving the quality of our paper. We hope the revised manuscript now meets the standards for publication.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments are outlined below:

      (1) In this study, the authors have highlighted methylated cfDNA as a noninvasive approach for CRC early diagnosis. However, the small size of the cohorts for plasma screening, particularly the number of NAA and AA samples, may cause bias in the selection of DMRs. This bias may lead to inappropriate DMRs for early diagnosis. Furthermore, a similar issue affects the training set, which had a high percentage of late-stage CRC and included no AA or NAA samples. This absence may be a key factor when screening for altered cfDNA methylation that can predict the early stages of CRC.

      We are very grateful to the reviewer for this insightful methodological critique. We agree that cohort composition and sample size are critical factors in the development of robust biomarkers, and we appreciate the opportunity to clarify our study design and the interpretation of our results.

      We agree with the reviewer that the number of precancerous lesion samples (NAA and AA) in our initial plasma screening cohort was limited. This is a valid point. However, it is important to contextualize the role of this step within our overall multi-stage marker selection funnel. The markers evaluated in this plasma cohort were not discovered from this small sample set alone. They were the result of a rigorous pre-selection process based on large-scale public TCGA data and our own tissue-level sequencing. This robust, tissue-based validation ensured that only the most promising CRC-specific markers were advanced for plasma testing. Therefore, while the plasma cohort was modest in size, its purpose was to confirm the circulatory detectability of markers already known to have a strong tissue-of-origin signal, thereby mitigating the potential bias from a smaller discovery set.

      Our primary aim was to first build a model that could robustly and accurately identify a definitive cancer-specific methylation signal. By training the model on clear-cut invasive cancer cases versus healthy controls, we could isolate the most powerful and specific markers for established malignancy. Our working hypothesis was that these strong cancer-specific methylation patterns are initiated during the precursor stages and would therefore be detectable, albeit at lower levels, in precancerous lesions. Unfortunately, the panel could only identify a limited proportion of precancerous lesions (48.4% in the NAA group and 52.2% in the AA group). We fully agree with the reviewer's sentiment that including a larger and more balanced set of precancerous lesions in future training cohorts could potentially optimize a model specifically for adenoma detection. We have now explicitly added this point to our Discussion section, highlighting it as an important direction for future research (Page 18, lines 367-373).

      (2) The sensitivities of the 27 DMRs in the external validation set (48.4%, 52.2%, and 66.7% for NAA, AA, and CRC 0-II, respectively) were much lower than those of previously published assays, such as the ColonES assay (DOI: 10.1016/j.eclinm.2022.101717) and the ColonSecure test (DOI: 10.1186/s12943-023-01866-z). The 27 DMRs from the layered screening process did not show superior performance in a small population of an external validation cohort. Therefore, it is unlikely that this DMR pattern will be applicable to the general population in the future.

      We sincerely thank the reviewer for their insightful comments and for providing a thorough comparison with the highly relevant ColonES and ColonSecure assays. This has given us an important opportunity to clarify the unique contributions and specific clinical applications of our 27-DMR panel.

      We acknowledge the reviewer's point that the sensitivities of our panel for precancerous lesions (NAA: 48.4%, AA: 52.2%), while substantial, are numerically lower than those reported by the excellent ColonES assay (AA: 79.0%). However, it is important to clarify that while the ColonES and ColonSecure tests are outstanding benchmarks designed primarily for early detection and screening, the primary objective and contribution of our study were slightly different. Our model demonstrated an exceptional ability to predict distant metastasis with an AUC of 0.955 and a strong capacity for predicting overall prognosis with an AUC of 0.867. Our goal was to develop a multi-functional, biologically-rooted biomarker panel that not only contributes to early detection but, more importantly, provides crucial information for post-diagnosis patient management, including staging, risk stratification, and prognostication, from a single preoperative sample. We believe this ability to preoperatively identify high-risk patients who may require more aggressive treatment or intensive surveillance is the key contribution of our work. It provides a distinct clinical utility that complements, rather than directly competes with, pure screening assays.

      We agree with the reviewer that our external validation was performed on a limited cohort, and we have acknowledged this as a limitation in our Discussion section. However, the purpose of this validation was to provide a proof-of-concept for the panel's performance across its multiple functions. The promising and exceptionally high-performing results in the prognostic domain strongly warrant further validation in larger, prospective, multi-center cohorts.

      (3) The 27 DMRs pattern worked well in predicting CRC distant metastasis, and the methylation score remarkably increased in stage III-IV. In contrast, the increase of AA and 0-II groups was very mild in the validation cohort. This observation raises concerns regarding the study design, particularly in the context of the layered screening process and sample assigning.

      We sincerely thank the reviewer for this insightful and critical comment. We agree with the reviewer's observation that the methylation score increased more remarkably in late-stage (III-IV) CRC compared to the milder increase in adenoma (AA) and early-stage (0-II) CRC in the validation cohort. However, the observed pattern is biologically plausible and consistent with the nature of colorectal cancer progression. Carcinogenesis is a multi-step process involving the gradual accumulation of genetic and epigenetic alterations. The methylation changes we identified are likely associated with tumor progression and metastasis. Therefore, it is expected that advanced, metastatic cancers (Stage III-IV), which have undergone significant biological changes, would exhibit a much stronger and more robust methylation signal compared to pre-cancerous lesions (adenomas) or early-stage, non-metastatic cancers (Stage 0-II). The "mild" increase in early stages reflects the initial, more subtle epigenetic alterations, while the "remarkable" increase in late stages reflects the extensive changes required for invasion and metastasis. We believe this graduated increase actually strengthens the validity of our methylation signature, as it mirrors the underlying biological progression of the disease. We hope this response and the corresponding revisions address the reviewer's comments.

      (4) The authors did not provide the 27 DMRs prediction efficacy comparison with other noninvasive CRC assays, like a CEA and a FIT test.

      Thank you for this valuable suggestion. We agree that comparing our model with established non-invasive assays is crucial for demonstrating its clinical potential. Following your advice, we have now included a direct comparison of the diagnostic performance between our model and the traditional tumor marker, carcinoembryonic antigen (CEA), using the external validation cohort. The results show that our model has a significantly higher sensitivity for detecting early-stage colorectal cancer and adenomas compared to CEA. This detailed comparison has been added as Table s7 in the supplementary materials, and the corresponding description has been incorporated into the Results section of our manuscript (Page 12, lines 234-236). Regarding the Fecal Immunochemical Test (FIT), we unfortunately could not perform a direct statistical comparison because very few individuals in our cohort had undergone FIT. A comparison based on such a small sample size would lack statistical power and might not yield meaningful conclusions. We have acknowledged this as a limitation of our study in the Discussion section. We believe these additions and clarifications have substantially strengthened our manuscript. Thank you again for your constructive feedback.

      (5) The authors did not explicitly describe how they assigned the plasma samples to the distinct sets, nor did they specify the criteria for the plasma screen set, training set, and validation set. The detailed information for the patient grouping should be listed.

      Response: Thank you for this essential feedback. We agree that a transparent and detailed description of the sample allocation process is crucial for the manuscript. We apologize for the previous lack of clarity and have now revised the Methods section to address this. Our patient cohorts were assigned to the screening, training, and validation sets based on a chronological splitting strategy. Specifically, samples were allocated based on the date of collection in a consecutive manner. This approach was chosen to minimize selection bias and to provide a more realistic, forward-looking assessment of the model's performance, simulating a prospective validation scenario. The screening set comprised 89 tissue samples and 77 plasma samples collected between June and December 2020. The primary purpose of this set was for the initial discovery and screening of potential methylation markers. The training set and validation set included 165 plasma samples collected from December 2020 to July 2022. The external validation cohort comprised 166 plasma samples collected from July 2022 to December 2022. All of this detailed information is now provided in the subsection titled "Study design and samples" within the Methods section of the revised manuscript (Page 6, lines 116-133). We believe this detailed explanation now makes our study design clear and transparent. Thank you again for helping us improve our manuscript.
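
      A chronological split of this kind can be sketched in a few lines of Python. The response gives only month-level collection windows, so the exact cutoff dates below are assumptions, as are the cohort labels, sample identifiers, and the function name; this is an illustration of the general idea and not the authors' procedure.

```python
from datetime import date

# Assumed cutoffs: each cohort ends just before the stated date (hypothetical days).
BOUNDARIES = [
    ("screening", date(2020, 12, 1)),
    ("training_validation", date(2022, 7, 1)),
    ("external_validation", date(2023, 1, 1)),
]


def assign_cohort(collected, boundaries=BOUNDARIES):
    """Return the first cohort whose end date is later than the collection date."""
    for name, end in boundaries:
        if collected < end:
            return name
    return None  # collected after the last window


# Hypothetical samples keyed by an illustrative identifier.
samples = {
    "P001": date(2020, 8, 14),
    "P002": date(2021, 3, 2),
    "P003": date(2022, 9, 30),
}
cohorts = {sample_id: assign_cohort(d) for sample_id, d in samples.items()}
print(cohorts)
```

      Because assignment depends only on the collection date, later samples never inform the sets built from earlier ones, which is the forward-looking property the response describes.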

      Reviewer #2 (Recommendations for the authors):

      The manuscript requires significant language editing to improve clarity and readability. We recommend that the authors seek professional editing services for revision.

      Thank you for your constructive comments on the language of our manuscript. We apologize for any lack of clarity in the previous version. To address this, we have performed a thorough revision of the manuscript. The text has been carefully reviewed and edited by a native English-speaking colleague who is an expert in our research field. We have focused on correcting all grammatical errors, improving sentence structure, and refining the phrasing throughout the document to enhance readability. We are confident that these extensive revisions have significantly improved the clarity of the manuscript. We hope you will find the current version much easier to read and understand.

      Reviewer #3 (Recommendations for the authors):

      (1) However, I think the abstract part of the article is too detailed and should be more concise and shortened. It is not necessary to show detailed values but to summarize the results.

      Thank you for this valuable suggestion. We agree that the previous version of the abstract was overly detailed and that a more concise summary would be more effective for the reader. Following your advice, we have substantially revised the abstract. We have removed the specific numerical values (such as detailed statistics) and have instead focused on summarizing the key findings and their broader implications (Page 3, lines 54-60, 64-66, 70-72). The revised abstract is now shorter and provides a clearer, high-level overview of our study's background, methods, main results, and conclusions. We believe these changes have significantly improved its readability and impact. We hope you will find the current version more appropriate.

      (2) Figure 4, the color in the legend and plot are not the same, and should be revised.

      Thank you for your careful attention to detail and for pointing out the color inconsistency in Figure 4. We apologize for this oversight. We have now corrected the figure as you suggested, ensuring that the colors in the legend perfectly match those in the plot. The revised Figure 4 has been updated in the manuscript. We appreciate your help in improving the quality of our figures.

      (3) Please pay attention to the article format, such as the consistency of fonts and punctuation marks. (For example, Lines 75 and Line 230).

      Thank you for your meticulous review and for pointing out the inconsistencies in our manuscript's formatting. We sincerely apologize for these oversights and any inconvenience they may have caused. Following your feedback, we have carefully corrected the specific issues you highlighted. Furthermore, we have conducted a thorough proofread of the entire manuscript to ensure consistency in all fonts, punctuation marks, and overall adherence to the journal's formatting guidelines. We appreciate your help in improving the presentation and professionalism of our paper.

    1. eLife Assessment

      This important study demonstrates that in Drosophila melanogaster, tachykinin (Tk) expression is regulated by the microbiota. The authors present convincing evidence that axenic flies raised with no microbiota are longer-lived than conventionally reared animals, and that Tk expression and Tk receptors in the nervous system are required for this effect. They further test individual bacterial strains for their role in these effects and connect the effect to loss of lipid stores and suggest that FOXO may be involved in the phenotype, results that are of interest to the fields of environmental perception, host microbiome interactions, and geroscience.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this study the authors use a Drosophila model to demonstrate that Tachykinin (Tk) expression is regulated by the microbiota. In Drosophila conventionally reared (CR) flies are typically shorter lived than those raised without a microbiota (axenic). Here, knockdown of Tk expression is found to prevent lifespan shortening by the microbiota and the reduction of lipid stores typically seen in CR flies when compared to axenic counterparts. It does so without reducing food intake or fecundity which are often seen as necessary trade-offs for lifespan extension. Further, the strength of the interaction between Tk and the microbiota is found to be bacteria specific and is stronger in Acetobacter pomorum (Ap) mono-associated flies compared to Levilactobacillus brevis (Lb) mono-association. The impact on lipid storage was also only apparent in Ap-flies.

      Building on these findings the authors show that gut specific knockdown is largely sufficient to explain these phenotypes. Knockdown of the Tk receptor, TkR99D, in neurons recapitulates the lifespan phenotype of intestinal Tk knockdown supporting a model whereby Tk from the gut signals to TkR99D expressing neurons to regulate lifespan. In addition, the authors show that FOXO may have a role in lifespan regulation by the Tk-microbiota interaction. However, they rule out a role for insulin producing cells or Akh-producing cells suggesting the microbiota-Tk interaction regulates lifespan through other, yet unidentified, mechanisms.

      Major comments:

      Overall, I find the key conclusions of the paper convincing. The authors present an extensive amount of experimental work, and their conclusions are well founded in the data. In particular, the impact of TkRNAi on lifespan and lipid levels, the central finding in this study, has been demonstrated multiple times in different experiments and using different genetic tools. As a result, I don't feel that additional experimental work is necessary to support the current conclusions.

      However, I find it hard to assess the robustness of the lifespan data from the other manipulations used (TkR99DRNAi, TkRNAi in dFoxo mutants etc.) because information on the population size and whether these experiments have been replicated is lacking. Can the authors state in the figure legends the numbers of flies used for each lifespan and whether replicates have been done? For all other data it is clear how many replicates have been done, and the methods give enough detail for all experiments to be reproduced.

      Significance:

      Overall, I find the key conclusions of the paper convincing. The authors present an extensive amount of experimental work, and their conclusions are well founded in the data. We have known that the microbiota influence lifespan for some time but the mechanisms by which they do so have remained elusive. This study identifies one such mechanism and as a result opens several avenues for further research. The Tk-microbiota interaction is shown to be important for both lifespan and lipid homeostasis, although it's clear these are independent phenotypes. The fact that the outcome of the Tk-microbiota interaction depends on the bacterial species is of particular interest because it supports the idea that manipulation of the microbiota, or specific aspects of the host-microbiota interaction, may have therapeutic potential.

      These findings will be of interest to a broad readership spanning host-microbiota interactions and their influence on host health. They move forward the study of microbial regulation of host longevity and have relevance to our understanding of microbial regulation of host lipid homeostasis. They will also be of significant interest to those studying the mechanisms of action and physiological roles of Tachykinins.

      Field of expertise: Drosophila, gut, ageing, microbiota, innate immunity

    3. Reviewer #2 (Public review):

      Summary:

      The main finding of this work is that microbiota impacts lifespan through regulating the expression of a gut hormone (Tk) which in turn acts on its receptor expressed on neurons. This conclusion is robust and based on a number of experimental observations, carefully using techniques in fly genetics and physiology: 1) microbiota regulates Tk expression, 2) lifespan reduction by microbiota is absent when Tk is knocked down in gut (specifically in the EEs), 3) Tk knockdown extends lifespan and this is recapitulated by knockdown of a Tk receptor in neurons. These key conclusions are very convincing. Additional data are presented detailing the relationship between Tk and insulin/IGF signalling and Akh in this context. These are two other important endocrine signalling pathways in flies. The presentation and analysis of the data are excellent.

      There are only a few experiments or edits that I would suggest as important to confirm or refine the conclusions of this manuscript. These are:

      (1) When comparing the effects of microbiota (or single bacterial species) in different genetic backgrounds or experimental conditions, I think it would be good to show that the bacterial levels are not impacted by the other intervention(s). For example, the lifespan results observed in Figure 2A are consistent with Tk acting downstream of the microbes but also with Tk RNAi having an impact on the microbiota itself. I think this simple, additional control could be done for a few key experiments. Similarly, the authors could compare the two bacterial species to see if the differences in their effects come from different ability to colonise the flies.

      (2) The effect of Tk RNAi on TAG is opposite in CR and Ax or CR and Ap flies, and the knockdown shows an effect in either case (Figure 2E, Figure 3D). Why is this? Better clarification is required.

      (3) With respect to insulin signalling, all the experiments bar one indicate that insulin is mediating the effects of Tk. The one experiment that does not is using dilpGS to knock down TkR99D. Is it possible that this driver is simply not resulting in an efficient KD of the receptor? I would be inclined to check this, but as a minimum I would be a bit more cautious with the interpretation of these data.

      (4) Is it possible to perform at least one lifespan repeat with the other Tk RNAi line mentioned? This would further clarify that there are no off-target effects that can account for the phenotypes.

      There are a few other experiments that I could suggest as I think they could enrich the current manuscript, but I do not believe they are essential for publication:

      (5) The manuscript could be extended with a little more biochemical/cell biology analysis. For example, is it possible to look at Tk protein levels, Tk levels in circulation, or even TkR receptor activation or activation of its downstream signalling pathways? Comparing Ax and CR or Ap and CR one would expect to find differences consistent with the model proposed. This would add depth to the genetic analysis already conducted. Similarly, for insulin signalling - would it be possible to use some readout of the pathway activity and compare between Ax and CR or Ap and CR?

      (6) The authors use a pan-acetyl-K antibody but are specifically interested in acetylated histones. Would it be possible to use antibodies for acetylated histones? This would have the added benefit that one can confirm the changes are not in the levels of histones themselves.

      (7) I think the presentation of the results could be tightened a bit, with fewer sections and one figure per section.

      Significance:

      The main contribution of this manuscript is the identification of a mechanism that links the microbiota to lifespan. This is very exciting and topical for several reasons:

      (1) The microbiota is very important for overall health but it is still unclear how. Studying the interaction between microbiota and health is an emerging, growing field, and one that has attracted a lot of interest, but one that is often lacking in mechanistic insight. Identifying mechanisms provides opportunities for therapies. The main impact of this study comes from using the fruit fly to identify a mechanism.

      (2) It is very interesting that the authors focus on an endocrine mechanism, especially with the clear clinical relevance of gut hormones to human health recently demonstrated with new, effective therapies (e.g. Wegovy).

      (3) Tk is emerging as an important fly hormone and this study adds a new and interesting dimension by placing Tk between microbiota and lifespan.

      I think the manuscript will be of great interest to researchers in ageing, human and animal physiology and in gut endocrinology and gut function.

    4. Reviewer #3 (Public review):

      Summary:

      Marcu et al. demonstrate a gut-neuron axis that is required for the lifespan-shortening effects mediated by gut bacteria. They show that the presence of commensal bacteria, particularly Acetobacter pomorum, promotes Tk expression in the gut, which then binds to neuronal tachykinin receptors to shorten lifespan. Tk has also recently been reported to extend lifespan via adipokinetic hormone (Akh) signaling (Ahrentløv et al., Nat Metab 7, 2025), but the mechanism here appears distinct. The lifespan shortening by Ap via Tk seems to be partially dependent on foxo and independent of both insulin signaling and Akh-mediated lipid mobilization.

      Although the detailed mechanistic link to lifespan is not fully resolved, the experiment and its results clearly show the involvement of the molecules tested. This work adds a valuable dimension to our growing understanding of how gut bacteria influence host longevity. However, there are some points that should be addressed.

      (1) Tk+ EEC activity should be assessed directly, rather than relying solely on transcript levels. Approaches such as CaLexA or GCaMP could be used.

      (2) In Line 243, the manuscript states that the reporter activity was not increased in the posterior midgut. However, based on the results presented in Fig. 4E, there seems to be no apparent regional specificity. A more detailed explanation is necessary.

      (3) If feasible, assessing foxo activation would add mechanistic depth. This could be done by monitoring foxo nuclear localization or measuring the expression levels of downstream target genes.

      (4) Fig1C uses Adh for normalization. Given the high variability of the result, the authors should (1) check whether Adh expression levels changed via bacterial association and/or (2) compare the results using different genes as internal standard.

      (5) While the difficulty of maintaining lifelong axenic conditions is understandable, it may still be feasible to assess the induction of Tk (i.e., Tk transcription or EE activity upregulation) by the microbiome in males.

      (6) We also had some concerns regarding the wording of the title.

      Fig. 6B and C suggest that TkR86C, in addition to TkR99D, may be involved in the A. pomorum-lifespan interaction. Consider revising the title to refer more generally to the "tachykinin receptor" rather than only TkR99D.

      The difference between "aging" and "lifespan" should also be addressed. While the study shows a role for Tk in lifespan, assessment of aging phenotypes (e.g. climbing assay, ISC proliferation) beyond the smurf assay is required to make conclusions about aging.

      (7) The statement in Line 82 that EEs express 14 peptide hormones should be supported with an appropriate reference, if available.

      Significance:

      General assessment: The main strength of this study is the careful and extensive lifespan analyses, which convincingly demonstrate the role of gut microbiota in regulating longevity. The authors clarify an important aspect of how microbial factors contribute to lifespan control. The main limitation is that the study primarily confirms the involvement of previously reported signaling pathways, without identifying novel molecular players or previously unrecognized mechanisms of lifespan regulation.

      Advance: The lifespan-shortening effect of Acetobacter pomorum (Ap) has been reported previously, as has the lifespan-shortening effect of Tachykinin (Tk). However, this study is the first to link these two factors mechanistically, which represents a significant and original contribution to the field. The advance is primarily mechanistic, providing new insight into how microbial cues converge on host signaling pathways to influence ageing.

      Audience: This work will be of particular interest to a specialized audience of basic researchers in ageing biology. It will also attract interest from microbiome researchers who are investigating host-microbe interactions and their physiological consequences. The findings will be useful as a conceptual framework for future mechanistic studies in this area.

      Field of expertise: Drosophila ageing, lifespan, microbiome, metabolism

    5. Author response:

      (1) General Statements

      The goal of our study was to mechanistically connect microbiota to host longevity. We have done so using a combination of genetic and physiological experiments, which outline a role for a neuroendocrine relay mediated by the intestinal neuropeptide Tachykinin, and its receptor TkR99D in neurons. We also show a requirement for these genes in metabolic and healthspan effects of microbiota.

      The referees' comments suggest they find the data novel and technically sound. We have added data in response to numerous points, which we feel enhance the manuscript further, and we have clarified text as requested. Reviewer #3 identified an error in Figure 4, which we have rectified. We felt that some specific experiments suggested in review would not add significant further depth, as we articulate below.

      Altogether our reviewers appear to agree that our manuscript makes a significant contribution to both the microbiome and ageing fields, using a large number of experiments to mechanistically outline the role(s) of various pathways and tissues. We thank the reviewers for their positive contributions to the publication process.

      (2) Description of the planned revisions

      Reviewer #2:

      Not…essential for publication…is it possible to look at Tk protein levels?

      We have acquired a small amount of anti-TK antibody and we will attempt to immunostain guts associated with A. pomorum and L. brevis. We are also attempting the equivalent experiment in mouse colon reared with/without a defined microbiota. These experiments are ongoing, but we note that the referee feels that the manuscript is a publishable unit whether these stainings succeed or not.

      (3) Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Can the authors state in the figure legends the numbers of flies used for each lifespan and whether replicates have been done?

      We have incorporated the requested information into legends for lifespan experiments.

      Do the interventions shorten lifespan relative to the axenic cohort? Or do they prevent lifespan extension by axenic conditions? Both statements are valid, and the authors need to be consistent in which one they use to avoid confusing the reader.

      We read these statements differently. The only experiment in which a genetic intervention prevented lifespan extension by axenic conditions is neuronal TkR86C knockdown (Figure 6B-C). Otherwise, microbiota shortened lifespan relative to axenic conditions, and genetic knockdowns blocked this effect (e.g. see lines 131-133). We have ensured that the framing is consistent throughout, with text edited at lines 198-199, 298-299, 311-312, 345-347, 407-408, 424-425, 450, 497-503.

      TkRNAi consistently reduces lipid levels in axenic flies (Figs 2E, 3D), essentially phenocopying the loss of lipid stores seen in control conventionally reared (CR) flies relative to control axenic. This suggests that the previously reported role of Tk in lipid storage - demonstrated through increased lipid levels in TkRNAi flies (Song et al (2014) Cell Rep 9(1): 40) - is dependent on the microbiota. In the absence of the microbiota TkRNAi reduces lipid levels. The lack of acknowledgement of this in the text is confusing

      We have added text at lines 219-222 to address this point. We agree that this effect is hard to interpret biologically, since expressing RNAi in axenics has no additional effect on Tk expression (Figure S7). Consequently we can only interpret this unexpected effect as a possible off-target effect of RU feeding on TAG, specific to axenic flies. However, this possibility does not void our conclusion, because an off-target diminution of TAG cannot explain why CR flies accumulate TAG following Tk<sup>RNAi</sup> induction. We hope that our added text clarifies this.

      I have struggled to follow the authors logic in ablating the IPCs and feel a clear statement on what they expected the outcome to be would help the reader.

      We have added the requested statement at lines 423-424, explaining that we expected the IPC ablation to render flies constitutively long-lived and non-responsive to A. pomorum.

      Can the authors clarify their logic in concluding a role for insulin signalling, and qualify this conclusion with appropriate consideration of alternative hypotheses?

      We have added our logic at lines 449-454. In brief, we conclude involvement for insulin signalling because FoxO mutant lifespan does not respond to Tk<sup>RNAi</sup>, and diminishes the lifespan-shortening effect of A. pomorum. However, we cannot state that the effects are direct because we do not have data that mechanistically connects Tk/TkR99D signalling directly in insulin-producing cells. The current evidence is most consistent with insulin signalling priming responses to microbiota/Tk/TkR99D, as per the newly-added text.

      Typographical errors

      We have remedied the highlighted errors, at lines 128-140.

      Reviewer #2:

      it would be good to show that the bacterial levels are not impacted [by TkRNAi]

      We have quantified CFUs in CR flies upon ubiquitous TkRNAi (Figure S5), finding that the RNAi does not affect bacterial load. New text at lines 138-139 articulates this point.

      The effect of Tk RNAi on TAG is opposite in CR and Ax or CR and Ap flies, and the knockdown shows an effect in either case (Figure 2E, Figure 3D). Why is this?

      As per response to Reviewer #1, we have added text at lines 219-222 to address this point.

      Is it possible to perform at least one lifespan repeat with the other Tk RNAi line mentioned?

      We have added another experiment showing longevity upon knockdown in conventional flies, using an independent TkRNAi line (Figure S3).

      Reviewer #3:

      In Line 243, the manuscript states that the reporter activity was not increased in the posterior midgut. However, based on the results presented in Fig. 4E, there seems to be no apparent regional specificity. A more detailed explanation is necessary.

      We thank the reviewer sincerely for their keen eye, which has highlighted an error in the previous version of the figure. In revisiting this figure we have noticed, to our dismay, that the figures for GFP quantification were actually re-plots of the figures for (ac)K quantification. This error led to the discrepancy between statistics and graphics, which thankfully the reviewer noticed. We have revised the figure to remedy our error, and the statistics now match the boxplots and results text.

      Fig1C uses Adh for normalization. Given the high variability of the result, the authors should (1) check whether Adh expression levels changed via bacterial association

      We selected Adh on the basis of our RNAseq analysis, which showed it was not different between AX and CV guts, whereas many commonly-used “housekeeping” genes were. We have now added a plot to demonstrate (Figure S2).

      The statement in Line 82 that EEs express 14 peptide hormones should be supported with an appropriate reference

      We have added the requested reference (Hung et al, 2020) at line 86.

      (4) Description of analyses that authors prefer not to carry out

      Reviewer #1:

      I'd encourage the authors to provide lifespan plots that enable comparison between all conditions

      We have avoided this approach because the number of survival curves that would need to be presented on the same axis (e.g. 16 for Figure 5) is not legible. However we have ensured that axes on faceted plots are equivalent and with grid lines for comparison. Moreover, our approach using statistical coefficients (EMMs) enables direct quantitative comparison of the differences among conditions.

      Reviewer #2:

      Is it possible that this driver is simply not resulting in an efficient KD of the receptor? I would be inclined to check this

      This comment relates to Figure 7G. We do see an effect of the knockdown in this experiment, so we believe that the knockdown is effective. However the direction of response is not consistent with our hypothesis so the experiment is not informative about the role of these cells. We therefore feel there is little to be gained by testing efficacy of knockdown, which would also be technically challenging because the cells are a small population in a larger tissue which expresses the same transcripts elsewhere (i.e. necessitating FISH).

      Would it be possible to use antibodies for acetylated histones?

      The comment relates to Figure 4C-E. The proposed studies would be a significant amount of work because, to our knowledge, the specific histone marks which drive activation in TK+ cells remain unknown. On the other hand, we do not see how this information would enrich the present story, rather such experiments would appear to be the beginning of something new. We therefore agree with Reviewer #1 (in cross-commenting) that this additional work is not justified.

      Reviewer #3:

      Tk+ EEC activity should be assessed directly, rather than relying solely on transcript levels. Approaches such as CaLexA or GCaMP could be used.

      We agree with reviewers 1-2 (in cross-commenting) that this proposal is non-trivial and not justified by the additional insight that would be gained. As described above, we are attempting to immunostain Tk, which if successful will provide a third line of evidence for regulation of Tk+ cells. However we note that we already have the strongest possible evidence for a role of these cells via genetic analysis (Figure 5).

      While the difficulty of maintaining lifelong axenic conditions is understandable, it may still be feasible to assess the induction of Tk (i.e., Tk transcription or EE activity upregulation) by the microbiome in males.

      As the reviewer recognises, maintaining axenic experiments for months on end is not trivial. Given the tendency for males either to simply mirror female responses to lifespan-extending interventions, or to not respond at all, we made the decision in our work to only study females. We have instead emphasised in the manuscript that results are from female flies.

      TkR86C, in addition to TkR99D, may be involved in the A. pomorum-lifespan interaction. Consider revising the title to refer more generally to the "tachykinin receptor" rather than only TkR99D.

      We disagree with this interpretation: the results do not show that TkR86C-RNAi recapitulates the effect of enteric Tk-RNAi. A potentially interesting interaction is apparent, but the data do not support a causal role for TkR86C. A causal role is supported only for TkR99D, knockdown of which recapitulates the longevity of axenic flies and Tk<sup>RNAi</sup> flies. We therefore feel that our current title is justified by the data, and that a more generic version would misrepresent our findings.

      The difference between "aging" and "lifespan" should also be addressed.

      The smurf phenotype is a well-established metric of healthspan. Moreover, lifespan is the leading aggregate measure of ageing. We therefore feel that the use of “ageing” in the title is appropriate.

      If feasible, assessing foxo activation would add mechanistic depth. This could be done by monitoring foxo nuclear localization or measuring the expression levels of downstream target genes.

      Foxo nuclear localisation has already been shown in axenic flies (Shin et al, 2011). We have added text and citation at lines 401-402.

    1. eLife Assessment

      This study identifies the uncharacterised protein FAM53C as a novel, potential regulator of the G1/S cell cycle transition, linking its function to the DYRK1A kinase and the RB/p53 pathways. The work is valuable and of interest to the cell cycle field, leveraging a strong computational screen to identify a new candidate. The findings are solid, although confidence in the siRNA depletion phenotypes would have been higher with rescue experiments using an siRNA-resistant cDNA and more robust quantification of some immunoassay data.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A + P53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally the authors assess the effect of FAM53C deletion in a cortical organoid model, and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      The authors have revised the manuscript, and I respond here point-by-point to indicate which parts of the revision I found compelling, and which parts were less convincing. So the numbering is consistent with the numbering in my first review report.

      (1) The p21 knockdowns are a valuable addition, and the claim that p53 targets other than p21 are involved in the FAM53C RNAi-mediated arrest is now much more solid. Minor detail: if S4D is a quantification of S4C, it is hard to believe that the quantification was done properly (at least the DYRK1Ai conditions). Perhaps S4C is not the best representative example, or some error was made?

      (2a) I appreciate the decision to remove the cyclin D1 phosphorylation data. A more nuanced model now emerges. It is not clear to me, however, why the Protein Simple immunoassay was used for experiments with RPE cells, and not the cortical organoids. Even though no direct claims are made based on the phospho-cyclin D data in Figure 5E+G, showing these data suggests that FAM53C deletion increases DYRK1A-mediated cyclin D1 phosphorylation. I find it tricky to show these data, while knowing now that this effect could not be shown in the RPE1 cells.

      (2b) The quantifications of the immunoassays are not convincing. In multiple experiments, the HSP90 levels vary wildly, which indicates big differences in protein loading if HSP90 is a proper loading control. This is for example problematic for the interpretation of Figure 3F and S3I. The cyclin D1 "bands" look extremely similar between siCtrl and siFAM53C (Fig S3I); in fact the two series of 6 samples with different dosages of DYRK1Ai seem to be an identical repetition of each other. I did not have the option to overlay them, but it would be important to check if a mistake was made here. The cyclin D1 signals aside, the change in cycD1/HSP90 ratios seems to be entirely caused by differences in HSP90 levels. Careful re-analysis of the raw data and more equal loading seem necessary. The same goes (to a lesser extent) for S3J+K.

      (2c) The new model in Fig S4L: what do the arrows to the right of FAM53C and p53, which merge and point straight towards S-phase, mean? They suggest that p53 (and FAM53C) directly promote S-phase progression, but most likely this is not what the authors intended.

      (3) Clear; nicely addressed.

      (4) Thank you for correcting.

      (5) I appreciate that the authors are now more careful to call the IMPC analysis data preliminary. This is acceptable to me, but nevertheless, I suggest the authors seriously consider taking this part out entirely. The risk of chance findings and the extremely skewed group sizes (as reviewer #2 had pointed out) hamper the credibility of this statistical analysis.

    3. Reviewer #2 (Public review):

      The authors sought to identify new regulators of the G1/S transition by mining the Cancer Dependency Map (DepMap) co-dependency dataset. This analysis successfully identified FAM53C, a poorly characterized protein, as a candidate. The strength of the paper lies in this initial discovery and the subsequent biochemical work convincingly showing that FAM53C can directly interact with the kinase DYRK1A, a known cell cycle regulator.

      The authors then present evidence, primarily from acute siRNA knockdown in RPE-1 cells, that loss of FAM53C induces a strong G1 cell cycle arrest. Their follow-up investigation proposes a model where FAM53C normally inhibits DYRK1A, thereby protecting Cyclin D from degradation and preventing p53 activation, to allow for G1/S progression. The authors have commendably addressed some concerns from the initial review: they have now demonstrated the G1 arrest using two independent siRNAs (an improvement over the initial pool), shown the effect in several additional cancer cell lines (U2OS, A549, HCT-116), and developed a more nuanced model that incorporates p53 activation, which helps to explain some of the complex data.

      However, a central and critical weakness persists. The entire functional model is built upon the very strong G1 arrest phenotype observed in vitro following acute knockdown. This finding is in stark contrast to data from other contexts. As the authors note, the knockout of Fam53c in mice results in minimal phenotypes, and the DepMap data itself suggests the gene is largely non-essential in most cancer cell lines.

      This major discrepancy creates two competing interpretations:

      As the authors suggest, FAM53C has a critical role in the cell cycle, but its loss is rapidly masked by compensatory mechanisms in long-term knockout models (like iPSCs and mice) or in established cancer cell lines.

      The strong acute G1 arrest is an experimental artifact of the siRNA-mediated knockdown, and not a true reflection of FAM53C's primary function.

      The authors' new controls (using two individual siRNAs and showing the arrest is RB-dependent) make an off-target effect less likely, but they do not definitively rule it out. The gold-standard experiment to distinguish between these two possibilities, a rescue of the phenotype using an siRNA-resistant cDNA, has not been performed.

      Because this key control is missing, the foundation of the paper's functional claims is not as solid as it needs to be. While the study provides an interesting and valuable new candidate for the cell cycle field to investigate, readers should be cautious in accepting the strength of FAM53C's role in the G1/S transition until this central discrepancy is definitively resolved.

    4. Reviewer #3 (Public review):

      Summary:

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation regulated Kinase 1A (DYRK1) in G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. Authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1 but this was true only in cells lacking functional p53. This is quite confusing as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of IPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids but whether that reflects an increased activity of DYRK1 or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. Authors did not observe any significant changes in survival nor in organ development but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism. Although some of the data might be of interest, in current form the data is too preliminary to justify publication.

      Major comments:

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects.

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types.

      (3) Authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203), however this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations when phosphorylation of cyclin D is reduced. It rather suggests that DYRK1 is not inhibited by FAM53C but perhaps FAM53C competes with cyclin D. Further, authors should address if the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      (4) At many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, what statistical test was used in Fig 4C? Impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does DYRK1 inhibitor prevent modification of cyclin D?

      (5) Validation of SM13797 compound in terms of specificity to DYRK1 was not performed.

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off.

      Comments to the revised manuscript:

      In the revised version of the manuscript, authors addressed most of the critical points. They now include new data with depletion of FAM53C using single siRNAs that show small but significant enrichment of the population of G1 cells. This G1 arrest is likely caused by the combined effect of induction of p21 expression and decreased levels of cyclin D1. Authors observed that inhibition of DYRK1 rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1 did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1 activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase but unfortunately these experiments were technically not possible. Knock-out of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicated FAM53C in fine-tuning DYRK1 activity in cells, which may to some extent influence the progression through G1 phase. In addition, FAM53C may also have DYRK1-independent and cell cycle-independent functions that remain to be addressed by future studies.

    5. Author response:

      (1) General Statements

      We thank the Reviewers for a fair review of our work and helpful suggestions. We have significantly revised the manuscript in response to these suggestions. We provide a point-by-point response to the Reviewers below but wanted to highlight in our response a recurring concern related to the strong cell cycle arrest observed upon the acute FAM53C knock-down being different than the limited phenotypes in other contexts, including the knockout mice and DepMap data.

      First, we now show that we can recapitulate the strong G1 arrest resulting from the FAM53C knock-down using two independent siRNAs in RPE-1 cells, supporting the specificity of the effects.

      Second, the G1 arrest that results from the FAM53C knock-down is also observed in cells with inactive p53, suggesting it is not due to a non-specific stress response due to “toxic” siRNAs. In addition, the arrest is dependent on RB, which fits with the genetic and biochemical data placing FAM53C upstream of RB, further supporting a specific phenotype.

      Third, we have performed experiments in other human cells, including cancer cell lines. As would be expected for cancer cells, the G1 arrest is less pronounced but is still significant, indicating that the G1 arrest is not unique to RPE-1 cells.

      Fourth, it is not unexpected that compensatory mechanisms would be activated upon loss of FAM53C during development or in cancer – which may explain the lack of phenotypes in vivo or upon long-term knockout. This has been true for many cell cycle regulators, either because of compensation by other family members that have overlapping functions, or by a larger scale rewiring of signaling pathways. 

      (2) Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity): 

      Summary: 

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map, and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. DYRK1A in turn is known to induce degradation of cyclin D, leading the authors to propose a model in which DYRK1A-dependent cyclin D degradation is inhibited by FAM53C to permit S-phase entry. Finally the authors assess the effect of FAM53C deletion in a cortical organoid model, and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      Major comments: 

      The authors show convincing evidence that FAM53C loss can reduce S-phase entry in cell cultures, and that it can bind to DYRK1A. However, FAM53C has multiple other binding partners and I am not entirely convinced that negative regulation of DYRK1A is the predominant mechanism to explain its effects on S-phase entry. Some of the claims that are made based on the biochemical assays, and on the physiological effects of FAM53C are overstated. In addition, some choices made in methodology and data representation need further attention.

      (1) The authors do note that P21 levels increase upon FAM53C depletion. They show convincing evidence that this is not a P53-dependent response. But the claim that "p21 upregulation alone cannot explain the G1 arrest in FAM53C-deficient cells" (lines 138-139) is misleading. A p53-independent p21 response could still be highly relevant. The authors could test if FAM53C knockdown inhibits proliferation after p21 knockdown or p21 deletion in RPE1 cells.

      The Reviewer raises a great point. Our initial statement needed to be clarified and also needed more experimental support. We have performed experiments where we knocked down FAM53C and p21 individually, as well as in combination, in RPE-1 cells. These experiments show that p21 knock-down is not sufficient to negate the cell cycle arrest resulting from the FAM53C knockdown in RPE-1 cells (Figure 4B,C and Figure S4C,D).

      We now extended these experiments to conditions where we inhibited DYRK1A, and we also compared these data to experiments in p53-null RPE-1 cells. Altogether, these experiments point to activation of p53 downstream of DYRK1A activation upon FAM53C knock-down, and indicate that p21 is not the only critical p53 target in the cell cycle arrest observed in FAM53C knock-down cells (Figure 4 and Figure S4).

      (2) The authors do not convincingly show that FAM53C acts as a DYRK1A inhibitor in cells. Figures 4B+C and S4B+C show extremely faint P-CycD1 bands, and tiny differences in ratios. The P values are hovering around 0.05, so n=3 is clearly underpowered here. Total CycD1 levels also correlate with FAM53C levels, which seems to affect the ratios more than the tiny pCycD1 bands. Why is there still a pCycD1 band visible in 4B in the GFP + BTZ + DYRK1Ai condition? And if I look at the data points I honestly don't understand how the authors can conclude from S4C that knockdown of FAM53C causes (DYRK1A-dependent) increases in pCycD1 (relative to total CycD1). In figure 5C, no blot scans are even shown, and again the differences look tiny. So the authors should either find a way to make these assays more robust, or alter their claims appropriately.

      We appreciate these comments from the Reviewer and have significantly revised the manuscript to address them.

      The analysis of Cyclin D phosphorylation and stability is complicated by the upregulation of p21 upon FAM53C knock-down, in particular because p21 can be part of Cyclin D complexes, which may affect its protein levels in cells (as was nicely shown in a previous study from the lab of Tobias Meyer; Chen et al., Mol Cell, 2013). Instead of focusing on Cyclin D levels and stability, we refocused the manuscript on RB and p53 downstream of FAM53C loss.

      We removed previous panel 4B from the revised manuscript. For panels 4E and S4B (now panels S3J and S3K), we used a true “immunoassay” (as indicated in the legend, not an immunoblot), which is much more quantitative and avoids error-prone steps in standard immunoblots (“Western blots”). Briefly, this system was developed by ProteinSimple. It uses capillary transfer of proteins and ELISA-like quantification with up to 6 logs of dynamic range (see their web site https://www.proteinsimple.com/wes.html). The “bands” we show are just a representation of the luminescence signals in capillaries. We made sure to further clarify the figure legends in the revised manuscript.

      The representative Western blot images for 5C-D (now 5F-G) in the original submission are shown in Figure 5E, we apologize if this was not clear. The differences are small, which we acknowledge in the revised manuscript. Note that several factors can affect Cyclin D levels in cells, including the growth rate and the stage of the cell cycle. Our FACS analysis shows that normal organoids have ~63% of cells in G1 and ~13% in S phase; the overall lower proportion of S-phase cells in organoids may make the immunoblot difference appear smaller, with fewer cycling cells resulting in decreased Cyclin D phosphorylation.

      Nevertheless, the Reviewer brings up a good point and comments from this Reviewer and the others made us re-think how to best interpret our results. As discussed above, we re-read carefully the Meyer paper and think that FAM53C’s role and DYRK1A activity in cells may be understood when considering levels of both CycD and p21 at the same time in a continuum. While our genetic and biochemical data support a role for FAM53C in DYRK1A inhibition, it is likely that the regulation of cell cycle progression by FAM53C is not exclusively due to this inhibition. As discussed above and below, we noted an upregulation of p21 upon FAM53C knock-down, and activation of p53 and its targets likely contributes significantly to the phenotypes observed. We added new experiments to support this more complex model (Figure 4 and Figure S4, with new model in S4L).

      (3) The experiments to test if DYRK1A inhibition could rescue the G1 arrest observed upon FAM53C knockdown are not entirely convincing either. It would be much more convincing if they also perform cell counting experiments as they have done in Figures 1F and 1G, to complement the flow cytometry assays. I suggest that the authors do these cell counting experiments in RPE1 +/- P53 cells as well as HCT116 cells. In addition, did the authors test if P21 is induced by DYRK1Ai in HCT116 cells? 

      We repeated the experiments with the DYRK1A inhibitor and counted the cells. In p53-null RPE1 cells, we found that cell numbers do not increase in these conditions where we had observed a cell cycle re-entry (Fig. 4E), which was accompanied by apoptotic cell death (Fig. S4I). Thus, cells re-enter the cell cycle but die as they progress through S-phase and G2/M. We note that inhibition of DYRK1A has been shown to decrease expression of G2/M regulators (PMID: 38839871), which may contribute to the inability of cells treated with DYRK1Ai to divide. Because our data in RPE-1 cells showed that p21 knock-down was not sufficient to allow the FAM53C knock-down cells to re-enter the cell cycle, we did not further analyze p21 in HCT-116 cells.

      (4) The data in Figure 5C and 5D are identical, although they are supposed to represent either pCycD1 ratios or p21 levels. This is a problem because at least one of the two cannot be true. Please provide the proper data and show (representative) images of both data types.

      We apologize for these duplicated panels in the original submission. We now replaced the wrong panel with the correct data (Fig. 5F,G). 

      (5) Line 246: "Fam53c knockout mice display developmental and behavioral defects." I don't agree with this claim. The mutant mice are born at almost the expected Mendelian ratios, and the body weight development is not consistently altered. But more importantly, no differences in adult survival or microscopic pathology were seen. The authors put strong emphasis on the IMPC behavioral analysis, but they should be more cautious. The IMPC mouse cohorts are tested for many other phenotypes related to behavior and neurological symptoms and apparently none of these other traits were changed in the IMPC Fam53c-/- cohort. Thus, the decreased exploration in a new environment could very well be a chance finding. The authors need to take away claims about developmental and behavioral defects from the abstract, results and discussion sections; the data are just too weak to justify this.

      We agree with the Reviewer that, although we observed significant p-values, this original statement may not be appropriate in the biological sense. We made sure in the revised manuscript to carefully present these data.

      Minor comments: 

      (6) Can the authors provide a rationale for each of the proteins they chose to generate the list of the 38 proteins in the DepMap analysis? I looked at the list and it seems to me that they do not all have described functions in the G1/S transition. The analysis may thus be biased. 

      To address this point, we updated Table S1 (2nd tab) to provide a better rationale for the 38 factors chosen. Our focus was on the canonical RB pathway and we included RB binding proteins whose function had suggested they may also be playing a role in the G1/S transition. We do agree that there is some bias in this selection (e.g., there are more RB binding factors described) but we hope the Reviewer will agree with us that this list and the subsequent analysis identified expected factors, including FAM53C. Future studies using this approach and others will certainly identify new regulators of cell cycle progression.

      (7) Figure 1B is confusing to me. Are these just some (arbitrarily) chosen examples? Consider leaving this heatmap out altogether, or explain in more detail.

      We agree with the Reviewer that this panel was not necessarily useful and possibly in the wrong place, and we removed it from the manuscript. We replaced it with a cartoon of top hits in the screen.

      (8) The y-axes in Figures 2C, 2D, 2E, and 4D are misleading because they do not start at 0. Please let the axis start at 0, or make axis breaks. 

      We re-graphed these panels.

      (9) Line 229: "Consequences ... brain development." This subheader is misleading, because the in vitro cortical organoid system is a rather simplistic model for brain development, and far away from physiological brain development. Please alter the header.

      We changed the header to “Consequences of FAM53C inactivation in human cortical organoids in culture”.

      (10) Figure S5F: the gating strategy is not clear to me. In particular, how do the authors know the difference between subG1 and G1 DAPI signals? Do they interpret the subG1 as apoptotic cells? If yes, why are there so many? Are the culturing or harvesting conditions of these organoids suboptimal? Perhaps the authors could consider doing IF stainings on EdU or BrdU on paraffin sections of organoids to obtain cleaner data?

      Thank you for your feedback. The subG1 population in the original Figure S5F represents cells that died during the dissociation step of the organoids for FACS analysis. To address this point, we performed live/dead staining to exclude dead cells and provide clearer data. We refined the gating strategy for better clarity in the new S5F panel.

      (11) Figure S6A; the labeling seems incorrect. I would think that red is heterozygous here, and grey mutant. 

      We fixed this mistake, thank you. 

      Reviewer #1 (Significance): 

      The finding that the poorly studied gene FAM53C controls the G1/S transition in cell lines is novel and interesting for the cell cycle field. However, the lack of phenotypes in Fam53c-/- mice makes this finding less interesting for a broader audience. Furthermore, the mechanisms are incompletely dissected. The importance of a p53-independent induction of p21 is not ruled out. And while the direct inhibitory interaction between FAM53C and DYRK1A is convincing (and also reported by others; PMID: 37802655), the authors do not (yet) convincingly show that DYRK1A inhibition can rescue a cell proliferation defect in FAM53C-deficient cells.

      Altogether, this study can be of interest to basic researchers in the cell cycle field. 

      I am a cell biologist studying cell cycle fate decisions, and adaptation of cancer cells & stem cells to (drug-induced) stress. My technical expertise aligns well with the work presented throughout this paper, although I am not familiar with biolayer interferometry. 

      Reviewer #2 (Evidence, reproducibility and clarity): 

      Summary 

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation regulated Kinase 1A (DYRK1) in G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. Authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1 but this was true only in cells lacking functional p53. This is quite confusing as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of IPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids but whether that reflects an increased activity of DYRK1 or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. Authors did not observe any significant changes in survival nor in organ development but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism. Although some of the data might be of interest, in current form the data is too preliminary to justify publication.

      Major points 

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects. 

      We thank the Reviewer for raising this important point. First, we need to clarify that our experiments were performed with a pool of siRNAs (not one siRNA). Second, commercial antibodies against FAM53C are not of the best quality and it has been challenging to detect FAM53C using these antibodies in our hands – the results are often variable. In addition, to better address the Reviewer’s point and control for the phenotypes we have observed, we performed two additional series of experiments: first, we have confirmed G1 arrest in RPE-1 cells with individual siRNAs, providing more confidence for the specificity of this arrest (Fig. S1B); second, we have new data indicating that other cell lines arrest in G1 upon FAM53C knock-down (Fig. S1E,F and Fig. 4F).

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types. 

      As mentioned above, we have new data indicating that other cell lines arrest in G1 upon FAM53C knock-down (three cancer cell lines) (Fig. S1E,F and Fig. 4F).

      (3) Authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203), however this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations when phosphorylation of cyclin D is reduced. It rather suggests that DYRK1 is not inhibited by FAM53C but perhaps FAM53C competes with cyclin D. Further, authors should address if the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      We revised the text of the manuscript to include the possibility that FAM53C could act as a competitive substrate and/or an inhibitor.

      We removed most of the Cyclin D phosphorylation/stability data from the revised manuscript. As the Reviewers pointed out, some of these data were statistically significant but the biological effects were small. As discussed above in our response to Reviewer #1, the analysis of Cyclin D phosphorylation and stability is complicated by the upregulation of p21 upon FAM53C knockdown, in particular because p21 can be part of Cyclin D complexes, which may affect its protein levels in cells (as was nicely shown in a previous study from the lab of Tobias Meyer; Chen et al., Mol Cell, 2013). Instead of focusing on Cyclin D levels and stability, we refocused the manuscript on RB and p53 downstream of FAM53C loss.

      We note, however, that we used specific Thr286 phospho-antibodies, which have been used extensively in the field. Our data in Figure 1 with palbociclib place FAM53C upstream of Cyclin D/CDK4,6. We performed Cyclin D overexpression experiments but RPE-1 cells did not tolerate high expression of Cyclin D1 (T286A mutant) and we have not been able to conduct more ‘genetic’ studies. 

      (4) At many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, what statistical test was used in Fig 4C? Impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does DYRK1 inhibitor prevent modification of cyclin D?

      As discussed above, we removed some of these data and re-focused the manuscript on p53-p21 as a second pathway activated by loss of FAM53C.

      (5) Validation of SM13797 compound in terms of specificity to DYRK1 was not performed. 

      This is an important point. We had cited an abstract from the company (Biosplice) but we agree that providing data is critical. We have now revised the manuscript with a new analysis of the compound’s specificity using kinase assays. These data are shown in Fig. S3F-H.

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off. 

      The Reviewer made a good point. As discussed in our response to Reviewer #1, with p53-null RPE-1 cells, we found that cell numbers do not increase in these conditions where we had observed a cell cycle re-entry (Fig. 4E), which was accompanied by apoptotic cell death (Fig. S4I). Thus, cells re-enter the cell cycle but die as they progress through S-phase and G2/M. We note that inhibition of DYRK1A has been shown to decrease expression of G2/M regulators (PMID: 38839871), which may contribute to the inability of cells treated with DYRK1Ai to divide.

      Because our data in RPE-1 cells showed that p21 knock-down was not sufficient to allow the FAM53C knock-down cells to re-enter the cell cycle, we did not further analyze p21 in HCT-116 cells. These data indicate that G1 entry by flow cytometry will not always translate into proliferation.

      Other points:

      (7) Fig. 2C, 2D, 2E graphs should begin with 0 

      We remade these graphs.

      (8) Fig. 5D shows that the difference in p21 levels is not significant in FAM53C-KO cells but difference is mentioned in the text. 

      We replaced the panel with the correct panel; we apologize for this error.

      (9) Fig. 6D: comparison of datasets of extremely different sizes does not seem to be appropriate.

      We agree and revised the text. We hope that the Reviewer will agree with us that it is worth showing these data, which are clearly preliminary but provide evidence of a possible role for FAM53C in the brain.

      (10) Could there be alternative splicing in mice generating a partially functional protein without exon 4? Did authors confirm that the animal model does not express FAM53C? 

      We performed RNA sequencing of mouse embryonic fibroblasts derived from control and mutant mice. We clearly identified fewer reads in exon 4 in the knockout cells, and no other obvious change in the transcript (data not shown). However, immunoblot with mouse cells for FAM53C never worked well in our hands. We made sure to add this caveat to the revised manuscript.

      Reviewer #2 (Significance): 

      Main problem of this study is that the advanced experimental models in IPSCs and mice did not confirm the observations in the cell lines and thus the whole manuscript does not hold together. Although I acknowledge the effort the authors invested in these experiments, the data do not contribute to the main conclusion of the paper that FAM53C/DYRK1 regulates G1/S transition. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      This paper identifies FAM53C as a novel regulator of cell cycle progression, particularly at the G1/S transition, by inhibiting DYRK1A. Using data from the Cancer Dependency Map, the authors suggest that FAM53C acts upstream of the Cyclin D-CDK4/6-RB axis by inhibiting DYRK1A. Specifically, their experiments suggest that FAM53C knockdown induces G1 arrest in cells, reducing proliferation without triggering apoptosis. DYRK1A inhibition rescues G1 arrest in p53 knockout cells, suggesting FAM53C normally suppresses DYRK1A activity. Mass spectrometry and biochemical assays confirm that FAM53C directly interacts with and inhibits DYRK1A. FAM53C knockout in human cortical organoids and mice leads to cell cycle defects, growth impairments, and behavioral changes, reinforcing its biological importance.

      Strength of the paper: 

      The study introduces a novel cell cycle control signalling module upstream of CDK4/6 in G1/S regulation which could have significant impact. The identification of FAM53C using a depmap correlation analysis is a nice example of the power of this dataset. The experiments are carried out mostly in a convincing manner and support the conclusions of the manuscript. 

      Critique: 

      (1) The experiments rely heavily on siRNA transfections without the appropriate controls. There are so many cases of off-target effects of siRNA in the literature, and specifically for a strong phenotype on S-phase as described here, I would expect to see solid results by additional experiments. This is especially important since the ko mice do not show any significant developmental cell cycle phenotypes. Moreover, FAM53C does not show a strong fitness effect in the depmap dataset, suggesting that it is largely non-essential in most cancer cell lines. For this paper to reach publication in a high-standard journal, I would expect that the authors show a rescue of the S-phase phenotype using an siRNA-resistant cDNA, and show similar S-phase defects using an acute knock out approach with lentiviral gRNA/Cas9 delivery. 

      We thank the Reviewer for this comment. Please refer to the initial response to the three Reviewers, where we discuss our use of single siRNAs and our results in multiple cell lines. Briefly, we can recapitulate the G1 arrest upon FAM53C knock-down using two independent siRNAs in RPE-1 cells. We also observe the same G1 arrest in p53 knockout cells, suggesting it is not due to a non-specific stress response. In addition, the arrest is dependent on RB, which fits with the genetic and biochemical data placing FAM53C upstream of RB, further supporting a specific phenotype. Human cancer cell lines also arrest in G1 upon FAM53C knock-down, not just RPE-1 cells. Finally, we hope the Reviewer will agree with us that compensatory mechanisms are very common in the cell cycle – which may explain the lack of phenotypes in vivo or upon long-term knockout of FAM53C.

      (2) The S-phase phenotype following FAM53C should be demonstrated in a larger variety of TP53WT and mutant cell lines. Given that this paper introduces a new G1/S control element, I think this is important for credibility. Ideally, this should be done with acute gRNA/Cas9 gene deletion using a lentiviral delivery system; but if the siRNA rescue experiments work and validate an on-target effect, siRNA would be an appropriate alternative. 

      We now show data with three cancer cell lines (U2OS, A549, and HCT-116 – Fig. S1E,F and Fig. 4F), in addition to our results in RPE-1 cells and in human cortical organoids. We note that the knock-down experiments are complemented by overexpression data (Fig. 1G-I), by genetic data (our original DepMap screen), and our biochemical data (showing direct binding of FAM53C to DYRK1A).

      (3) The western blot images shown in the MS appear heavily over-processed and saturated (See for example S4B, 4A, B, and E). Perhaps the authors should provide the original un-processed data of the entire gels? 

      For several of our panels (e.g., 4E and S4B, now panels S3J and S3K), we used a true “immunoassay” (as indicated in the legend – not an immunoblot), which is much more quantitative and avoids error-prone steps in standard immunoblots (“Western blots”). Briefly, this system was developed by ProteinSimple. It uses capillary transfer of proteins and ELISA-like quantification with up to 6 logs of dynamic range (see their web site https://www.proteinsimple.com/wes.html). The “bands” we show are just a representation of the luminescence signals in capillaries. We made sure to further clarify the figure legends in the revised manuscript.

      Data in 4A are also not a western blot but a radiograph.

      For immunoblots, we will provide all the source data with uncropped blots with the final submission.

      (4) A critical experiment for the proposed mechanism is the rescue of the FAM53C S-phase reduction using DYRK1A inhibition shown in Figure 4. The legend here states that the data were extracted from BrdU incorporation assays, but in Figure S4D only the PI histograms are shown, and the S-phase population is not quantified. The authors should show the BrdU scatterplot and quantify the phenotype using the S-phase population in these plots. G1 measurements from PI histograms are not precise enough to allow for conclusions. Also, why are the intensities of the PI peaks so variable in these plots? Compare, for example, the HCT116 upper and lower panels where the siRNA appears to have caused an increase in ploidy. 

      We apologize for the confusion and have fixed these errors. For most of the analyses, we used PI to measure G1 and S-phase entry. We added relevant flow cytometry plots to supplemental figures (Fig. S1G, H, I, as well as Fig. S4E and S4K, and Fig. S5F).

      (5) There's an apparent contradiction in how RB deletion rescues the G1 arrest (Figure 2) while p21 seems to maintain the arrest even when DYRK1A is inhibited. Is p21 not induced when FAM53C is depleted in RB ko cells? This should be measured and discussed. 

      This comment and comments from the two other Reviewers made us reconsider our model. We carefully re-read the Meyer paper and think that DYRK1A activity may be understood when considering levels of both CycD and p21 at the same time in a continuum (as was nicely shown in a previous study from the lab of Tobias Meyer – Chen et al., Mol Cell, 2013). While our genetic and biochemical data support a role for FAM53C in DYRK1A inhibition, it is obvious that the regulation of cell cycle progression by FAM53C is not exclusively due to this inhibition. As discussed above and below, we noted an upregulation of p21 upon FAM53C knock-down, and activation of p53 and its targets likely contributes significantly to the phenotypes observed. We added new experiments to support this more complex model (Figure 4 and Figure S4, with new model in S4L).

      Reviewer #3 (Significance): 

      In conclusion, I believe that this MS could potentially be important for the cell cycle field and also provide a new target pathway that could be relevant for cancer therapy. However, the paper has quite a few gaps and inconsistencies that need to be addressed with further experiments. My main worry is that the acute depletion phenotypes appear so strong, while the gene is nonessential in mice and shows only a minor fitness effect in the depmap screens. More convincing controls are necessary to rule out experimental artefacts that misguide the interpretation of the results.

      We appreciate this comment and hope that the Reviewer will agree it is still important to share our data with the field, even if the phenotypes in mice are modest.

    1. eLife Assessment

      This fundamental work examines how tRNA modifications influence antibiotic tolerance, providing novel insights that may have therapeutic uses. The evidence supporting the conclusions is convincing. Strengths of the manuscript include the mechanism of tRNA modification influencing antibiotic tolerance and the precise measurement techniques used throughout. Further analysis of growth rate impacts and specific identification of the proteins responsible for the effect would further strengthen the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      Cotton et al. investigated the role of tusB in antibiotic tolerance in Yersinia pseudotuberculosis. They used the IP2226 strain and introduced appropriate mutations and complementation constructs. Assays were performed to measure growth rates, antibiotic tolerance, tRNA modification, gene expression and proteomic profiles. In addition, experiments to measure ribosome pausing and bioinformatic analysis of codon usage in ribosomal proteins provided in-depth mechanistic support for the conclusions.

      Strengths:

      The findings are consistent with the authors having uncovered new mechanistic insights into bacterial antibiotic tolerance mediated by reducing ribosomal protein abundance.

      Weaknesses:

      Since the WT strain grows faster than the tusB mutant, there is a question of how growth rate, per se, impacts some of the analysis done. The authors should address this issue. In addition, it may not be essential, but would analysis of another slow-growing mutant (in some other antibiotic tolerance pathway if available) serve as a good control in this context?

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses a critical clinical challenge, bacterial antibiotic tolerance (a key driver of treatment failure distinct from genetic resistance), by uncovering a novel regulatory role of the conserved s2U tRNA modification in Yersinia pseudotuberculosis. Its strengths are notable and lay a solid foundation for understanding phenotypic drug tolerance. The study is the first to link s2U tRNA modification loss to antibiotic tolerance, specifically targeting translation/transcription-inhibiting antibiotics (doxycycline, gentamicin, rifampicin). By establishing a causal chain (s2U deficiency → codon-specific ribosome pausing at AAA/CAA/GAA → reduced ribosomal protein translation → global translational suppression → tolerance), it expands the functional landscape of tRNA modifications beyond canonical translation fidelity, filling a gap in how RNA epigenetics shapes bacterial stress adaptation.
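
      As a rough illustration of the kind of codon-usage check that underlies the causal chain described above, the sketch below computes the fraction of codons in a coding sequence that are AAA, CAA, or GAA. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `s2u_codon_fraction` and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter

# Codons named in the review as ribosome pause sites when s2U is missing.
S2U_CODONS = ("AAA", "CAA", "GAA")


def s2u_codon_fraction(cds: str) -> float:
    """Fraction of codons in an in-frame coding sequence that are AAA, CAA, or GAA.

    Hypothetical helper for illustration; accepts DNA or RNA letters.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons:
        return 0.0
    counts = Counter(codons)
    return sum(counts[c] for c in S2U_CODONS) / len(codons)


# Toy sequence (made up): ATG AAA GAA CAA GGT TAA
toy_cds = "ATGAAAGAACAAGGTTAA"
fraction = s2u_codon_fraction(toy_cds)
```

      Comparing this fraction between ribosomal protein genes and the rest of the genome would be one simple way to ask whether ribosomal proteins are enriched for the affected codons; any such comparison would need the real annotated genome rather than a toy sequence.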

      Strengths:

      This study makes a valuable contribution to understanding tRNA modification-mediated antibiotic tolerance.

      Weaknesses:

      There are several limitations that weaken the robustness of the study's mechanistic conclusions. Addressing these gaps would significantly enhance its impact and translational potential.

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript of Cotten et al., the authors study the 2-thiolation of tRNA in bacterial antibiotic resistance. The wildtype organism, Yersinia pseudotuberculosis, downregulates 2-thiolation as a response to antibiotics targeting the ribosome. In this manuscript, the authors show that a knockout of tusB causes slower translation. They provide evidence on the mechanisms of the slowing by determining transcription and translation, ribosome profiling and performing codon-usage analysis. They successfully determined that 2 codons are drivers of the translation slowdown, and the data is highly conclusive. Technically, I have nothing to criticize.

      Strengths:

      All in all, the study is very well made, and the writing is clear and concise. It covers a wide array of state-of-the-art analyses to unravel the interplay of tRNA modifications in translation.

      Weaknesses:

      The only question that remains to be asked is why the slowed translation leads to a better survival of the bacteria under antibiotic stress. In my opinion, the mechanism itself remains unclear. Thus, the statement that "We expect that this reduction in ribosomal proteins is globally reducing the translational capacity of the cell and is responsible for inducing tolerance to ribosome and RNA polymerase-targeting antibiotics" does not truly emphasize the remaining open question of why slowed translation favors survival. Therefore, I would recommend a minor text revision.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      Cotton et al. investigated the role of tusB in antibiotic tolerance in Yersinia pseudotuberculosis. They used the IP2226 strain and introduced appropriate mutations and complementation constructs. Assays were performed to measure growth rates, antibiotic tolerance, tRNA modification, gene expression and proteomic profiles. In addition, experiments to measure ribosome pausing and bioinformatic analysis of codon usage in ribosomal proteins provided in-depth mechanistic support for the conclusions. 

      Strengths: 

      The findings are consistent with the authors having uncovered new mechanistic insights into bacterial antibiotic tolerance mediated by reducing ribosomal protein abundance. 

      Weaknesses: 

      Since the WT strain grows faster than the tusB mutant, there is a question of how growth rate, per se, impacts some of the analysis done. The authors should address this issue. In addition, it may not be essential, but would analysis of another slow-growing mutant (in some other antibiotic tolerance pathway if available) serve as a good control in this context? 

      We would like to thank the reviewer for their time spent reviewing our manuscript and for their positive review. We plan to address their comment as to how growth rate impacts the analyses and plan to incorporate another slow-growing mutant in the revised version of the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      This study addresses a critical clinical challenge, bacterial antibiotic tolerance (a key driver of treatment failure distinct from genetic resistance), by uncovering a novel regulatory role of the conserved s2U tRNA modification in Yersinia pseudotuberculosis. Its strengths are notable and lay a solid foundation for understanding phenotypic drug tolerance. The study is the first to link s2U tRNA modification loss to antibiotic tolerance, specifically targeting translation/transcription-inhibiting antibiotics (doxycycline, gentamicin, rifampicin). By establishing a causal chain (s2U deficiency → codon-specific ribosome pausing at AAA/CAA/GAA → reduced ribosomal protein translation → global translational suppression → tolerance), it expands the functional landscape of tRNA modifications beyond canonical translation fidelity, filling a gap in how RNA epigenetics shapes bacterial stress adaptation.

      Strengths: 

      This study makes a valuable contribution to understanding tRNA modification-mediated antibiotic tolerance. 

      Weaknesses: 

      There are several limitations that weaken the robustness of the study's mechanistic conclusions. Addressing these gaps would significantly enhance its impact and translational potential. 

      We would like to thank the reviewer for their time spent reviewing our manuscript, and for both their positive comments about the significance and novelty of this work as well as their critiques. We plan to address their specific recommendations in the revised manuscript by focusing on the contribution of specific ribosomal proteins (i.e. the 30S subunit protein, S13) through overexpression, codon replacement, and stability experiments. We also plan to design experiments to assess in vivo relevance and assess possible impacts on other pathways involved in antibiotic tolerance.

      Reviewer #3 (Public review): 

      Summary: 

      In the manuscript of Cotten et al., the authors study the 2-thiolation of tRNA in bacterial antibiotic resistance. The wildtype organism, Yersinia pseudotuberculosis, downregulates 2-thiolation as a response to antibiotics targeting the ribosome. In this manuscript, the authors show that a knockout of tusB causes slower translation. They provide evidence on the mechanisms of the slowing by determining transcription and translation, ribosome profiling and performing codon-usage analysis. They successfully determined that 2 codons are drivers of the translation slowdown, and the data is highly conclusive. Technically, I have nothing to criticize. 

      Strengths: 

      All in all, the study is very well made, and the writing is clear and concise. It covers a wide array of state-of-the-art analyses to unravel the interplay of tRNA modifications in translation. 

      Weaknesses: 

      The only question that remains to be asked is why the slowed translation leads to a better survival of the bacteria under antibiotic stress. In my opinion, the mechanism itself remains unclear. Thus, the statement that "We expect that this reduction in ribosomal proteins is globally reducing the translational capacity of the cell and is responsible for inducing tolerance to ribosome and RNA polymerase-targeting antibiotics" does not truly emphasize the remaining open question of why slowed translation favors survival. Therefore, I would recommend a minor text revision. 

      We would like to thank the reviewer for their time spent reviewing our manuscript and for their positive review of the technical aspects, experimental design, and writing. We will incorporate their suggested text revision into the revised manuscript, and will add to this statement if additional planned experiments shed light on this remaining question.

    1. eLife Assessment

      This valuable study examines how mammals descend effectively and securely along vertical substrates. The conclusions from comparative analyses based on behavioral data and morphological measurements collected from 21 species across a wide range of taxa are convincing, making the work of interest to all biologists studying animal locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

    3. Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontoglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Comment from the Reviewing Editor on the revised version:

      The authors responded to many comments of the reviewers, and I would be happy to see the authors make this version the Version of Record.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This valuable study examines how mammals descend effectively and securely along vertical substrates. The conclusions from comparative analyses based on behavioral data and morphological measurements collected from 21 species across a wide range of taxa are convincing, making the work of interest to all biologists studying animal locomotion.

      We would like to greatly thank the two reviewers for their time in reviewing this work, and for their valuable comments and suggestions that will help to improve this manuscript.

      Overall, we agree with the weaknesses raised, which are mainly areas for consideration in future studies: to study more species, and in a natural habitat context.

      We will nevertheless add a few modifications to improve the manuscript, notably by making certain figures more readable, and adding definitions and bibliography in the main text concerning gait characteristics.

      We also provide brief comments on each point of weakness raised by the reviewers below.

      Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      We analyzed gait patterns using methods commonly found in the literature and discussed our results accordingly. However, the study of limb support polygons was indeed developed specifically for studying locomotion on horizontal supports, and may not be applicable to vertical locomotion, which is in fact a type of locomotion shared by all arboreal species. In the future, it would be interesting to consider new methods for analyzing vertical gaits.

      The importance of body mass cannot be overemphasized as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species of their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple measure of forelimb over hindlimb has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data within a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insights into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow for all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.
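
      For readers less familiar with the measure discussed above, the intermembral index is conventionally computed as forelimb length (humerus plus radius) divided by hindlimb length (femur plus tibia), multiplied by 100. The sketch below is only an illustration of that standard formula; the function name and the bone lengths are hypothetical and are not taken from the study's dataset.

```python
def intermembral_index(humerus: float, radius: float,
                       femur: float, tibia: float) -> float:
    """Intermembral index: 100 * (humerus + radius) / (femur + tibia).

    All lengths must share the same unit (e.g. mm). Values well below 100
    indicate relatively long hindlimbs; values above 100 indicate
    relatively long forelimbs.
    """
    hindlimb = femur + tibia
    if hindlimb <= 0:
        raise ValueError("hindlimb length must be positive")
    return 100.0 * (humerus + radius) / hindlimb


# Made-up lengths in mm for an animal with relatively long hindlimbs.
example_index = intermembral_index(humerus=60.0, radius=60.0,
                                   femur=90.0, tibia=90.0)
```

      With these made-up lengths the forelimb is two thirds the length of the hindlimb, which is the direction of proportion the reviewer associates with leaping and, in this sample, with tail-first descent.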

      We agree with this statement. In the future, we plan to study other species, particularly large-bodied ones with varied intermembral indexes.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

      We completely agree with this, and we would like to emphasize that our intention here was simply to conduct a modest inference test, the purpose of which is to provide food for thought for future studies, and whose results should be considered in light of a comprehensive evolutionary model.

      Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontoglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent with working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Yes, that is indeed the main cost/benefit trade-off with this type of study. Working with captive animals allows for large comparative studies, but there is a risk of variations in locomotor behavior among individuals in the natural environment, as well as few individuals per species in the dataset. That is why we plan and encourage colleagues to conduct studies in the natural environment to compare with these results. However, this type of study is very time-consuming and requires focusing on a single species at a time, which limits the comparative aspect.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

      We agree that this figure is dense. One possible solution would be to combine species by phylogenetic groups to reduce the amount of information, as we did with Fig. 3 on the dataset relating to gaits. However, we believe that this would be unfortunate in the case of speed and duty factor because we would have to provide the complete figure in SI anyway, as the species-level information is valuable. We therefore prefer to keep this comprehensive figure here and we will enlarge the data points to improve their visibility, and provide the figure with a sufficiently high resolution to allow zooming in on the details.

      Reviewer #1 (Recommendations for the authors):

      As indicated in the first section above, this is a strong comparative study that addresses important questions, relative to the evolution of arboreal locomotion in primates and close mammal relatives. My recommendations should be taken in the context of improving a manuscript that is already generally acceptable.

      (1) The terms symmetrical and asymmetrical gaits should be briefly defined in the main text (not just in the Methods section) by citing work done by Hildebrand and other relevant studies. To that effect, the statement on lines 96-97 about the convergence of symmetrical gaits is unclear. What does "Symmetrical gaits have evolved convergently in rodents, scandentians, carnivorans, and marsupials" mean? Symmetrical gaits such as the walk, run, trot, etc., are pretty much the norm in most mammals and were likely found in metatherians and basal eutherians. This needs clarification. On line 239, the term "ambling" is used in the context of related asymmetrical gaits. To be clear, the amble is a type of running gait involving no whole-body aerial phase and is therefore a symmetrical gait (see Schmitt et al., 2006).

      We have added a definition of the terms symmetrical and asymmetrical gaits and added references in the introduction such as: “Symmetrical gaits are defined as locomotor patterns in which the footfalls of a girdle (a pair of fore- or hindlimbs) are evenly spaced in time, with the right and left limbs of a pair of limbs being approximately 50% out of phase with each other (Hildebrand, 1966, 1967). Symmetrical gaits can be further divided into two types: diagonal-sequence gaits, in which a hindlimb footfall is followed by that of the contralateral forelimb, and lateral-sequence gaits, in which a hindlimb footfall is followed by that of the ipsilateral forelimb (Hildebrand, 1967; Shapiro and Raichlen, 2005; Cartmill et al., 2007b). In contrast, asymmetrical gaits are characterized by unevenly spaced footfalls within a girdle, with the right and left limbs moving in near synchrony (Hildebrand, 1977).” Now found in lines 87-94.

      We corrected the sentence to read: “Symmetrical gaits are also common in rodents, scandentians, etc..” Now found in line 107.

      Thank you for pointing this out. We indeed did not use the right term for the related asymmetrical gaits with increased duty factors. We removed the term “ambling” and the associated reference here. Now found in line 256.

      (2) Correlations are used in the paper to examine how brain mass scales with body mass. It is correct to assume that a correlation significantly different from 0 is indicative of allometry (in this case, positive). That said, lines are used in Figure S2 that go through the bivariate scatter plot. The vast majority of scaling studies rely on regression techniques to calculate and compare slopes, which are different statistically from correlations. In this case, a slope not significantly different from 1.0 would support the hypothesis of isometry based on geometric similarity (as brain mass and body mass are two volumes). The authors could refer to the work of Bob Martin and the 1985 edited book by Jungers and contributions therein. These studies should also be cited in the paper.

      Thank you for recommending this better-suited method. We replaced the correlations with major axis orthogonal regressions, as recommended by Martin and Barbour 1989. We found a positive slope for all species that is significantly different from 1 (0.36), indicating a negative allometry (we realized we were mistaken about the allometry terminology, initially reporting a “positive allometry” instead of a positive correlation).

      We corrected in the manuscript in the Results and Methods sections, and cited Martin and Barbour 1989 such as:

      “To ensure that the EQs of the different species studied are comparable and meaningful, we tested the allometry between the brain and body masses in our dataset following [84] and found a significant and positive slope for all species (major axis orthogonal regression on log transformed values: slope = 0.36, r<sup>2</sup> = 0.92, p = 5.0 × 10<sup>-12</sup>), indicating a negative allometry (r = 0.97, df = 19, p = 2.0 × 10<sup>-13</sup>), and similar allometric coefficients when restricting the analysis to phylogenetic groups (Fig. S2).” Now found in lines 289-298.

      - “To control that brain allometry is homogeneous among all phylogenetic groups, to be able to compare EQ between species, we computed major axis orthogonal regressions, following the recommendation of Martin and Barbour [84], between the Log transformed brain and body masses, over all species and by phylogenetic group using the sma package in R (Fig. S2).” Now found in lines 336-338.

      We also changed Figure S2 in Supplementary Information accordingly.
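
      As a minimal illustration of the regression approach described in this response, the major-axis slope on log-transformed masses can be computed in closed form from the sample variances and covariance. This is a sketch in Python, not the R code used for the manuscript; the data values and the function name `major_axis_fit` are hypothetical. A slope near 1 would indicate isometry, whereas a slope below 1, such as the reported 0.36, indicates negative allometry.

```python
import math


def major_axis_fit(x, y):
    """Return (slope, intercept) of the major-axis (orthogonal) regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: major-axis slope is undefined")
    # Closed-form slope of the first principal axis of the (x, y) scatter
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


# Hypothetical body masses (g); brain masses follow an exact power law
# brain = 0.1 * body^0.36, so the fitted log-log slope should recover 0.36.
body = [50.0, 200.0, 1000.0, 5000.0, 20000.0]
brain = [0.1 * m ** 0.36 for m in body]

log_body = [math.log10(m) for m in body]
log_brain = [math.log10(m) for m in brain]

slope, intercept = major_axis_fit(log_body, log_brain)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

      With real data the scatter is noisy, so the slope would additionally need a confidence interval (for example by bootstrap) before comparing it with 1.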

      (3) Trunk length is used as the denominator for many of the indices used in the study. In this way, trunk length is considered to be a proxy for body size. There should be a demonstration that trunk length scales isometrically with body mass in all of the mammals compared. If not the case, some of the indices may not be directly comparable.

      We did not use trunk length as a proxy for body mass, but to compute geometric body proportions in order to test whether intrinsic body proportions could be related to vertical descent behaviors, namely the length of the tail and of the fore- and hindlimbs relative to the animal. We chose those indices to quantify the capability of limbs to act as levers or counterweights to rotate the animals for this specific question of vertical descent behavior. We therefore do not think that body mass allometry with respect to trunk length is relevant to compare these indices across species here. Also, we don’t expect that trunk length (which is a single dimension) would scale isometrically with body mass, which scales more as a volume.

      (4) Given the numerous comparisons done in this study, a Bonferroni correction method should be considered to mitigate type I error (accepting a false positive).

      We had already corrected all our statistical tests using the Benjamini-Hochberg method to control for false positives; see the SuppTables Excel file for the complete results of the statistical analyses. We chose this method over the Bonferroni correction because the more modern and balanced Benjamini-Hochberg procedure is better suited for analyses involving a large number of hypotheses.
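
      The difference between the two corrections can be sketched as follows. This is an illustrative Python example with hypothetical p-values and function names, not the analysis code used for the manuscript. Bonferroni multiplies every p-value by the number of tests, whereas Benjamini-Hochberg scales each p-value by the number of tests divided by its rank and then enforces monotonicity, which controls the false discovery rate rather than the family-wise error rate.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests
raw = [0.001, 0.008, 0.02, 0.03, 0.6]
alpha = 0.05

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

n_bonf = sum(p < alpha for p in bonf)
n_bh = sum(p < alpha for p in bh)
print("Bonferroni rejections:", n_bonf)
print("Benjamini-Hochberg rejections:", n_bh)
```

      On these hypothetical values the Benjamini-Hochberg procedure retains more tests as significant than Bonferroni at the same threshold, which is the behavior motivating its use when many hypotheses are tested.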

      (5) The terms "arm" and "leg" used in the main text and Table 1 are anatomically incorrect. Instead, the terms "forelimb" and "hindlimb" should be used as they include the length sum of the stylopod, zeugopod, and autopod.

      Indeed, thank you for pointing that out. We have corrected this error within the manuscript as well as in Figures 4 and S3.

      (6) On p. 14, the authors make the statement that the postcranial anatomy of Adapis and Notharctus remains undescribed. The authors should consult the work of Dagosto, Covert, Godinot and others.

      We did not state that the postcranial remains of Adapis and Notharctus have not been described. However, we were unfortunately unable to find published illustrations of the known postcranial elements that could be reliably used in this study. To avoid any misunderstanding, we revised the sentence to read: “However, we could not find suitable illustrations of the known postcranial elements of these species in the literature that could be reliably incorporated into this study. Thus, we only included their reconstructed body mass and EQ,..”. Now found in lines 393-397.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 65/69 - Perchalski et al. 2021 is a single-author publication, so no et al. or w/ colleagues.

      Indeed. This has been corrected in the manuscript, now found in lines 65 and 70.

      (2) Lines 96-98 - Is it appropriate to say that the use of symmetrical gaits are examples of convergent evolution? There's less burden of evidence to state that these are shared behaviors, rather than suggesting they independently evolved across all those groups.

      We agree with this and corrected the sentence to read: “Symmetrical gaits are also common in rodents, scandentians, etc..” Now found in line 107.

      (3) Line 198 - I am confused by how to interpret (-16,36 %) compared to how other numbers are presented in the rest of the paragraph.

      To avoid confusion, we rephrased this sentence as follows: “In contrast, primates did not significantly reduce their speed compared to ascents when descending sideways or tail-first (Fig. 2A, SuppTables B).” Now found in lines 207-209.

    1. eLife Assessment

      This valuable study identifies asymmetric dimethylarginine (ADMA) modification of histones as a potential key determinant of the initial genomic binding of Rhino, a Drosophila-specific chromatin protein essential for piRNA cluster specification. The authors provide correlative genomic and imaging data to support their model, although functional validation of the proposed mechanism remains incomplete. Testing the redundancy between dART4 and dART1, which together could affect the prominent piRNA loci, in addition to the minor ones investigated in the manuscript, may change our assessment.

    2. Reviewer #2 (Public review):

      The Revision title and abstract are not updated enough to distinguish the special niche piRNA clusters from the more prominent major dual strand piRNA clusters that are widely known in the field for Drosophila, like 42AB and 38C. This revision mainly adds the term "piRNA source loci (piSL)" that is too vague and not a well-accepted name that would distinguish just these particularly niche piRNA clusters from major dual strand piRNA clusters like 42AB and 38C. This piSL term is problematic because it seems to imply these piSL's are connected to or would eventually become major dual strand piRNA clusters, but there is zero evidence in this study for any genetic or evolutionary connection between these two distinct types of piRNA sources. This revision still lacks the necessary changes needed to point out like in the abstract that major dual strand piRNA clusters like 42AB, 38C, 80F, and 102F in Drosophila that make up the bulk of piRNAs cannot be shown to be impacted by changes aimed at depleting ADMA-histones from these loci, and the authors' current evidence is still only limited to showing in these few 'niche' piRNA clusters that ADMA-histones may exhibit a direct interaction with Rhino as supported only by the knockdown of Drosophila Art4.

      The authors' rebuttal letter argues that 42AB and 38C are just conserved piRNA clusters that may no longer be regulated by ADMA. This is still a weak claim for dismissing the potential genetic redundancy problem when this study can only report strong knockdown of Art4. First, the dual strand 42AB piRNA cluster's conservation as a Drosophilid piRNA cluster is actually still a relatively recent evolutionary innovation in just D. simulans and D. melanogaster that are less than 3MYA diverged. This 42AB cluster is no longer conserved in D. sechellia and is also younger than the uni-strand Flamenco piRNA cluster that is conserved to 7MYA. The evolutionary arguments by the authors are not well-grounded. Second, the 42AB and 38C are the largest major dual strand piRNA clusters with very significant localization of Rhino and impact from Rhino loss of function, and if this paper's central thesis is that ADMA-histones directed by Art1 or Art4 is critical for the expression of dual-strand piRNA cluster loci by impacting Rhino, the current data still remain weak with no new experiments to help bolster their claims.

      The authors' rebuttal letter argues that their attempts to knock down Art1 in the fly were thwarted by reagent issues, and the explanations are unsatisfactory. They claim they only tested two RNAi cross lines to try to knock down Art1: the strain BDSC #36891, y[1] sc[*] v[1] sev[21]; P{y[+t7.7], v[+t1.8]=TRiP.GL01072}attP2/TM3, Sb[1] that they said they could not obtain this strain to be alive from the stock center? And then testing an alternative line VDRC #v110391P{KK101196}VIE-260B that displayed mediocre knockdown, the authors seemed to suggest they have given up trying to make this very important experiment work? They should have tried to figure out with the BDSC, a venerable stock center for Drosophila genetic tools, why they could not receive that fly strain alive (shipping flies at the economy rate internationally may be cheaper but often is too strenuous for flies to survive), and the authors have not acknowledged testing two other available knockdown lines for Art1: BDSC #31348, y[1] v[1]; P{y[+t7.7] v[+t1.8]=TRiP.JF01306}attP2 dsRNA and VDRC #w1118 P{GD11959}v40388. Trying to get good knockdown of Art1 would be a critical must-have experiment to address whether this arginine methyltransferase has an in vivo impact on ADMA-histones in the Drosophila ovary and to show an impact on 42AB and 38C. The revision does not address this major deficiency in impact on these two major dual strand piRNA clusters, only the very few niche piRNA clusters that are responsive to Art4 knockdown.

      The rebuttal letter argues that "Therefore, conserved clusters such as 42AB and 38C may no longer be regulated by ADMA." but then the revision discussion is still speculating much too wildly that the piRNA source loci are then precursors for the eventual large piRNA clusters of 42AB and 38C. This renaming of the term piRNA source loci and the model in Fig. 7C is still misleading because 42AB and 38C are the main largest dual-strand piRNA clusters, and the pictures depict the ADMA-histones as recruiting Rhino and then Kipferl at a piRNA cluster. The term "piRNA source loci" does not sound distinct enough to separate it from the main piRNA clusters of 42AB and 38C, and I had suggested calling them 'niche piRNA clusters' to denote they are very special and distinct to only be responsive to Drosophila Art4 knockdown.

      In regards to the revision's changing of gene names, the convention for gene names is to use the previous name designation. Rather than calling the gene DART1, the conventional name of this gene in Flybase is Art1 (CG6554). There is the same problem with using the new name DART4 when in Flybase the gene is called Art4 (CG5358). Alternatively, the authors should clarify the re-naming up front and make it consistent with Drosophila genetics nomenclature, perhaps dArt1 or dArt4 would be more appropriate.

    3. Reviewer #3 (Public review):

      Summary:

      This study investigates how Rhino, a chromatin-associated HP1-family protein essential for germline piRNA biogenesis in Drosophila, is initially recruited to specific genomic loci. Although canonical dual-strand piRNA clusters such as 42AB, 38C, 80F, and 102F produce the majority of germline piRNAs, the mechanisms guiding Rhino to these regions remain poorly understood. To explore the earliest steps of Rhino loading, the authors use a doxycycline-inducible Rhino transgene in OSC cells, a system that expresses only the primary Piwi pathway and therefore provides an experimentally accessible, epigenetically naïve context distinct from the endogenous germline environment. Through a combination of inducible Rhino expression, knockdown of selected Drosophila PRMTs (DARTs), ChIP-seq, small RNA sequencing, and imaging, the authors propose that asymmetric arginine-methylated histones, particularly those deposited by DART4, contribute to defining initial sites of Rhino association. They identify a subset of Rhino-bound loci, termed DART4-dependent piRNA source loci (piSL), which lose Rhino, Kipferl, and piRNA production upon DART4 depletion and may represent nascent or transitional piRNA clusters. Overall, the study provides intriguing evidence for a link between ADMA histone marks and de novo Rhino recruitment, particularly in the simplified OSC context, and offers new candidate loci for further exploration of early piRNA-cluster chromatin dynamics.

      Strengths:

      This study offers important insights into how asymmetric dimethylarginine (ADMA) histone marks contribute to the initial recruitment of Rhino, a Drosophila HP1-family protein essential for dual-strand piRNA cluster specification. Using an integrative approach that includes ectopic expression of a Rhino transgene in OSC cells, germline knockdown of DART4 in Drosophila ovaries, ChIP-seq, small RNA-seq, and imaging, the authors show that ADMA marks, particularly H3R17me2a and H4R3me2a, correlate with Rhino binding at the boundaries of canonical piRNA clusters and at DART4-dependent piRNA source loci (piSL). These piSL may represent nascent or transitional piRNA-generating regions. Overall, the dataset presented here provides a valuable resource for understanding the chromatin features associated with the emergence and maturation of piRNA clusters.

      Weaknesses:

      Despite the strengths of the study, several important limitations remain. Although Rhino binding correlates with ADMA-enriched boundaries, the data do not directly demonstrate that these histone marks are required for Rhino spreading, leaving the mechanistic relationship correlative rather than causal. The DART4-dependent piRNA source loci identified here produce only low levels of piRNAs, and their functional contribution remains uncertain. In addition, redundancy among DART family methyltransferases remains unresolved: only DART4 was tested in the germline, and effective knockdown of DART1 or other DARTs could not be achieved, limiting the ability to evaluate whether ADMA-histones more broadly regulate Rhino recruitment at canonical clusters. Consequently, the current dataset primarily supports DART4-dependent effects at a small subset of evolutionarily young loci, and both the model and the title may overstate the generality of this mechanism across the full repertoire of dual-strand piRNA clusters.

      In conclusion, this study is carefully executed and puts forward compelling hypotheses regarding the early chromatin environment that may underlie piRNA cluster formation. The findings will be relevant to researchers interested in genome regulation, small RNA biology, and chromatin-mediated transposon control.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1(Public review):

      Summary:

      In this study, the authors aim to understand how Rhino, a chromatin protein essential for small RNA production in fruit flies, is initially recruited to specific regions of the genome. They propose that asymmetric arginine methylation of histones, particularly mediated by the enzyme DART4, plays a key role in defining the first genomic sites of Rhino localization. Using a combination of inducible expression systems, chromatin immunoprecipitation, and genetic knockdowns, the authors identify a new class of Rhino-bound loci, termed DART4 clusters, that may represent nascent or transitional piRNA clusters.

      Strengths:

      One of the main strengths of this work lies in its comprehensive use of genomic data to reveal a correlation between ADMA histones and Rhino enrichment at the border of known piRNA clusters. The use of both cultured cells and ovaries adds robustness to this observation. The knockdown of DART4 supports a role for H3R17me2a in shaping Rhino binding at a subset of genomic regions.

      Weaknesses:

      However, Rhino binding at, and piRNA production from, canonical piRNA clusters appears largely unaffected by DART4 depletion, and spreading of Rhino from ADMA-rich boundaries was not directly demonstrated. Therefore, while the correlation is clearly documented, further investigation would be needed to determine the functional requirement of these histone marks in piRNA cluster specification.

      The study identifies piRNA cluster-like regions called DART4 clusters. While the model proposes that DART4 clusters represent evolutionary precursors of mature piRNA clusters, the functional output of these clusters remains limited. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing.

      In summary, the authors present a well-executed study that raises intriguing hypotheses about the early chromatin context of piRNA cluster formation. The work will be of interest to researchers studying genome regulation, small RNA pathways, and the chromatin mechanisms of transposon control. It provides useful resources and new candidate loci for follow-up studies, while also highlighting the need for further functional validation to fully support the proposed model.

      We sincerely thank Reviewer #1 for the thoughtful and constructive summary of our work. We appreciate the reviewer’s recognition that our study provides a comprehensive analysis of the relationship between ADMA-histones and Rhino localization, and that it raises intriguing hypotheses about the early chromatin context of piRNA cluster formation.

      We fully agree with the reviewer that our data primarily demonstrate correlation between ADMA-histones and Rhino localization, rather than direct causation. In response, we have carefully revised the text throughout the manuscript to avoid overstatements implying causality (details provided below).

      We also acknowledge the reviewer’s important point that the functional requirement of ADMA-histones for piRNA cluster specification remains to be further established. We have now added a discussion of our experimental limitations (page 18).

      Overall, we have revised the manuscript to present our findings more cautiously and transparently, emphasizing that our data reveal a correlation between ADMA-histone marks and the initial localization of Rhino, rather than proving a direct mechanistic requirement. We thank the reviewer again for highlighting these important distinctions.

      Reviewer #2 (Public review):

      This study seeks to understand how the Rhino factor knows how to localize to specific transposon loci and to specific piRNA clusters to direct the correct formation of specialized heterochromatin that promotes piRNA biogenesis in the fly germline. In particular, these dual-strand piRNA clusters with names like 42AB, 38C, 80F, and 102F generate the bulk of ovarian piRNAs in the nurse cells of the fly ovary, but the evolutionary significance of these dual-strand piRNA clusters remains mysterious since triple null mutants of these dual-strand piRNA clusters still allow fly ovaries to develop and remain fertile. Nevertheless, mutants of Rhino and its interactors Deadlock, Cutoff, Kipferl and Moonshiner, etc, cause more piRNA loss beyond these dual-strand clusters and exhibit the phenotype of major female infertility, so the impact of proper assembly of Rhino, the RDC, Kipferl etc onto proper piRNA chromatin is an important and interesting biological question that is not fully understood.

      This study tries to first test ectopic expression of Rhino via engineering a Dox-inducible Rhino transgene in the OSC line that only expresses the primary Piwi pathway that reflects the natural single pathway expression of the follicle cells and is quite distinct from the nurse cell germline piRNA pathway that is promoted by Rhino, Moonshiner, etc. The authors present some compelling evidence that this ectopic Rhino expression in OSCs may reveal how Rhino can initiate de novo binding via ADMA histone marks, a feat that would be much more challenging to demonstrate in the germline where this epigenetic naïve state cannot be modeled since germ cell collapse would likely ensue. In the OSC, the authors have tested the knockdown of four of the 11 known Drosophila PRMTs (DARTs), and comparing to ectopic Rhino foci that they observe in HP1a knockdown (KD), they conclude DART1 and DART4 are the prime factors to study further in looking for disruption of ADMA histone marks. The authors also test KD of DART8 and CG17726 in OSCs, but in the fly, the authors test germline KD (GLKD) of DART4 only; they do not explain why these other DARTs are not tested in GLKD, as the UAS-RNAi resources in Drosophila strain repositories should be very complete and have reagents for these knockdowns to be accessible.

      The authors only characterize some particular ADMA marks of H3R17me2a as showing strong decrease after DART4 GLKD, and then they see some small subset of piRNA clusters go down in piRNA production as shown in Figure 6B and Figure 6F and Supplementary Figure 7. This small subset of DART4-dependent piRNA clusters does lose Rhino and Kipferl recruitment, which is an interesting result.

      However, the biggest issue with this study is the mystery that the set of the most prominent dual-strand piRNA clusters, 42AB, 38C, 80F, and 102F, are the prime genomic loci subjected to Rhino regulation, and they do not show any change in piRNA production in the GLKD of DART4. The authors bury this surprising negative result in Supplementary Figure 5E, but this is also evident in no decrease (actually an n.s. increase) in Rhino association in Figure 5D. Since these main piRNA clusters involve the RDC, Kipferl, Moonshiner, etc, and it does not change in ADMA status and piRNA loss after DART4 GLKD, this poses a problem with the model in Figure 7C. In this study, there is only a GLKD of DART4 and no GLKD of the other DARTs in fly ovaries.

      One way the authors rationalize this peculiar exception is the argument that DART4 is only acting on evolutionarily "young" piRNA clusters like the bx, CG14629, and CG31612, but the lack of any change on the majority of other piRNA clusters in Figure 6F leaves open the unsatisfying concern that there is much functional redundancy remaining with other DARTs not being tested by GLKD in the fly that would have a bigger impact on the other main dual-strand piRNA clusters being regulated by Rhino and ADMA-histone marks.

      Also, the current data does not provide convincing enough support for the model Figure 7C and the paper title of ADMA-histones being the key determinant in the fly ovary for Rhino recognition of the dual-strand piRNA clusters. Although much of this study's data is well constructed and presented, there remains a large gap that no other DARTs were tested in GLKD that would show a big loss of piRNAs from the main dual-strand piRNA clusters of 42AB, 38C, 80F, and 102F, where Rhino has prominent spreading in these regions.

      As the manuscript currently stands, I do not think the authors present enough data to conclude that "ADMA-histones [As a Major new histone mark class] does play a crucial role in the initial recognition of dual-strand piRNA cluster regions by Rhino" because the data here mainly just show a small subset of evolutionarily young piRNA clusters have a strong effect from GLKD of DART4. The authors could extensively revise the study to be much more specific in the title and conclusion that they have uncovered this very unique niche of a small subset of DART4-dependent piRNA clusters, but this niche finding may dampen the impact and significance of this study since other major dual-strand piRNA clusters do not change during DART4 GLKD, and the authors do not show data GLKD of any other DARTs. The niche finding of just a small subset of DART-4-dependent piRNA clusters might make another specialized genetics forum a more appropriate venue.

      We are deeply grateful to Reviewer #2 for the detailed and insightful review that carefully situates our study in the broader context of Rhino-mediated piRNA cluster regulation. We appreciate the reviewer’s recognition that our inducible Rhino expression system in OSCs provides a valuable model to explore de novo Rhino recruitment under a simplified chromatin environment.

      At the same time, we agree that the current data mainly support a role for DART4 in regulating a subset of evolutionarily young piRNA clusters, and do not demonstrate a requirement for ADMA-histones at the major dual-strand piRNA clusters such as 42AB or 38C. We have therefore revised the title and main conclusions to more accurately reflect the scope of our findings.

      We agree with the reviewer that functional redundancy among DARTs may explain why major dual-strand piRNA clusters are unaffected by DART4 GLKD. Indeed, we attempted germline knockdown of DART1, whose knockdown causes collapse of Rhino foci in OSCs. For DART1 GLKD, two approaches were possible:

      (1) Crossing the BDSC UAS-RNAi line (ID: 36891) with nos-GAL4.

      (2) Crossing the VDRC UAS-RNAi line (ID: 110391) with nos-GAL4 and UAS-Dcr2.

      The first approach was not feasible because the UAS-RNAi line always arrived dead (DOA) and could not be maintained in our laboratory. The second approach did not yield effective and stable knockdown (as follows).

      DART8 and CG17726 did not alter Rhino foci in OSC knockdown experiments; therefore, we did not attempt germline knockdown (GLKD) of these DARTs in the ovary. We agree with the reviewer’s opinion that there are piRNA source loci where Rhino localization depends on DART1, and that simultaneous depletion of multiple DARTs may indeed reveal additional positive results because ADMA-histones such as H3R8me2a may be completely eliminated by the knockdown of multiple DARTs. At the same time, we note that many evolutionarily conserved piRNA clusters show a loss of ADMA accumulation compared with evolutionarily young piRNA clusters, with levels that are comparable to the background input in ChIP-seq reads. Therefore, conserved clusters such as 42AB and 38C may no longer be regulated by ADMA. Even if multiple DARTs function redundantly to regulate ADMA, it may be difficult to disrupt Rhino localization at such conserved piRNA clusters by depletion of DARTs. While disruption of Rhino localization at conserved clusters like 42AB and 38C may be challenging, we cannot exclude the possibility that DART depletion affects Rhino binding at less conserved piRNA clusters, where ADMA modification remains detectable. We added clarifications in the Discussion to acknowledge the potential redundancy with other DARTs and to note that further knockdown experiments in the germline will be necessary to test this model comprehensively (page 18).

      We appreciate the reviewer’s critical feedback, which has helped us refine the message and strengthen the interpretative balance of the paper.

      Reviewer #1 (Recommendations for the authors):

      In multiple places, the link between ADMA histones and Rhino recruitment is presented in terms that imply causality. Please revise these statements to reflect that, in most cases, the evidence supports correlation rather than direct functional necessity. Similarly, statements suggesting that ADMA histones promote Rhino spreading should be revised unless supported by direct evidence.

      We sincerely thank the reviewer for the insightful comments. We recognize that these suggestions are crucial for improving the manuscript, and we have revised it accordingly to address the concerns. The specific revisions we made are detailed below.

      (1) Page 1, line 14: The original sentence “in establishing the sites” was changed to “may establish the potential sites.”

      (2) Page 4, lines 11-12: The original sentence “genomic regions where Rhino binds at the ends and propagates in the areas in a DART4-dependent manner, but not stably anchored” was changed to “genomic regions that have ADMA-histones at their ends and exhibit broad Rhino spreading across their internal regions in a DART4-dependent manner”

      (3) Page 4, lines 12-15: The original sentence “Kipferl is present at the regions but not sufficient to stabilize Rhino-genomic binding after Rhino propagates.” was changed to “In contrast to authentic piRNA clusters, Kipferl was lost together with Rhino upon DART4 depletion in these regions, suggesting that Kipferl by itself is not sufficient to stabilize Rhino binding; rather, their localization depends on DART4.”

      (4) Page 4, lines 17-18: The original sentence “are considered to be primitive clusters” was changed to “might be nascent dual-strand piRNA source loci”.

      (5) Page 8, line 7: The original sentence “Involvement of ADMA-histones in the genomic localization of Rhino was implicated.” was changed to “Correlation of ADMA-histones in the genomic localization of Rhino was implicated.”

      (6) Page 8, lines 19-21: The original sentence “These results suggest that ADMA-histones, together with H3K9me3, contribute significantly and specifically to the recruitment of Rhino to the ends of dual-strand clusters in OSCs.” was changed to “These results raise the possibility that ADMA-histones, together with H3K9me3, may contribute specifically to the recruitment of Rhino to the ends of dual-strand clusters in OSCs.”

      (7) Page 10, lines 11-13: The original sentence “These results suggest that DART1 and DART4 are involved in Rhino recruitment at distinct genomic sites through the decreases in ADMA-histones in each of their KD conditions (H4R3me2a and H3R17me2a, respectively).” was changed to “These results suggest that DART1 and DART4 could contribute to Rhino recruitment at distinct genomic sites through the decreases in ADMA-histones in each of their KD conditions (H4R3me2a and H3R17me2a, respectively).”

      (8) Page 13, line 2: The original sentence “Genomic regions where Rhino spreads in a DART4-dependent manner, but not stably anchored, produce some piRNAs” was changed to “Genomic regions where Rhino binds broadly in a DART4-dependent manner, but not stably anchored, produce some piRNAs”.

      (9) Page 13, lines 21-22: The original sentence “These results support the hypothesis that ADMA-histones are involved in the genomic binding of Rhino both before and after Rhino spreading, resulting in stable genome binding.” was changed to “These results raise the possibility that a subset of Rhino localized to genomic regions correlating with ADMA-histones may serve as origins of spreading.”

      (10) Page 16, lines 6-8: The original sentence “In this study, we took advantage of cultured OSCs for our analysis and found that chromatin marks (i.e., ADMA-histones) play a crucial role in the loading of Rhino onto the genome.” was changed to “In this study, we took advantage of cultured OSCs for our analysis and found that chromatin marks (i.e., bivalent nucleosomes containing H3K9me3 and ADMA-histones) appear to contribute to the initial loading of Rhino onto the genome.”

      (11) Page 16, line 12: The original sentence “We propose that the process of piRNA cluster formation begins with the initial loading of Rhino onto bivalent nucleosomes containing H3K9me3 and ADMA-histones (Fig. 7C). In OSCs, the absence of Kipferl and other necessary factors means that Rhino loading into the genome does not proceed to the next step.” was removed.

      Major points

      (1)  Clarify the limited colocalization between Rhino and H3K9me3 in OSCs. The observation that FLAG-Rhino foci show minimal overlap with H3K9me3 in OSCs appears inconsistent with the proposed model by the authors in the discussion, in which Rhino is initially recruited to bivalent nucleosomes bearing both H3K9me3 and ADMA marks. This discrepancy should be addressed. 

      We thank the reviewer for the insightful comments. Indeed, ChIP-seq shows that Rhino partially overlaps with H3K9me3 (Fig. 1F), but immunofluorescence did not reveal any detectable overlap (Fig. 1A). We interpret this discrepancy as arising from the fact that immunofluorescence primarily visualizes H3K9me3 foci that are localized as broad domains in the genome, such as those at centromeres, pericentromeres, or telomeres (named chromocenters), whereas the sharp and interspersed H3K9me3 signals along chromosome arms are difficult to detect by immunofluorescence. We have added these explanations to the revised text (page 6).

      (2)  Please indicate whether the FLAG-Rhino used in OSCs has been tested for functionality in vivo, for example, by rescuing Rhino mutant phenotypes. This is particularly relevant given that no spreading is observed with this construct.

      We thank the reviewer for raising this important point. We have not directly tested the functionality of the FLAG-Rhino construct used in OSCs in living Drosophila; i.e., it has not been used to rescue Rhino mutant phenotypes in flies. We acknowledge that FLAG-Rhino has not previously been expressed in OSCs, and that its localization pattern in OSCs differs from that observed in ovaries, where Rhino is endogenously expressed. However, several lines of evidence suggest that the addition of the N-terminal FLAG tag is unlikely to compromise Rhino function:

      (1) In previous studies, N-terminally tagged Rhino (e.g., 3xFLAG-V5-Precision-GFP-Rhino) was expressed in a living Drosophila ovary and was shown to localize properly to piRNA clusters, indicating that the tag does not prevent Rhino from binding its genomic targets (Baumgartner et al., 2022; eLife. Fig. 3 supplement 1G).

      (2) In Drosophila S2 cells, a FLAG-tagged tandem Rhino chromodomain construct was shown to bind H3K9me3/H3K27me3 bivalent chromatin, demonstrating that the FLAG tag does not impair this fundamental chromatin interaction (Akkouche et al., 2025; Nat Struct Mol Biol. Fig. 4b).

      (3) GFP-tagged Rhino has been demonstrated to rescue the transposon derepression phenotype of Rhino mutant flies, further supporting that the addition of tags does not abolish its in vivo function (Parhad et al., 2017; Dev Cell. Fig. 1D).

      Therefore, we interpret the partial localization of FLAG-Rhino in OSCs as reflecting the specific chromatin environment and regulatory context of OSCs rather than functional impairment due to the FLAG tag.

      (3) Given the low levels of piRNA production and the absence of measurable effects on transposon expression or fertility upon DART4 knockdown, the rationale for classifying these regions as piRNA clusters should be clearly stated. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing. The authors should also consider and discuss the possibility that some of these differences may reflect background-specific genomic variation rather than DART4-dependent regulation per se.

      We thank the reviewer for the insightful comments. As noted, DART4 knockdown did not measurably affect transposon expression or fertility. piRNAs generated from DART4-associated clusters associate with Piwi but are insufficient for target repression. Although loss of DART4 largely eliminated piRNAs from these clusters, the cluster-derived transcripts themselves were unchanged. To clarify this point, we now refer to these regions as DART4-dependent piRNA-source loci (DART4 piSLs) in the revised text. We also acknowledge that some observed differences may reflect strain-specific genomic variation and have added this caveat on page 16.

      (4)  The authors should describe the genomic context of DART4 clusters in more detail. Specifically, it would be helpful to indicate whether these regions overlap with known transposable elements, gene bodies, or intergenic regions, and to report the typical size range of the clusters. Are any of the piRNAs produced from these clusters predicted to target known transcripts? 

      We thank the reviewer for the insightful comments. The overlap of DART4 piSLs with transposable elements, gene bodies, and intergenic regions is shown in the right panel of Supplementary Fig. 6E (denoted as “Rhino reduced regions in DART4 GLKD” in the figure). The typical size range of these clusters is presented in Supplementary Fig. 6G. The annotation of piRNA reads derived from these piSLs is shown in the right panel of Supplementary Fig. 6F, indicating that most of them appear to target host genes. The specific genes and transposons matched by the piRNAs produced from DART4 piSLs are listed in Supplementary Table 8.

      (5)  While correlations between Rhino and ADMA histone marks (especially H3R8me2a, H3R17me2a, H4R3me2a) are robust, many ADMA-enriched regions do not recruit Rhino. Please discuss this observation and consider the possible involvement of additional factors.

      We thank the reviewer for the insightful comments. As pointed out, not all ADMA-enriched regions recruit Rhino; rather, Rhino is recruited only at sites where ADMAs overlap with H3K9me3. Furthermore, the combination of H3K9me3 and ADMAs alone does not fully account for the specificity of Rhino recruitment, suggesting the involvement of additional co-factors (for example, other ADMA marks such as H3R42me2a, or chromatin-interacting proteins). In addition, since histone modifications, including arginine methylation, may be secondary consequences of modifications on other proteins rather than primary regulatory events, it is possible that DART1/4 contribute to Rhino recruitment not only through histone methylation but also via arginine methylation of non-histone chromatin-interacting factors. However, methylation of HP1a does not appear to be involved (Supplementary Fig. 3G). We have added new sentences about these points in the Discussion section (page 18).

      (6) The manuscript states that Kipferl is present at DART4 clusters but does not stabilize Rhino binding. Please specify which experimental results support this conclusion and explain.

      We apologize for the lack of clarity regarding Kipferl data. Supplementary Fig. 7A and 7B show that Kipferl localizes at major DART4 piSL. This Kipferl localization is lost together with Rhino upon DART4 GLKD, indicating that Rhino localization at DART4 piSL depends on DART4 rather than on Kipferl. From these results, we infer that, unlike at authentic piRNA clusters, Kipferl may not be sufficient to stabilize the association of Rhino with the genome at DART4 piSL. We have added this interpretation on page 14.

      Minor points

      (1) Figure 1D: Please specify which piRNA clusters are included in the metaplot - all clusters, or only the major producers? 

      We thank the reviewer for the question. The metaplot was not generated from a predefined list of “all” piRNA clusters or only the “major producers.” Instead, it was constructed from Rhino ChIP–seq peaks (“Rhino domains”) that are ≥1.5 kb in length. These Rhino domains mainly correspond to the subregions within major dual-strand clusters (e.g., 42AB, 38C) as well as additional clusters such as 80F, 102F, and eyeless, among others. We have provided the full list of domains and their corresponding piRNA clusters (with genomic coordinates) in Supplementary Table 9 and added the additional explanation in the Fig. 1D legend.

      (2) Supplemental Figure 5E is referred to as 5D in the main text.

      We corrected the figure citations on pages 11-12: the reference to Supplementary Fig. 5E has been changed to 5D, and the reference to Supplementary Fig. 5F has been changed to 5E.

      (3) Supplemental Figure 7C: The color legend does not match the pie chart, which may confuse readers.

      We thank the reviewer for the helpful comment. We are afraid we were not entirely sure what specific aspect of the legend was confusing, but to avoid any possible misunderstanding, we revised Supplemental Fig. 7C so that the color boxes in the legend now exactly match the corresponding colors in the pie chart. We hope this modification improves clarity.

      (4) Since the manuscript focuses on the roles of DART1 and DART4, including their expression profiles in OSCs and ovaries would help contextualize the observed phenotypes. Please consider adding this information if available.

      We thank the reviewer for the suggestion. We have now included a scatter plot comparing RNA-seq expression in OSCs and ovaries (Supplementary Fig. 3H). In these datasets, DART1 is strongly expressed in both tissues, whereas DART4 shows no detectable reads. Notably, ref. 28 reports strong expression of both DART1 and DART4 in ovaries by western blot and northern blot. In our own qPCR analysis in OSCs, DART4 expression is about 3% of DART1, which, although low, may still be sufficient for functional roles such as modification of H3R17me2a (Fig. 3C, Supplementary Fig. 3F and 3I). We have added these new data and additional explanation in the revised manuscript (page 11).

      (5) Several of the genome browser snapshots, particularly scale and genome coordinates, are difficult to read. 

      We apologize for the difficulty in reading several of the genome browser snapshots in the original submission. We have re-generated the relevant figures using IGV, which provides clearer visualization of scale and genome coordinates. The previous images have been replaced with the improved versions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors need to elaborate on what this sentence means, as it is very unclear what they are describing about Rhino residency: "The results show that Rhino in OSCs tends to reside in the genome where Rhino binds locally in the ovary (Fig. 1C)." 

      We apologize for the lack of clarity in the original sentence. The text has been revised as follows:

      “Rhino expressed in OSCs bound predominantly to genomic sites exhibiting sharp and interspersed Rhino localization patterns in the ovary, while showing little localization within broad Rhino domains, including major piRNA clusters.”

      In addition, to clarify the behavior of Rhino at broad domains, we have added the phrase “the terminal regions of broad domains, such as major piRNA clusters” to the subsequent sentence.

      (2) The red correlation line is very confusing in Figure 5F. What sort of line does this mean in this scatter plot? 

      We apologize for the lack of clarity regarding the red line in Fig. 5F. The red line represents the least-squares linear regression fit to the data points, calculated using the lm() function in R, and was added with abline() to illustrate the correlation between ctrl GLKD and DART4 GLKD values. In the revised figure, we have clarified this in the legend by specifying that it is a regression line.
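
      The kind of fit described in this response can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the authors' script, and the input values are hypothetical placeholders rather than data from the study. The function name `fit_line` is likewise an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical per-locus values (NOT data from the study).
ctrl_glkd = [1.0, 2.0, 3.0, 4.0]
dart4_glkd = [0.5, 1.0, 1.5, 2.0]

intercept, slope = fit_line(ctrl_glkd, dart4_glkd)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

      With these placeholder values the fitted slope is 0.5 and the intercept is 0, i.e. every point in the knockdown condition sits at half its control value. A slope below 1 in such a plot would indicate an overall reduction relative to control.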

      (3) There is no confirmation of the successful knockdown of the various DARTs in the OSCs.

      We thank the reviewer for the comment. The knockdown efficiency of the various DARTs in OSCs was confirmed by RT–qPCR. The data are now shown in Supplementary Fig. 3J. 

      (4) What is the purpose of an unnumbered "Method Figure" in the supplementary data file? Why not just give it a number and mention it properly in the text? 

      We thank the reviewer for the suggestion. We have now assigned a number to the previously unnumbered "Method Figure" and have included it as Supplementary Fig. 9.

      The figure is now properly cited in the Methods section.

      (5) For Figure 5A, those fly strain numbers in the labels are better reserved in the Methods, and a more appropriate label is to describe the GAL4 driver and the UAS-RNAi construct by their conventional names.

      We thank the reviewer for the suggestion. The labels in Fig. 5A have been updated to use the conventional names of the GAL4 drivers and UAS-RNAi constructs. Specifically, they now read Ctrl GLKD (nos-GAL4 > UAS-emp) and DART4 GLKD (nos-GAL4 > UAS-DART4). The original fly strain numbers are listed in the Methods section.

    1. eLife Assessment

      This useful study presents the potentially interesting idea that LRRK2 regulates cellular BMP levels and their release via extracellular vesicles, with GCase activity further modulating this process in mutant LRRK2-expressing cells. However, some of the evidence supporting these conclusions remains incomplete, and additional work is suggested under certain conditions. Overall, the study will be of interest to cell biologists working on Parkinson's disease.

    2. Reviewer #1 (Public review):

      Summary:

      Even though mutations in LRRK2 and GBA1 (which encodes the protein GCase) increase the risk of developing Parkinson's disease (PD), the specific mechanisms driving neurodegeneration remain unclear. Given their known roles in lysosomal function, the authors investigate how LRRK2 and GCase activity influence the exocytosis of the lysosomal lipid BMP via extracellular vesicles (EVs). They use fibroblasts carrying the PD-associated LRRK2-R1441G mutation and pharmacologically modulate LRRK2 and GCase activity.

      Strengths:

      The authors examine both proteins at endogenous levels, using MEFs instead of cancer cells. The study's scope is potentially interesting and could yield relevant insights into PD disease mechanisms.

      Weaknesses:

      Many of the authors' conclusions are overstated and not sufficiently supported by the data. Several statistical errors undermine their claims. Pharmacological treatment is very long, leading to potential off-target effects. Additionally, the authors should be more rigorous when using EV markers.

      Comments on revisions:

      The authors have not addressed most of my concerns. For example, instead of trying a 1-2 hour MLi-2 treatment, they cited all the papers that use extremely long time points for LRRK2 inhibition; the fact that other groups do it does not mean it is biologically correct. They also refused to quantify their western blots in a proper manner, without the "hyper-normalization", claiming that it is an accepted way to quantify western blots. Again, it is statistically incorrect and biologically impossible. They also do not have a satisfactory explanation as to why the R1441G cells (which have increased LRRK2 kinase activity) show no effect on EV release, yet they still claim it is LRRK2 kinase activity dependent.

      Overall, I am very confused by the model proposed by the authors. They only see increased EV release in the G2019S-expressing cells, but not the R1441G cells, yet they claim that the increase of EV release is LRRK2 kinase activity dependent. Then, they claim that the presence of BMP (unchanged in R1441G vs CTL) in EVs is also LRRK2 kinase activity dependent. Finally, they perform TIRF with a pHluorin-CD63 construct and observe an increase in G2019S cells vs CTL, "further confirming that BMP release is associated with EV secretion". First, I could not see the increase in BMP release in G2019S cells (if I missed it, I apologize). And second, why didn't they do this experiment in R1441G cells? As the R1441G cells have not displayed an increase in EV release compared to CTL cells, it is also possible that BMP release occurs more through lysosomal exocytosis (which could explain the pHluorin results) than through EVs. Overall, the authors nicely demonstrate that the R1441G cells have more BMP species, likely due to increased CLN5 expression, but the release of the BMP is still not clear to this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, authors used MEFs expressing the R1441G mutant of leucine-rich repeat kinase 2 (LRRK2), a mutant associated with the early onset of Parkinson's disease. They report that in these cells LAMP2 fluorescence is higher but BMP fluorescence is lower, MVE size is reduced and that MVEs contain less ILVs. They also report that LAMP2-positive EVs are increased in mutant cells in a process sensitive to LRRK2 kinase inhibition but are further increased by glucocerebrosidase (GCase) inhibition, and that total di-22:6-BMP and total di-18:1-BMP are increased in mutant LRRK2 MEFs compared to WT cells by mass spectrometry. They also report that LRRK2 kinase inhibition partially restores cellular BMP levels, and that GCase inhibition further increased BMP levels, and that in EVs from the LRRK2 mutant, LRRK2 inhibition decreases BMP while GCase inhibition has the opposite effect. Moreover, they report that BMP increase is not due to increased BMP synthesis, although authors observe that CLN5 is increased in LRRK2 mutant cells. Finally, they report that GW4869 decreases EV release and exosomal BMP, while bafilomycin A1 increases EV release. They conclude that LRRK2 regulates BMP levels (in cells) and release (via EVs). They also conclude that the process is modulated by GCase in LRRK2 mutant cells, and that these studies may contribute to the use of BMP-positive EVs as a biomarker for Parkinson's disease and associated treatments.

      Strengths:

      This is a potentially interesting paper. However, I had comments that authors needed to address to clarify some aspects of their study.

      Weaknesses:

      (1) The authors seem to have missed the point in their reply to my first comment. They mention the paper by Stuffers et al., who report that endosome biogenesis continues without ESCRT. This is a nice paper, but it is irrelevant to the subject at hand. In my initial comment, I drew the authors' attention to an apparent contradiction: higher LAMP2 staining in R1441G LRRK2 knock-in MEFs and yet smaller MVEs with a reduced surface area. LAMP2 being one of the major glycoproteins of MVE's limiting membrane, one would have expected lower LAMP2 staining if cells contain fewer and smaller MVEs. Authors now state that elevated LAMP2 expression in cells expressing R1441G reflects a cell type-specific effect (differential penetrance of LRRK2 signaling on lysosomal biogenesis), because amounts of LAMP1 and CD63 are similar in cells from LRRK2 G2019S PD patients and control cells (new Fig 7A-F). However, authors still conclude that LRRK2 modulates the lysosomal network, including LAMP2 and CLN5. Does it?

      Similarly, the mass spec analysis of BMP (Fig S1H) does not support the data in Fig 1. Does this Table include all major isoforms found in these cells? If so, the dominant isoform is by far the di-18:1 isoform in wt and R1441G cells (at least 10X more abundant than other isoforms). Now, di-18:1-BMP is roughly 4X more abundant in R1441G cells when compared to wt cells, while BMP is reduced by half in R1441G cells (light microscopy in Fig 1). Authors argue that light microscopy may only detect a so-called antibody accessible pool. What is this? And why would this pool decrease in R1441G cells when LAMP2 is higher? Alternatively, they argue that the anti-BMP antibody may be less specific and detect other analytes. As I had already mentioned, this makes no sense, since the observed signal is lower and not higher. If authors do not trust their light microscopy analysis, why show the data?

      (2) Cells contain 3 LAMP2 isoforms. Which one is upregulated and/or secreted in exosomes?

      (3) The new Fig S4A is far from convincing. How were cells fractionated and what are the gradients (not described in Methods)? CD63 (presumably endolysosomes) is spread over fractions 8 - 13. LRRK2 (fractions 8-9) does not copurify with CD63. The bulk of LRRK2 is at the bottom (presumably cytosol if this is a floatation gradient), and a minor fraction moves into the gradient. CLN5 is even less clear since the bulk is also at the bottom with a tiny fraction only between LRRK2 and CD63. Also, why do authors conclude that a considerable pool of newly synthesized CLN5 did not reach its final destination at the endolysosome and may instead be retained in the ER? Where is the ER on the gradient?

      (4) Fig S4B shows blots of whole cell lysates from CTRL and LRRK2 mutant-derived fibroblasts: 6 lanes are shown but without captions, containing varying amounts of calnexin and CD63. In addition, the blots look very dirty. Where is CD63? Is it the minor band at ≈37 kD (as in Fig S4A)? Or the major band below the 50kD marker? What are the other bands on these blots? As a result, the quantification shown in the bar graph does not mean much.

      (5) The cell content of 18:1-BMP is increased approx. 5X by BafA1 (Fig 6C) but amounts of 18:1-BMP secreted in EVs hardly change (Fig 6E). Since BMP is mostly present as the 18:1 isoform (22:6-BMP being only a minor species, Fig S1H), does it mean that BafA1 does not increase BMP secretion and/or only a minor fraction of total cellular BMP is secreted in exosomes?

      Comments on revisions:

      How come 0.2 mmol/L of 22:6 and 18:1 fatty acid both correspond to 65 µg/mL (Fig 4A)?
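
      The arithmetic behind this question can be sketched as follows. The sketch assumes that the fatty acids are the free, protonated acids (not salts or conjugates), so that a species written as n:d has the formula C(n)H(2n-2d)O2. It uses standard atomic masses; the function names are hypothetical and chosen for illustration.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def fatty_acid_molar_mass(carbons, double_bonds):
    """Molar mass (g/mol) of a free fatty acid written as carbons:double_bonds.

    Assumes an unbranched, protonated acid, i.e. C(n) H(2n - 2d) O2.
    """
    hydrogens = 2 * carbons - 2 * double_bonds
    return (carbons * ATOMIC_MASS["C"]
            + hydrogens * ATOMIC_MASS["H"]
            + 2 * ATOMIC_MASS["O"])


def mmol_per_l_to_ug_per_ml(conc_mmol_per_l, molar_mass):
    """Convert a molar concentration to a mass concentration.

    mmol/L * g/mol = mg/L, and 1 mg/L equals 1 ug/mL.
    """
    return conc_mmol_per_l * molar_mass


ug_22_6 = mmol_per_l_to_ug_per_ml(0.2, fatty_acid_molar_mass(22, 6))
ug_18_1 = mmol_per_l_to_ug_per_ml(0.2, fatty_acid_molar_mass(18, 1))
print(f"22:6 -> {ug_22_6:.1f} ug/mL")
print(f"18:1 -> {ug_18_1:.1f} ug/mL")
```

      Under these assumptions, 0.2 mmol/L corresponds to roughly 65.7 µg/mL for the 22:6 species but only about 56.5 µg/mL for the 18:1 species, so a single value of 65 µg/mL would match only one of the two.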

      It is stated in the legend of Fig. 4 that long (B-C) and short (D) chase time points are shown as fold change. There is no panel D in the figure.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents the potentially interesting concept that LRRK2 regulates cellular BMP levels and their release via extracellular vesicles, with GCase activity further modulating this process in mutant LRRK2-expressing cells. However, the evidence supporting the conclusions remains incomplete, and certain statistical analyses are inadequate. This work would be of interest to cell biologists working on Parkinson's disease.

      Reviewer #1 (Public review):

      Summary:

      Even though mutations in LRRK2 and GBA1 (which encodes the protein GCase) increase the risk of developing Parkinson's disease (PD), the specific mechanisms driving neurodegeneration remain unclear. Given their known roles in lysosomal function, the authors investigate how LRRK2 and GCase activity influence the exocytosis of the lysosomal lipid BMP via extracellular vesicles (EVs). They use fibroblasts carrying the PD-associated LRRK2-R1441G mutation and pharmacologically modulate LRRK2 and GCase activity.

      Strengths:

      The authors examine both proteins at endogenous levels, using MEFs instead of cancer cells. The study's scope is potentially interesting and could yield relevant insights into PD disease mechanisms.

      Weaknesses:

      Many of the authors' conclusions are overstated and not sufficiently supported by the data. Several statistical errors undermine their claims. Pharmacological treatment is very long, leading to potential off-target effects. Additionally, the authors should be more rigorous when using EV markers.

      We thank the reviewer for these valuable observations. In the revised manuscript, we have addressed each of these points as follows:

      (1) Conclusions and data support – We carefully revised our text throughout the manuscript to ensure that all conclusions are better supported by the presented data. For instance, we now explicitly state that while pharmacological modulation supports the regulatory role of LRRK2 activity in EV-mediated BMP release, we have softened our conclusions concerning the contribution of GCase in this model (see revised Results and Discussion sections).

      (2) Statistical analyses – We reanalyzed experiments involving more than two groups and replaced simple t-tests with non-parametric Kruskal-Wallis tests followed by Dunn’s post hoc comparisons. This approach, described in the updated figure legends (e.g., Figure 2D-F and H-J), provides a more rigorous statistical framework that accounts for small sample sizes and variability typical of EV quantifications.
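
      As an illustration of the omnibus test named in this response, the following minimal sketch computes the Kruskal-Wallis H statistic with tie correction. It is not the authors' analysis: the group values are hypothetical, the function names are chosen for illustration, and Dunn's post hoc comparisons are not included. The p-value uses the chi-square approximation in its closed form for three groups (two degrees of freedom), which is only a rough guide at very small sample sizes.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical measurements for three conditions (NOT data from the study).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_wallis_h(groups)
# For exactly 3 groups (df = 2) the chi-square survival function is exp(-H/2).
p_value = math.exp(-h_stat / 2)
print(f"H={h_stat:.2f}, approximate p={p_value:.4f}")
```

      Because the test works on ranks rather than raw values, it does not assume normally distributed measurements, which is the usual reason for preferring it over repeated t-tests when several small groups are compared.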

      (3) Pharmacological treatment duration – Prolonged MLi-2 treatments have been extensively used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have applied long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202). In our study, 48-hour incubations were necessary to sustain full LRRK2 inhibition throughout the extracellular vesicle (EV) collection period. EV biogenesis, BMP biosynthesis, and packaging into EVs are time-dependent processes; therefore, extended incubation and collection periods (48 h) were required to allow downstream effects of LRRK2 inhibition on BMP production and release to manifest, and to obtain sufficient EV material for biochemical and lipidomic analyses. This experimental design also reflects our and others’ previous observations in humans and non-human primates, where urinary BMP changes are associated with chronic or subchronic LRRK2 inhibitor treatment (Baptista MAS, Merchant K, et al. Sci Transl Med. 2020, 12:eaav0820; Jennings D, et al. Sci Transl Med. 2022, 14:eabj2658; Maloney MT, et al. Mol Neurodegener. 2025, 20:89). Importantly, under these conditions, we did not observe significant changes in cell viability or morphology, indicating that the treatment was well tolerated. We have clarified this rationale in the revised Methods section to emphasize that the prolonged incubation reflects the experimental design for EV isolation rather than a requirement for achieving LRRK2 inhibition.

      (4) EV markers – We and others have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022). Moreover, LAMP proteins have been reported to be more enriched in EVs of endolysosomal origin (Mathieu et al., 2021). To further strengthen this point, we performed new experiments using a CD63-pHluorin sensor combined with TIRF microscopy, which allowed real-time visualization of CD63-positive exosome release. These new data (now presented in Figure 7, Panels G-I; Videos 1 and 2) confirm increased CD63-positive EV release in LRRK2 mutant fibroblasts, which was reversed by LRRK2 inhibition with MLi-2. The CD63-positive compartment was also largely BMP-positive (new Figure 7D, F, G), reinforcing our conclusions and providing additional rigor in EV marker validation.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors used MEFs expressing the R1441G mutant of leucine-rich repeat kinase 2 (LRRK2), a mutant associated with the early onset of Parkinson's disease. They report that in these cells LAMP2 fluorescence is higher but BMP fluorescence is lower, MVE size is reduced, and that MVEs contain less ILVs. They also report that LAMP2-positive EVs are increased in mutant cells in a process sensitive to LRRK2 kinase inhibition but are further increased by glucocerebrosidase (GCase) inhibition, and that total di-22:6-BMP and total di-18:1-BMP are increased in mutant LRRK2 MEFs compared to WT cells by mass spectrometry. They also report that LRRK2 kinase inhibition partially restores cellular BMP levels, and that GCase inhibition further increases BMP levels, and that in EVs from the LRRK2 mutant, LRRK2 inhibition decreases BMP while GCase inhibition has the opposite effect. Moreover, they report that the BMP increase is not due to increased BMP synthesis, although the authors observe that CLN5 is increased in LRRK2 mutant cells. Finally, they report that GW4869 decreases EV release and exosomal BMP, while bafilomycin A1 increases EV release. They conclude that LRRK2 regulates BMP levels (in cells) and release (via EVs). They also conclude that the process is modulated by GCase in LRRK2 mutant cells, and that these studies may contribute to the use of BMP-positive EVs as a biomarker for Parkinson's disease and associated treatments.

      Strengths:

      This is an interesting paper, which provides novel insights into the biogenesis of exosomes with exciting biomedical potential. However, I have comments that authors need to address to clarify some aspects of their study.

      Weaknesses:

      (1) The intensity of LAMP2 staining is increased significantly in cells expressing the R1441G mutant of LRRK2 when compared to WT cells (Figure 1C). Yet mutant cells contain significantly smaller MVEs with fewer ILVs, and the MVE surface area is reduced (Figure 1D-F). This is quite surprising since LAMP2 is a major component of the limiting membrane of late endosomes. Are other proteins of endo-lysosomes (eg, LAMP1, CD63, RAB7) or markers (lysotracker) also decreased (see also below)?

      As referenced in our original manuscript, several previous studies have reported endolysosomal morphological and homeostatic defects in cells harboring pathogenic LRRK2 mutations. LAMP2 can be upregulated as part of a lysosomal biogenesis or stress response (e.g., via MiT/TFE transcription factors such as TFEB; Sardiello et al., Science 2009, 325:473-477), whereas ILV biogenesis is primarily controlled by ESCRT- and SMPD3-dependent pathways that are regulated independently of MiT/TFE-driven transcriptional programs. Indeed, Stuffers et al. (Traffic 2009, 10:925-937) demonstrated that depletion of key ESCRT subunits markedly inhibited ILV formation while concomitantly increasing LAMP2 expression, highlighting the mechanistic dissociation between LAMP2 abundance and ILV number. In our study, we observed a similar pattern in R1441G LRRK2 MEFs, in which elevated LAMP2 staining and protein levels occurred despite a reduction in MVE size and ILV number. We interpret this as a compensatory lysosomal biogenesis response.

      Our revised manuscript now includes new immunofluorescence data for BMP, LAMP1 and CD63 (New Figure 7, Panels A-F) together with biochemical analysis of CD63 protein levels (New Supplemental Figure 4, Panel B) in human skin fibroblasts derived from healthy donors and LRRK2 G2019S PD patients. Quantitative analysis of these experiments revealed no statistically significant differences in total cellular levels of either LAMP1 or CD63 between groups. However, we observed a consistent decrease in BMP immunostaining intensity (New Figure 7, Panel A and B), in agreement with our findings in mouse fibroblasts. We therefore propose that the elevated LAMP2 expression observed in the engineered MEF clone expressing R1441G may reflect a cell type-specific effect, potentially linked to differential penetrance of LRRK2 signaling on the lysosomal biogenesis response. We have updated the Results and Discussion section of the manuscript to incorporate and clarify these findings.

      (2) LRRK2 has been reported to interact with endolysosomal membranes. Does the R1441G mutant bind LAMP2- and/or BMP-positive membranes? 

      We agree that LRRK2 has been reported to associate dynamically with endolysosomal membranes, particularly under conditions of endolysosomal stress or damage (Eguchi T, et al. PNAS 2018, 115:E9115-E9124; Bonet-Ponce L, et al. Sci Adv. 2020, 6:eabb2454; Wang X, et al. eLife 2023, 12:e87255).

      Nevertheless, to explore whether LRRK2 associates with BMP-positive endolysosomes, we performed subcellular fractionation followed by biochemical analysis of endolysosomal fractions, since our available LRRK2 antibodies did not provide reliable immunofluorescence signals. These experiments were carried out using human skin fibroblasts derived from both healthy controls and Parkinson’s disease patients carrying the LRRK2-G2019S mutation. In both control and mutant fibroblasts, a pool of LRRK2 was detected in fractions positive for the BMP synthase CLN5 and the endolysosomal marker CD63 (New Supplementary Figure 4, Panel A), supporting the localization of LRRK2 to endolysosomal membranes that are likely BMP-enriched. Our manuscript’s Results and Methods sections have been updated accordingly.

      Does the mutant affect endolysosomes?

      As referenced in our original manuscript, several studies have reported that pathogenic LRRK2 mutations can lead to endolysosomal defects. Consistent with these reports, we also observed morphological alterations in endolysosomes of cells expressing mutant LRRK2, including reduced MVE size and fewer ILVs, as shown in Figure 1D–F. These observations are in agreement with previously described phenotypes associated with pathogenic LRRK2 variants. Furthermore, in mutant LRRK2 MEFs, and now in human-derived fibroblasts (see new Figure 7, Panel A and B), we observed a decrease in BMP immunostaining signal.

      (3) Immunofluorescence data indicate that BMP is decreased in mutant LRRK2-expressing cells compared to WT (Figure 1A-B), but mass spec data indicate that di-22:6-BMP and di-18:1-BMP are increased (Figure 3). Authors conclude that the BMP pool detected by mass spec in mutant cells is less antibody-accessible than that present in wt cells, or that the anti-BMP antibody is less specific and that it detects other analytes. This is an awkward conclusion, since the IF signal with the antibody is lower (not higher): why would the antibody be less specific? Could it be that the antibody does not see all BMP isoforms equally well? Moreover, the observations that mutant cells contain smaller MVEs (Figure 1D-F) with fewer ILVs are consistent with the IF data and reduced BMP amounts. This needs to be clarified.

      As previously reported by us (Lu et al., J Cell Biol 2022;221:e202105060) and others (Berg AL, et al. Cancer Lett. 2023, 557:216090), discrepancies can occur between BMP levels detected by immunofluorescence and those quantified by mass spectrometry. This is because immunostaining reflects the pool of antibody-accessible BMP, whereas lipidomics measures the total cellular content of all BMP molecular species, irrespective of their distribution or accessibility.

      We agree that the anti-BMP antibody may not detect all BMP isoforms equally well. Differences in acyl chain composition (such as the degree of saturation or chain length) can alter the stereochemistry of BMP and, consequently, epitope accessibility to antibody binding.

      In addition, in a personal communication with Monther Abu-Remaileh (Stanford University), we were informed that the antibody may also cross-react with other lipid species in endolysosomes. Nevertheless, since there is no formal evidence supporting this, we have removed the sentence in the Discussion section stating “Alternatively, the antibody may also detect non-BMP analytes” to avoid any potential misinterpretations. In its place, we have added a short statement noting that “not all BMP isoforms may be detected equally well”.

      Mass spectrometry data are only shown for two BMP species (di-22:6, di-18:1). What are the major BMP isoforms in WT cells? The authors should show the complete analysis for all BMP species if they wish to draw quantitative conclusions about the amounts of BMP in wt and mutant cells. Finally, BMP and PG are isobaric lipids. Fragmentation of BMPs or PGs results in characteristic fingerprints, but the presence of each daughter ion is not absolutely specific for either lipid. This should be clarified, e.g., were BMP and PG separated before mass spec analysis? Was PG affected? The authors should also compare the BMP data with mass spec data obtained with a control lipid, e.g., PC.

      Regarding BMP isoforms, our targeted UPLC-MS/MS analyses revealed that 2,2′-di-22:6-BMP (sn2/sn2′) and 2,2′-di-18:1-BMP (sn2/sn2′) are the predominant BMP isoforms in MEF cells, consistent with previous reports showing docosahexaenoyl (22:6; DHA) and oleoyl (18:1) BMP as the most abundant isoforms. Across diverse mammalian cells and tissues, BMP typically exhibits a fatty acid composition dominated by oleoyl, with polyunsaturated fatty acids (particularly DHA) also contributing substantially. Enrichment of DHA-containing BMP species has been observed in multiple systems, including rat uterine stromal cells, PC12 cells, THP-1 and RAW macrophages, as well as in rat and human liver. This consistent presence of oleoyl- and docosahexaenoyl-containing BMP species across tissues indicates that these acyl chains are conserved features influencing the lipid’s structural and functional characteristics (Kobayashi et al. J Biol Chem, 2002; Hullin-Matsuda et al. Prostaglandins Leukotrienes Essent Fatty Acids, 2009; Thompson et al. Int J Toxicol. 2012; Delton-Vandenbroucke et al. J Lipid Res, 2019).

      Nevertheless, we have included a Table (Panel H in updated Supplemental Figure 1) showing other BMP species that were also detected in our lipidomics analysis. Overall, dioleoyl (18:1)- and di-docosahexaenoyl (22:6)-BMP species were the most abundant in MEF cells, whereas di-arachidonoyl (20:4)- and di-linoleoyl (18:2)-BMP isoforms were present at lower levels. Consistently, R1441G LRRK2 MEFs displayed higher levels of dioleoyl- and di-docosahexaenoyl-BMP compared with WT cells, and these elevations were reduced following LRRK2 kinase inhibition with MLi-2. Data from three independent representative experiments are shown, and the manuscript has been revised accordingly to include these results.

      Regarding the separation of BMP and PG species, we confirm that BMP and PG were chromatographically resolved prior to MS/MS detection using a validated UPLC-MS/MS method developed by Nextcea, Inc. PG exhibits a substantially longer LC retention time than BMP, ensuring complete baseline separation. This approach (established by Nextcea nearly two decades ago and later validated through a multi-year collaboration with the U.S. FDA to clinically qualify di-22:6-BMP as a biomarker) prevents any ambiguity arising from the isobaric nature of BMP and PG species. No changes in PG levels were detected under any experimental conditions.

      Finally, we employed isotope-labeled BMP as an internal standard to ensure robust normalization across samples. These additional details and references cited above have been included in the revised Methods and References sections to further clarify the analytical rigor of our lipidomics workflow.

      (4) It is quite surprising that the amounts of labeled BMP continue to increase for up to 24h after a short 25min pulse with heavy BMP precursors (Figure 4B).

      In these isotope-labeling experiments, it is important to note (as described in our original manuscript) that two distinct pools of metabolically labeled BMP species were detected: semi-labeled BMP (with only one heavy isotope-labeled fatty acyl chain) and fully-labeled BMP (with both fatty acyl chains labeled). We consider the fully-labeled BMP pool to provide the most reliable readout for BMP turnover, as it showed a rapid decline after a 1h chase (decreasing by more than 50% within 8 h in all conditions), reaching its lowest levels at the end of the 48-h chase period.

      The apparent increase in semi-labeled BMP species over time may be explained by continued incorporation of labeled precursors following the initial pulse. Specifically, once existing semi-labeled and fully-labeled BMP molecules are degraded by PLA2G15 (Nyame K, et al. Nature 2025, 642:474-483), the resulting isotope-labeled lysophosphatidylglycerol (LPG) and fatty acids could be recycled and re-enter a new round of BMP biosynthesis, leading to a gradual accumulation of semi-labeled BMP such as di-18:1-BMP. Why would this reasoning not also apply to the fully-labeled species? Once the pulse is completed, newly incorporated non-labeled fatty acyl chains present in the cellular pool can compete with labeled ones during subsequent rounds of lipid remodeling or synthesis. As a result, the probability of generating semi-labeled BMP molecules becomes higher than that of forming fully-labeled species. Consistent with this, our data show an increase in only semi-labeled BMP species (but not in fully-labeled ones) up to 24 hours after the pulse. We have added a clarification regarding this point in the revised manuscript.
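
      The probabilistic argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two acyl chains of each newly made BMP molecule are drawn independently from a pool in which a fraction `p_labeled` of chains carries the heavy isotope; the function name and this binomial simplification are hypothetical and are not the authors' model.

```python
def bmp_label_distribution(p_labeled):
    """Return the expected fractions of unlabeled, semi-labeled and
    fully-labeled BMP, assuming two independent draws from an acyl-chain
    pool whose labeled fraction is p_labeled (illustrative assumption)."""
    if not 0.0 <= p_labeled <= 1.0:
        raise ValueError("p_labeled must lie between 0 and 1")
    q = 1.0 - p_labeled
    return {
        "unlabeled": q * q,            # both chains unlabeled
        "semi": 2.0 * p_labeled * q,   # exactly one labeled chain
        "full": p_labeled * p_labeled  # both chains labeled
    }


# As unlabeled chains dilute the pool during the chase, the semi-labeled
# fraction exceeds the fully-labeled one (this holds whenever p < 2/3).
for p in (0.5, 0.2, 0.05):
    dist = bmp_label_distribution(p)
    print(p, round(dist["semi"], 4), round(dist["full"], 4))
```

      Under this simplification, recycling of labeled chains into a largely unlabeled pool favors semi-labeled over fully-labeled products, in line with the reasoning given in the response.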

      (5) It is argued that upregulation of CLN5 may be due to an overall upregulation of lysosomal enzymes, as LAMP2 levels were also increased (Figure 2A, C, E). Again, this is not consistent with the observed decrease in MVE size and number (Figure 1D-F). As mentioned above, other independent markers of endo-lysosomes should be analyzed (e.g., LAMP1, CD63, RAB7), and/or other lysosomal enzymes (e.g., cathepsin D).

      Our revised manuscript now includes new immunofluorescence data for BMP, LAMP1 and CD63 (New Figure 7, Panels A-F) together with biochemical analysis of CD63 protein levels (New Supplemental Figure 4, Panel B) in human skin fibroblasts derived from healthy controls and LRRK2 G2019S PD patients. Quantitative analysis of these experiments revealed no statistically significant differences in total cellular levels of either LAMP1 or CD63 between groups. However, our results consistently show increased CLN5 protein levels in both mouse and human fibroblast cell lines harboring pathogenic LRRK2 mutations. Upregulation of CLN5 may reflect a compensatory effect from loss of BMP via EV exocytosis. As discussed above, the elevated LAMP2 signal observed in the engineered MEF clone expressing R1441G could represent a cell type-specific effect, potentially linked to differential penetrance of LRRK2 signaling on the lysosomal biogenesis response. Our Results and Discussion sections have been updated accordingly.

      (6) The authors report that the increase in BMP is not due to an increase in BMP synthesis (Figure 4), although they observe a significant increase in CLN5 (Figure 5A) in LRRK2 mutant cells. Some clarification is needed.

      In our original manuscript, we proposed that although CLN5 protein levels are increased in R1441G LRRK2 MEFs, the absence of significant changes in BMP synthesis rates (Figure 4B, C) may reflect either limited substrate availability or that CLN5 is already operating near its maximal enzymatic capacity. Our new subcellular fractionation data (new Figure 7, Panel A) further indicate that, despite a relative increase in total CLN5 levels in G2019S LRRK2 human fibroblasts, the amount of CLN5 associated with endolysosomes remains comparable between mutant LRRK2 and control cells. This suggests that a considerable fraction of upregulated CLN5 may not localize to endolysosomes, potentially accumulating in the endoplasmic reticulum due to enhanced translation or impaired trafficking. Unfortunately, the available anti-CLN5 antibody did not yield reliable immunofluorescence signals, preventing us from directly confirming this possibility. Nevertheless, in light of our new data (new Supplemental Figure 4A), we have included a clarification in the revised manuscript discussing this possibility as well.

      (7) Authors observe that both LAMP2 and BMP are decreased in EVs by GW4869 and increased by bafilomycin (Figure 6). Given my comments above on Figure 1, it would also be nice to illustrate/quantify the effects of these compounds on cells by immunofluorescence.

      We appreciate the reviewer’s suggestion. We have previously published immunofluorescence data showing increased BMP accumulation in endolysosomes following treatment with bafilomycin A1 (Lu A, et al. J Cell Biol. 2009, 184:863-879). However, in the present study, our lipidomics analyses revealed a decrease in both di-22:6-BMP and di-18:1-BMP species in cells treated with this compound. As discussed above, this apparent discrepancy likely reflects methodological differences between immunofluorescence, which detects only antibody-accessible BMP pools, and lipidomics, which quantifies total cellular BMP content.

      Moreover, in a recent study (Andreu Z, et al. Nanotheranostics 2023, 7:1-21), BMP levels were analyzed by immunofluorescence in cells treated with spiroepoxide, a potent and selective irreversible inhibitor of nSMase (different from GW4869) known to block EV release. Spiroepoxide-treated cells showed decreased BMP immunostaining; a result that, again, does not align with mass spectrometry data revealing increased cellular BMP levels upon GW4869 treatment. Notably, in that study, spiroepoxide was used instead of GW4869 because the intrinsic autofluorescence of GW4869 could potentially interfere with the immunofluorescence BMP signal.

      We therefore consider lipidomics measurements to provide a more reliable and quantitative representation of BMP dynamics under these conditions.

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) 48 h for MLi2 treatment seems too long. LRRK2 kinase activity is inhibited with much shorter incubation times. The longer the incubation, the more likely off-target effects are. The authors should repeat these experiments with 1-2 h of MLi2.

      We thank the reviewer for this valuable comment. We acknowledge that MLi-2 is a potent and selective LRRK2 kinase inhibitor that achieves near-complete target engagement within a few hours of treatment. However, prolonged exposure has been widely used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have employed long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202).

      In our study, 48-hour incubations were necessary to sustain full LRRK2 inhibition throughout the extracellular vesicle (EV) collection period. EV biogenesis, BMP biosynthesis, and packaging into EVs are time-dependent processes; therefore, extended incubation and collection periods (48 h) were required to allow downstream effects of LRRK2 inhibition on BMP production and release to manifest, and to obtain sufficient EV material for biochemical and lipidomic analyses. This experimental design also reflects our and others’ previous observations in humans and non-human primates, where urinary BMP changes are associated with chronic or subchronic LRRK2 inhibitor treatment (Baptista MAS, Merchant K, et al. Sci Transl Med. 2020, 12:eaav0820; Jennings D, et al. Sci Transl Med. 2022, 14:eabj2658; Maloney MT, et al. Mol Neurodegener. 2025, 20:89). Importantly, under these conditions, we did not observe significant changes in cell viability or morphology, supporting that the treatment was well tolerated.

      We have clarified this rationale in the revised Methods section to emphasize that the prolonged incubation reflects the experimental design for EV isolation rather than a requirement for achieving LRRK2 inhibition.

      (2) Is there a reason why the authors don't include CD81, CD63, and Syntenin-1 in their study as EV markers? Using solely Flotillin-1 does not seem to be enough to justify their claims.

      We actually used not only Flotillin-1 but also LAMP2 as EV markers in our study. While both Flotillin-1 and LAMP2 detection on EVs may vary depending on the cell type, we and others have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022). In particular, one of these studies reported that “LAMP1-positive subpopulations of EVs represent MVB/lysosome-derived exosomes, which also contain syntenin-1.” Therefore, our choice of EV markers (LAMP2 and Flotillin-1) is consistent with those previously and reliably used to characterize small EVs.

      Nevertheless, to further address the reviewer’s concern, we performed additional experiments using a CD63-based fluorescence sensor (CD63-pHluorin), which, combined with TIRF microscopy, enables real-time visualization of CD63-positive exosome release. These experiments were conducted in control and LRRK2-mutant fibroblasts, and the data are presented in new Figure 7 (Panels G-I; Videos 1 and 2). We have also included all relevant references and clarified this point in the revised manuscript.

      (3) Indeed, to quantify the amount of certain proteins in EVs, the authors should normalize them by CD63 or CD81.

      Protein normalization in isolated EV fractions is indeed challenging. Although tetraspanins such as CD63 and CD81 are commonly enriched in EVs, their abundance can vary considerably across EV subpopulations, cell types, and experimental conditions, making them unreliable as universal normalization markers (Théry et al., J Extracell Vesicles, 2018; Margolis & Sadovsky, Nat Rev Mol Cell Biol, 2019). Current guidelines from the International Society for Extracellular Vesicles (ISEV), as described in the Minimal Information for Studies of Extracellular Vesicles 2018 (MISEV2018; Théry C, et al. J Extracell Vesicles. 2018, 7:1535750) and updated in MISEV2024 (Welsh JA, et al. J Extracell Vesicles. 2024, 13:e12404), recommend reporting multiple EV markers rather than relying on a single protein for normalization. They also suggest ensuring comparable experimental conditions by using the same number of cells at the start of the experiment and normalizing EV data to cell number or whole-cell lysate protein content at the end of the experiment, among other approaches.

      In our study, we normalized EV data to whole-cell lysate (WCL) protein content, as this approach accounts for differences in EV production due to variations in cell number or treatment conditions and is commonly used in the field (Kowal et al., PNAS, 2016; Mathieu et al., Nat Commun, 2021). We also included Flotillin-1 and LAMP2 as EV markers, both of which have been validated as molecular markers of small EV subpopulations.

      (4) Hyper normalization in WB quantification in Figure 2E-G is statistically incorrect, as it assumes that one group (in this case, R1441G ctrl) has no variability at all, which is not biologically possible. The authors should repeat the quantification without hypernormalizing one of their groups. This issue is prevalent across the whole manuscript.

      We understand the concern regarding “hyper-normalization” (i.e., expressing all values relative to one condition set to 1), which may mask variability in the reference group. However, it is standard practice in immunoblotting analysis to express data relative to a control condition for comparison, as variations in membrane transfer, exposure time, and signal development can differ across blots. In our case, the data are expressed as relative levels (arbitrary units) rather than absolute quantitative values. To facilitate comparison between datasets and account for inter-experimental variation, we continued to express values relative to the mutant LRRK2 MEF condition.

      On the other hand, in lipidomics experiments, despite using the same number of seeded cells and identical extraction and analysis protocols, minor biological and technical variability was observed across independent replicates. This variability is inherent to the experimental system and is now explicitly represented in the new table included in Supplemental Figure 1F, which compiles three independent representative lipidomics experiments showing quantitative BMP levels across different conditions.
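
      The normalization issue debated in point (4) can be made concrete with a small sketch. The densitometry values below are hypothetical, and normalizing each blot to its own lane mean is only one possible alternative, shown as an assumption rather than as the method used in the manuscript.

```python
from statistics import mean, stdev

# Hypothetical band intensities (arbitrary units) from three independent blots.
blots = [
    {"reference": 100.0, "treated": 60.0},
    {"reference": 150.0, "treated": 75.0},
    {"reference": 80.0, "treated": 56.0},
]


def normalize_to_reference(blots):
    """Divide every lane by the reference lane of the same blot.
    The reference group becomes exactly 1.0 in every replicate."""
    return [{k: v / b["reference"] for k, v in b.items()} for b in blots]


def normalize_to_blot_mean(blots):
    """Divide every lane by the mean of all lanes on the same blot.
    Blot-to-blot scale differences are removed, but the reference
    group keeps its replicate-to-replicate spread."""
    scaled = []
    for b in blots:
        m = mean(b.values())
        scaled.append({k: v / m for k, v in b.items()})
    return scaled


ref_fixed = [b["reference"] for b in normalize_to_reference(blots)]
ref_scaled = [b["reference"] for b in normalize_to_blot_mean(blots)]

print(stdev(ref_fixed))   # 0.0: the reference group shows no variability
print(stdev(ref_scaled))  # greater than zero: variability is retained
```

      Either scheme removes differences in transfer and exposure between blots; the two differ only in whether the reference condition retains a measurable variance for statistical testing.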

      (5) The authors perform a t-test in Figure 2E-G when comparing more than 2 groups, which is wrong. The authors should use a two-way ANOVA as they are comparing genotype and treatment.

      We appreciate the reviewer’s comment and agree with this observation. The MLi-2 and CBE experiments were performed independently and in separate experimental runs; therefore, we have reanalyzed these datasets separately rather than combining them in a two-way ANOVA. To properly compare more than two groups within each dataset, we have now applied a Kruskal-Wallis test followed by an uncorrected Dunn’s post hoc test (Figure 2 D-F and H-J). This non-parametric approach is more appropriate for our data structure, as EV experiments are usually subject to high variability and immunoblot quantifications involving small sample sizes (n≈6) do not always meet the assumptions of normality or equal variance. The Kruskal-Wallis test does not assume normality or equal variances, making it more robust for small, variable biological datasets. The statistical analyses and figure legend have been updated in the revised manuscript accordingly.

      In addition, since our CBE treatments yielded statistically non-significant data, we have softened our conclusions throughout the manuscript concerning the contribution of GCase activity to EV-mediated BMP release modulation.
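
      The Kruskal-Wallis test with uncorrected Dunn's comparisons described in the response to point (5) can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis pipeline: the function names and example values are hypothetical, no tie correction is applied, and the overall p-value is returned only for three groups, where the chi-square approximation with two degrees of freedom has the closed form exp(-H/2).

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_dunn(groups):
    """groups maps a group name to a list of measurements.
    Returns (H, p_H, pairwise), where pairwise maps each pair of names
    to (z, two-sided p) from Dunn's test without multiplicity correction."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    sizes, mean_rank = {}, {}
    rank_term, start = 0.0, 0
    for name in names:
        n = len(groups[name])
        group_ranks = ranks[start:start + n]
        start += n
        sizes[name] = n
        mean_rank[name] = sum(group_ranks) / n
        rank_term += sum(group_ranks) ** 2 / n

    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Chi-square survival function has a simple closed form only for df = 2.
    p_h = math.exp(-h / 2) if len(names) == 3 else None

    pairwise = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(n_total * (n_total + 1) / 12.0
                       * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        pairwise[(a, b)] = (z, math.erfc(abs(z) / math.sqrt(2)))
    return h, p_h, pairwise


# Hypothetical relative EV-LAMP2 levels, six replicates per condition.
example = {
    "WT": [0.42, 0.55, 0.38, 0.61, 0.47, 0.50],
    "R1441G": [1.00, 0.92, 1.10, 0.85, 1.21, 0.97],
    "R1441G + MLi-2": [0.58, 0.66, 0.49, 0.72, 0.63, 0.54],
}
h, p_h, pairwise = kruskal_wallis_dunn(example)
print(h, p_h)
for pair, (z, p) in pairwise.items():
    print(pair, round(z, 3), round(p, 4))
```

      With samples as small as n≈6, the chi-square and normal approximations used here are rough; dedicated statistics software would normally be preferred for reported values.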

      (6) There is a very strong reduction in flotillin-1 in R1441G cells vs WT (Figure 2G) in the EV fraction. That reduction is further exacerbated with MLi2, which likely means it is not kinase activity dependent. Can the authors comment on that?

      We agree with the reviewer that Flotillin-1 showed a different behavior compared with LAMP2 in these experiments. As recommended by the MISEV guidelines (Théry C, et al. J Extracell Vesicles. 2018, 7:1535750; Welsh JA, et al. J Extracell Vesicles. 2024, 13:e12404), it is important to analyze more than one EV-associated protein marker. We examined LAMP2, which, together with LAMP1, has been reported to be specifically enriched in EVs of endolysosomal origin (exosomes; Mathieu et al., Nat Commun. 2021, 12:4389). In contrast, Flotillin-1 is also associated with small EVs but may represent a distinct EV subpopulation from those positive for LAMP proteins (Kowal J, et al. PNAS 2016, 113:E968-E977).

      Nevertheless, the biochemical analysis of isolated EV fractions was complemented by our lipidomics data and, in the revised version, by TIRF microscopy analysis of exosome release in control and G2019S LRRK2 human fibroblasts (new Figure 7, Panels G-I; Videos 1 and 2). In this analysis, we confirmed increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G). Collectively, these findings further support the regulatory role of LRRK2 activity in EV-mediated BMP secretion.

      (7) In Figure 2C, the authors should express that the LAMP2-EV and flotillin-1 EV fractions from the WB are highly exposed. As presently presented, it is slightly misleading.

      We thank the reviewer for this comment. In EV preparations, the amount of protein recovered is typically very low. Therefore, although we loaded all the EV protein obtained from each sample, the immunoblots for LAMP2 and Flotillin-1 in EV fractions required longer exposure times to visualize clear signals across all conditions. We have now indicated in the corresponding figure legend that these EV blots are long-exposure blots to facilitate signal detection and avoid any potential misunderstanding.

      (8) If Figure 2C and D are from two different experiments, they should not be plotted together in Figure 2E-G. You cannot compare the effect of MLi2 vs CBE if done in completely different experiments.

      We appreciate the reviewer’s comment and agree with this observation. The MLi-2 and CBE experiments were performed independently and in separate experimental runs; therefore, we have reanalyzed these datasets separately rather than combining them in a two-way ANOVA. To properly compare more than two groups within each dataset, we have now applied a Kruskal-Wallis test followed by an uncorrected Dunn’s post hoc test (Figure 2 D-F and H-J). This non-parametric approach is more appropriate for our data structure, as EV experiments are usually subject to high variability and immunoblot quantifications involving small sample sizes (n≈6) do not always meet the assumptions of normality or equal variance. The Kruskal-Wallis test does not assume normality or equal variances, making it more robust for small, variable biological datasets. The revised statistical analyses and figure legends have been updated accordingly in the manuscript.

      (9) The authors state that "For the R1441G MEF cells, MLi-2 decreased EV concentration while CBE increased EV particles per ml, in agreement with the effects observed in our biochemical analysis." As Figure S1D shows no statistical significance, the authors don't have sufficient evidence to make this claim.

      We apologize for this overstatement. We have revised the text to clarify that, although the differences did not reach statistical significance, a consistent trend toward decreased EV concentration upon MLi-2 treatment and increased EV release following CBE treatment was observed in R1441G MEF cells.

      (10) "Altogether, given that BMP is specifically enriched in ILVs (which become exosomes upon release), the data presented above support our biochemical analysis (Figure 2C, D, F) and suggest a role for LRRK2 and GCase in modulating BMP release in association with LAMP2-positive exosomes from MEF cells." As Figure 3E shows no statistical difference of BMP on EVs upon CBE treatment, this sentence is not accurate and should be reframed. Furthermore, the authors claim an increase in EV-LAMP2 in R1441G cells compared to WT, however, the amount of BMP in EVs of R1441G cells vs WT is unchanged with a non-significant reduction. This contradiction does not support the authors' conclusions and really puts into question their whole model.

      We thank the reviewer for this observation. After reanalyzing our biochemical data from isolated EV fractions (see new Panels D-F and H-J) using an improved statistical approach, we found that although EV-associated LAMP2 levels were consistently elevated in untreated R1441G LRRK2 MEFs compared to WT cells, CBE treatment only produced a non-significant trend toward increased EV-associated LAMP2 compared to untreated R1441G LRRK2 cells. Accordingly, we have revised the sentence to read as follows:

      “Altogether, given that BMP is specifically enriched in ILVs (which become exosomes upon release), the data presented above support our biochemical analysis (Figure 2C, E, G, I) and suggest that LRRK2 activity regulates BMP release in association with LAMP2-positive exosomes, whereas GCase activity appears to have a more variable effect under the tested conditions.”

      We also agree with the reviewer that, in our MEF model, the amount of BMP in EVs of R1441G cells vs WT is unchanged with a non-significant reduction. However, pharmacological modulation supports our conclusion that BMP release is modulated by LRRK2 activity. Specifically, treatment with the LRRK2 inhibitor MLi-2 decreased EV-associated BMP and LAMP2 levels in R1441G LRRK2 MEFs, and our new data (new Figure 7, Panel G-I; Videos 1 and 2) show increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G).

      In light of the reviewer’s comment about CBE treatment, we have softened our conclusions throughout the manuscript concerning the contribution of GCase activity in this model.

      (11) In Figure 5, 16 h of MLi2 treatment is too long and can lead to off-target effects. I would advise reducing it to 1-4 h.

      Prolonged MLi-2 treatments have been extensively used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have applied long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202). Moreover, the data presented in Figure 5 demonstrate a reduction in CLN5 protein levels in both MEFs and human fibroblasts following MLi-2 treatment, confirming the specificity of the observed effects in LRRK2 mutant cells.

      (12) "Our data suggest that BMP is exocytosed in association with EVs and that LRRK2 and GCase activities modulate BMP secretion." Again, cells carrying the R1441G mutation have the same amount of BMP in EVs than WT. This sentence is not factually accurate. Accordingly, CBE did not change the amount of BMP in EVs.

      We thank the reviewer for this observation and agree that, in our MEF model, the amount of BMP in EVs from R1441G LRRK2 cells is comparable to that observed in WT cells. However, pharmacological modulation supports our conclusion that BMP release is modulated by LRRK2 activity. Specifically, treatment with the LRRK2 inhibitor MLi-2 decreased EV-associated BMP levels in R1441G LRRK2 MEFs, and our new data (new Figure 7G-I; Videos 1 and 2) show increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G). These findings further support the regulatory role of LRRK2 activity in EV-mediated BMP secretion. In addition, in light of the reviewer’s comment about CBE treatment, we have softened our conclusions throughout the paper concerning the contribution of GCase activity in this model.

      (13) Figure 6; EV release should have been monitored by more accurate markers such as CD63 and CD81.

      We thank the reviewer for this comment. We and others (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022) have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions. In particular, one of these studies (Mathieu et al., Nat Commun. 2021), in which bafilomycin A1 was also used (to boost exosome release), reported that “LAMP1-positive subpopulations of EVs represent MVB/lysosome-derived exosomes, which also contain syntenin-1.” Altogether, our choice of EV markers (LAMP2 and Flotillin-1) is consistent with those previously and accurately used to characterize EVs. We have now included all relevant references in the revised manuscript to further clarify this point.

      (14) Figure 6 suggests that exosomal BMP is controlled by EV release. I would think that is rather obvious.

      We agree that the finding that exosomal BMP release is influenced by EV secretion may appear “obvious.” However, our intention in Figure 6 was to provide direct experimental evidence confirming this relationship using pharmacological modulators of EV release. Specifically, inhibition of EV secretion with GW4869 reduced exosomal BMP levels, whereas stimulation with bafilomycin A1 increased them. These data were important to establish a causal link between EV trafficking and BMP export, thereby validating our model and supporting the interpretation that LRRK2 regulates BMP homeostasis through EV-mediated exocytosis, which is further modulated, to some extent, by GCase activity. 

      Minor concerns:

      (1) Figure 1: Change colors to be color blind friendly.

      We thank the reviewer for this helpful suggestion. We have adjusted the colors in Figure 1 to be color-blind friendly. In addition, we have applied the same color-blind friendly palette to the new immunofluorescence data presented in new Figure 7, Panels A and D.

      (2) More consistency on "Xmin" vs "X min" would be appreciated.

      We thank the reviewer for this observation. We have revised the manuscript to ensure consistent formatting of time indications throughout the text and figures, using the standardized format “X min.”

      Reviewer #2 (Recommendations for the authors):

      (1)  Figure 2C-D. Were equal amounts of protein loaded in each lane?

      Equal protein amounts were loaded in lanes corresponding to whole-cell lysate (WCL) fractions and normalized based on α-Tubulin levels.

      For the extracellular vesicle (EV) fractions, all protein recovered from EV pellets after isolation was loaded. In all EV-related experiments, we seeded the same number of EV-producing cells per condition, and the resulting EV-derived data (from both immunoblotting and lipidomics analyses) were normalized to the corresponding whole-cell lysate (WCL) protein content to ensure comparability across conditions.

      All these technical details have been included in the Materials section of our revised manuscript.

      (2) The authors refer to the papers of Medoh et al (ref 43) and Singh et al. (44) for the key role of CLN5 in the BMP biosynthetic pathway. However, Medoh et al reported that CLN5 is the lysosomal BMP synthase. In contrast, Singh et al. reported that PLD3 and PLD4 mediate the synthesis of SS-BMP, and did not find any role for CLN5. 

      To avoid any confusion or misinterpretation of our findings regarding CLN5 and given that we do not analyze PLD3 or PLD4 in our study, we have decided to replace the reference to Singh et al. with Bulfon D. et al. (Nat. Commun. 2024, 15:9937) instead. This last work, conducted by an independent group distinct from the one that originally described CLN5, also validated CLN5 as the sole BMP synthase in cells.

      Also, authors mention that bafilomycin A1 (B-A1) dramatically boosts EV exocytosis, referring to Kowal et al., 2016 (ref 35) and Lu et al., 2018 (ref 45). However, this is not shown in Kowal et al.

      We thank the reviewer for pointing out this mistake. We apologize for the incorrect citation and have now corrected the reference. The statement regarding the effect of bafilomycin A1 on EV exocytosis now appropriately refers to Mathieu et al., 2021 and Lu et al., 2018.

      (3) Page 7, it is stated that "No statistically significant differences in intracellular BMP levels were observed in WT LRRK2 MEFs upon LRRK2 or GCase inhibition (Supplemental Figure 1D, E)". The authors probably mean "Supplemental Figure 1F, G".

      We thank the reviewer for noting this error. We have corrected the text to refer to panels F and G of Supplemental Figure 1, which correspond to the relevant data. We have also revised the reference to panel I of Supplemental Figure 1 accordingly.

    1. Author response:

      eLife Assessment

      This useful study raises interesting questions but provides inadequate evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The findings are intriguing but they are correlative and hypothesis-generating with the strong possibility of residual confounding.

      We thank the editors and reviewers for characterizing our work as useful and for the opportunity to publish a Reviewed Preprint with a corresponding response. However, the statements in the Assessment characterizing the evidence as ‘inadequate’ and asserting a ‘strong possibility of residual confounding’ are factually incorrect as applied to our data and incompatible with the empirical findings presented in the manuscript. We have notified the editors of this factual inaccuracy. As the Assessment will be published as originally written, we provide clarification here to ensure an accurate scientific record for readers of the Reviewed Preprint.

      Our study shows that the association between atovaquone–proguanil (A/P) exposure and reduced dementia risk, first identified in a rigorously matched national cohort in Israel, is robustly reproduced across three independently constructed age-stratified cohorts in the U.S. TriNetX network (with exposure at ages 50–59, 60–69, and 70–79). In each cohort, individuals exposed to A/P were compared with rigorously matched individuals who received another medication at the same age and were then followed over a decade for incident dementia. Cases and controls were matched on all major established dementia risk factors: age, sex, race/ethnicity, diabetes, hypertension, obesity, and smoking status.

      Across all three strata, each containing more than 10,000 exposed individuals with an equal number of matched controls, we observed substantial and consistent reductions in cumulative dementia incidence (HR 0.34–0.51), extremely low P-values (10⁻¹⁶ to 10⁻⁴⁰), and continuously widening divergence of Kaplan–Meier curves over the follow-up period. To more rigorously exclude the possibility of unmeasured baseline differences in health status, we additionally performed, for the purpose of this response, comparative analyses of key indicators of frailty and clinical utilization, including emergency and inpatient encounters, as well as the prevalence of mild cognitive impairment prior to medication exposure (values provided below in response to Reviewer #2, Weakness 1). These analyses provide clear evidence showing no pattern suggestive of exposed individuals being medically or cognitively healthier at baseline.

      Taken together, these findings constitute a rigorously matched and independently replicated association across two national health systems, using TriNetX, the most widely cited real-world evidence platform in published cohort studies. Replication across three age strata, each with >10,000 exposed individuals, followed for a decade, and matched on all major known risk factors for dementia, meets the accepted epidemiologic definition of strong and reproducible evidence.

      Although we disagree with elements of the editorial Assessment that appear inconsistent with the empirical findings, we will proceed with publication of the current manuscript as a Reviewed Preprint in order to ensure timely dissemination of findings with meaningful implications for public health and dementia prevention. In this initial public version, the point-by-point responses below provide concise explanations addressing the critiques underlying the Assessment. A revised manuscript, incorporating expanded baseline comparisons across each TriNetX age stratum, additional stringent exclusions, and an expanded discussion that will address the remarks presented in this review, will be submitted shortly.

      Reviewer #1 (Public review):

      Summary:

      This useful study provides incomplete evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The study reinforces findings that VZ vaccine lowers AD risk and suggests that this vaccine may be an effect modifier of A-P's protective effect. Strengths of the study include two extremely large cohorts, including a massive validation cohort in the US. Statistical analyses are sound, and the effect sizes are significant and meaningful. The CI curves are certainly impressive.

      Weaknesses include the inability to control for potentially important confounding variables. In my view, the findings are intriguing but remain correlative / hypothesis generating rather than causative. Significant mechanistic work needs to be done to link interventions which limit the impact of Toxoplasmosis and VZV reactivation on AD.

      We thank the reviewer for describing our study as useful and for highlighting several of its strengths, including the very large cohorts, sound statistical analyses, meaningful effect sizes, and the impressive CI curves. We also appreciate the reviewer’s recognition that our findings reinforce prior evidence linking VZV vaccination to reduced AD risk.

      Regarding the statement that the evidence remains incomplete due to “inability to control for potentially important confounding variables,” we refer to our introductory explanation above. As noted there, our analyses meet the accepted criteria for reproducible epidemiological evidence, and the assumption of uncontrolled confounding is contradicted by rigorous matching and by additional baseline evaluations. We fully agree that mechanistic work is warranted, and our epidemiologic findings strongly motivate such efforts.

      We address the reviewer’s specific comments in detail below.

      (1) Most of the individuals in the study received A-P for malaria prophylaxis as it is not first line for Toxo treatment. Many (probably most) of these individuals were likely to be Toxo negative (~15% seropositive in the US), thereby eliminating a potential benefit of the drug in most people in the cohort. Finally, A-P is not a first line treatment for Toxo because of lower efficacy.

      We agree that individuals in our cohort received Atovaquone-Proguanil (A-P) for malaria prophylaxis rather than for treatment of toxoplasmosis. However, this does not contradict our interpretation. Because latent CNS colonization by T. gondii is not currently considered clinically actionable, asymptomatic carriers are not offered treatment, and therefore would only receive an anti-Toxoplasma regimen unintentionally, through a medication prescribed for another indication such as malaria prophylaxis. Importantly, atovaquone is an established therapy for toxoplasmosis, including CNS disease, with documented efficacy and CNS penetration in current treatment guidelines. It is therefore reasonable to assume that, during the multi-week course typically administered for malaria prophylaxis, A-P would exert significant anti-Toxoplasma activity in individuals with latent CNS infection, potentially reducing or eliminating parasite burden even though the medication was not prescribed for that purpose.

      The reviewer notes that only ~15% of individuals in the U.S. are Toxoplasma-seropositive, based on surveys performed primarily in young adults of reproductive age (serologic testing is most commonly obtained in women during prenatal care). However, seropositivity increases cumulatively over the lifespan, and few reliable estimates exist for the age groups in which Alzheimer’s disease and dementia occur. Even if we accept the lower estimate of ~15% latent colonization in older adults, this proportion is still smaller than the lifetime cumulative incidence of dementia in the general population.

      Therefore, if latent toxoplasmosis contributes causally to dementia risk, and A-P is capable of eliminating latent Toxoplasma in the subset of individuals who harbor it, then a multi-week course of treatment—such as the one routinely taken for malaria prophylaxis—would be expected to produce a substantial reduction in dementia incidence at the population level, of the same order of magnitude reported here. A protective effect concentrated in a minority of exposed individuals is fully compatible with, and can mechanistically explain, the large overall reduction in risk that we observe.
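      The arithmetic behind this argument can be sketched as follows. This is a minimal illustration, not part of the authors' analysis: it assumes Levin's population attributable fraction applies, that treatment removes a fraction `efficacy` of the excess risk in carriers, and that carrier prevalence among treated individuals equals the population prevalence. The function names are hypothetical, and the example inputs are the ~15% seroprevalence and the HR of 2.43 quoted elsewhere in this response, with the HR treated as an approximate risk ratio.

```python
def attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's population attributable fraction.

    prevalence:    fraction of the population carrying the risk factor (0..1)
    relative_risk: risk in carriers divided by risk in non-carriers (> 0)
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def expected_risk_ratio(prevalence: float, relative_risk: float,
                        efficacy: float = 1.0) -> float:
    """Population-level risk ratio (treated vs untreated) expected if
    treatment removes a fraction `efficacy` of the excess risk in carriers
    and has no effect in non-carriers (assumption of this sketch)."""
    if not 0.0 <= efficacy <= 1.0:
        raise ValueError("efficacy must lie between 0 and 1")
    return 1.0 - efficacy * attributable_fraction(prevalence, relative_risk)


if __name__ == "__main__":
    # Inputs taken from this response; HR used as an approximate risk ratio.
    prevalence, rr = 0.15, 2.43
    print("attributable fraction:", round(attributable_fraction(prevalence, rr), 3))
    print("expected risk ratio:  ", round(expected_risk_ratio(prevalence, rr), 3))
```

      Varying `prevalence` and `relative_risk` shows how strongly the expected population-level effect depends on both quantities, neither of which is well characterized in older adults.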

      Finally, the reviewer notes that A-P is not a first-line treatment for toxoplasmosis due to assumed lower efficacy. This point does not undermine our results. Even a second-line agent, when administered over several weeks—as is routinely done for malaria prophylaxis—is expected to exert substantial anti-Toxoplasma activity. The long duration of exposure in large populations receiving A-P for travel provides a unique natural experiment that does not exist for other anti-Toxoplasma medications, which, when prescribed for their non-Toxoplasma indications, are not taken more than a few days. Thus, the widespread use of A-P for malaria prophylaxis allows a unique opportunity to evaluate long-term outcomes following inadvertent anti-Toxoplasma treatment.

      Moreover, “first line” recommendations in clinical guidelines refer to treatment of acute toxoplasmosis in immunosuppressed individuals, where tachyzoites are actively replicating. These guidelines do not consider efficacy against latent CNS colonization, which is dominated by bradyzoites, a biologically distinct form, in immunocompetent individuals. Therefore, the guideline hierarchy is not informative regarding which medication is more effective at clearing latent brain infection, the stage we consider most relevant to dementia risk.

      (2) A-P exposure may be a marker of subtle demographic features not captured in the dataset such as wealth allowing for global travel and/or genetic predisposition to AD. This raises my suspicion of correlative rather than causal relationships between A-P exposure and AD reduction. The size of the cohort does not eliminate this issue, but rather narrows confidence intervals around potentially misleading odds ratios which have not been adjusted for the multitude of other variables driving incident AD.

      We agree that, prior to matching, A-P exposure may be associated with demographic features such as health status or the ability to travel internationally. However, this does not apply after matching. In all age-stratified analyses, exposed and control individuals were rigorously matched on all major risk factors known to influence dementia risk, including age, sex, race/ethnicity, smoking status, hypertension, diabetes, and obesity. Owing to the extremely large pool of individuals in TriNetX (~120M), our matching was performed stringently, producing exposed and unexposed cohorts that are near-identical with respect to the established determinants of dementia risk.

      The reviewer correctly identifies that large cohorts alone do not eliminate confounding; however, confounding must still be biologically and epidemiologically plausible. Any hypothetical confounder capable of producing a 50–70% reduction in dementia incidence over a decade would need to: (1) produce a very large protective effect against dementia; (2) be strongly associated with A-P exposure; and (3) remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension and obesity, which have been rigorously matched. No such factor has been proposed. The suggestion that an unspecified ‘subtle demographic feature’ could produce effects of this magnitude remains hypothetical, and no such factor has been described in the dementia risk literature.

      If a specific evidence-supported confounder is proposed that meets these criteria, we would be pleased to test it empirically in our cohorts. In the absence of such a proposal, the interpretation that the association is merely “correlative rather than causal” remains speculative and does not negate the strength of a replicated, rigorously matched, long-term association across large cohorts in two national health systems.
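      One way to put a number on how strong such a confounder would need to be is the E-value of VanderWeele and Ding. The sketch below is an illustration only: the E-value is not used in the manuscript or in this response, the function name is hypothetical, and the hazard ratios are treated as approximate risk ratios, which is reasonable only when the outcome is relatively rare over follow-up.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Returns the minimum strength of association (as a risk ratio) that an
    unmeasured confounder would need with both exposure and outcome,
    conditional on measured covariates, to fully explain away the estimate.
    Protective estimates (ratio < 1) are inverted first.
    """
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    rr = 1.0 / ratio if ratio < 1.0 else ratio
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # Range of hazard ratios reported across the three age strata.
    for hr in (0.34, 0.51):
        print(f"HR {hr:.2f} -> E-value {e_value(hr):.2f}")
```

      The E-value bounds the joint strength of confounding required; it does not by itself show whether a confounder of that strength exists.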

      (3) The relationship between herpes virus reactivation and Toxo reactivation seems speculative.

      We respectfully disagree with the characterization of the herpesvirus–Toxoplasma interaction as speculative. The mechanism we describe is biologically valid, based on established virology and parasitology literature showing that latent T. gondii infection can reactivate from its bradyzoite state under inflammatory or immune-modifying conditions, including viral triggers. A published clinical report has documented CNS co-reactivation of T. gondii and a herpesvirus, explicitly noting that HHV-6 reactivation can promote Toxoplasma reactivation in neural tissue (Chaupis et al., Int J Infect Dis, 2016).

      Moreover, this mechanism is the only currently evidence-supported explanation that simultaneously and parsimoniously accounts for all of the epidemiologic observations in our study:

      (1) Substantially higher cumulative incidence of dementia in individuals with positive Toxoplasma serology, indicating that latent infection is a risk factor for subsequent cognitive decline;

      (2) Strong protective association following A-P exposure, a medication with established activity against Toxoplasma gondii, including in the CNS;

      (3) Independent protection conferred by VZV vaccination, observed consistently for two vaccines with distinct formulations (one live attenuated, one recombinant protein), whose only shared property is suppression of VZV reactivation;

      (4) Greater protective effect of A-P among individuals who were not vaccinated against VZV, consistent with a model in which dementia risk requires both herpesvirus reactivation and persistent latent Toxoplasma infection—such that reducing either factor alone (via VZV vaccination or anti-Toxoplasma suppression) substantially lowers risk.

      Taken together, these observations are difficult to reconcile under any alternative hypothesis.  

      To date, we are unaware of any other biologically coherent mechanism that can explain all four findings simultaneously. We would welcome any alternative explanation capable of accounting for these converging epidemiologic signals, as such a proposal could meaningfully advance the scientific discussion. In the absence of a competing explanation, the interaction between latent toxoplasmosis and herpesvirus reactivation remains the most parsimonious hypothesis supported by current knowledge.

      Finally, while observational studies are inherently limited in their ability to provide causal inference, the mechanism we propose is biologically grounded and experimentally testable. Our results provide a strong rationale for mechanistic studies and clinical trials, and warrant publication precisely because they generate a verifiable hypothesis that can now be evaluated directly.

      (4) A direct effect of A-P on AD lesions independent of infection is not considered as a hypothesis. Given the limitations above and effects on metabolic pathways, it probably should be. The Toxo hypothesis would be more convincing if the authors could demonstrate an enhanced effect of the drug in Toxo positive individuals with no effect in Toxo negative individuals.

      A direct effect of A-P on established AD lesions is indeed possible, and this hypothesis would be of significant therapeutic interest. However, we did not consider it within the scope of our epidemiologic analyses because all cohorts explicitly excluded individuals with existing dementia. Under these conditions, proposing a disease-modifying effect on established Alzheimer’s lesions based on our data would itself be speculative. Such a mechanism would be better evaluated by mechanistic or interventional studies than by inference from populations without baseline disease.

      We also agree that demonstrating a stronger protective effect among Toxoplasma-positive individuals would be informative. Unfortunately, this “natural experiment” cannot be performed using the available data: Toxoplasma serology is rarely ordered in older adults, and A-P exposure is itself uncommon, resulting in a cohort overlap far too small to yield valid statistical inference (n≈25 in TriNetX).

      Thus, while both proposed hypotheses are scientifically attractive and merit further study, neither can be resolved using currently available real-world clinical data. Our findings provide the rationale to investigate both hypotheses experimentally, and we hope our report will motivate such studies.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines the association between atovaquone/proguanil use, zoster vaccination, toxoplasmosis serostatus and Alzheimer's Disease, using 2 databases of claims data. The manuscript is well written and concise. The major concerns about the manuscript center around the indications of atovaquone/proguanil use, which would not typically be active against toxoplasmosis at doses given, and the lack of control for potential confounders in the analysis.

      Strengths:

      (1) Use of 2 databases of claims data.

      (2) Unbiased review of medications associated with AD, which identified zoster vaccination associated with decreased risk of AD, replicating findings from other studies.

      We thank the reviewer for the thoughtful assessment and for noting key strengths of our work, including (1) the use of two large national databases, and (2) the unbiased discovery approach that replicated the widely reported association between zoster vaccination and reduced Alzheimer’s disease (AD) risk. We agree that these features highlight the validity and reproducibility of the analytic framework.

      Below we respond to the reviewer’s perceived weaknesses.

      Weaknesses:

      (1) Given that atovaquone/proguanil is likely to be given to a healthy population who is able to travel, concern that there are unmeasured confounders driving the association.

      We agree that, prior to matching, A-P exposure may correlate with demographic or health-related differences (e.g., ability to travel). However, this potential bias was explicitly controlled for in the study design. Across all three age-stratified TriNetX cohorts, exposed and unexposed individuals were rigorously matched on all major established dementia risk factors: age, sex, race/ethnicity, smoking status, obesity, diabetes mellitus, and hypertension. Comparative analyses confirm that these risk factors are equivalently distributed at baseline.

      As noted in our response to Reviewer #1, for any hypothetical unmeasured confounder to explain the results, it would need to satisfy three conditions simultaneously:

      (1) Be capable of producing a 50–70% reduction in dementia incidence sustained over a decade and across three distinct age strata (ages 50–79);

      (2) Be strongly associated with likelihood of receiving A-P;

      (3) Remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension, or obesity, all of which were rigorously matched and balanced at baseline.

      No such factor has been proposed in the literature or by the reviewer. Thus, the concern remains hypothetical and unsupported by any measurable demographic or biological mechanism.

      Importantly, empirical evidence contradicts the notion of a “healthy traveler” bias:

      Emergency and inpatient encounter rates prior to exposure were comparable between A-P users and controls. Across the three age-stratified cohorts, emergency visits were similar or slightly higher among A-P users (EMER: 19.6% vs 16.4%, 19.9% vs 14.2%, 22.0% vs 14.8%), and inpatient encounters were effectively equivalent (IMP: 14.8% vs 15.2%, 17.7% vs 17.6%, 22.1% vs 22.2%). These patterns directly contradict the suggestion that A-P users were a healthier or less medically burdened population at baseline.

      Prevalence of mild cognitive impairment was not lower among A-P users and was, in fact, slightly higher in the oldest cohort. Across the three age groups, baseline diagnoses of mild cognitive impairment (MCI) were comparable or slightly higher among exposed individuals (0.1% vs 0.1%, 0.3% vs 0.2%, 1.1% vs 0.6%). These data contradict the suggestion that A-P users had superior baseline cognition.

      The strongest protective association occurred in the youngest stratum (age 50–59; HR 0.34). At this age, when nearly all individuals are sufficiently healthy to travel internationally, health status is least likely to confound A-P uptake. A frailty-based “healthy traveler” hypothesis would instead predict the opposite pattern, with older adults showing the greatest apparent benefit, since health limitations are more likely to restrict travel in later life. In contrast, the protective association weakens with increasing age, empirically contradicting any explanation based on differential travel capacity.

      In conclusion, the empirical evidence directly contradicts the existence of a ‘healthy traveler’ effect.

      (2) The dose of atovaquone in atovaquone/proguanil is unlikely to be adequate for suppression of toxo (much less for treatment/elimination of toxo), raising questions about the mechanism.

      A few important points should address the reviewer’s concern:

      In our cohorts, A-P was prescribed for malaria prophylaxis, as correctly noted. In this setting, it is taken for the entire duration of travel, plus several days before and after, typically resulting in many weeks of continuous exposure. This creates an unintentional but scientifically valuable natural experiment, in which a CNS-penetrating anti-Toxoplasma agent is administered for long durations.

      Atovaquone is an established treatment for CNS toxoplasmosis, has strong CNS penetration, and is included in current clinical guidelines for acute toxoplasmosis in immunocompromised patients, although at higher doses. Because latent, asymptomatic CNS colonization is not treated in clinical practice, there are currently no data establishing the dose required to eliminate bradyzoite-stage Toxoplasma in immunocompetent individuals.

      Our observations concern atovaquone–proguanil (A-P), a fixed-dose combination of atovaquone with proguanil, a DHFR inhibitor targeting a key metabolic pathway shared by malaria parasites and T. gondii. The combination has well-established synergistic effects in malaria prophylaxis and the same mechanism would be expected to enhance anti-Toxoplasma activity. This fixed-dose regimen has never been formally evaluated for toxoplasmosis treatment at prolonged durations or against latent bradyzoite infection.

      Our hypothesis does not require or imply complete eradication of Toxoplasma. A clinically meaningful reduction in latent cyst burden among the subset of colonized individuals may be sufficient to alter long-term disease trajectories. Thus, a population-level decrease in dementia incidence does not require universal clearance of infection, but only partial suppression or reduction of parasite load in susceptible individuals, which is entirely compatible with the known pharmacology and duration of A-P exposure.

      (3) Unmeasured bias in the small number of people who had toxoplasma serology in the TriNetX cohort.

      The relatively small number of older adults with Toxoplasma serology stems from current clinical practice: serologic testing is mostly performed in women during reproductive years due to risks in pregnancy, whereas in older adults a positive result has no clinical consequence and therefore testing is rarely ordered.

      Importantly, the seropositive and seronegative groups were drawn from the same underlying population of individuals who underwent serology testing, and the only difference between groups is the test result itself. Because the decision to order a test is made prior to and independent of the result, there is no plausible rationale by which the serology outcome (positive or negative) would introduce a bias favoring either group beyond the result of the test itself.

      Furthermore, the two groups were also rigorously matched on all major dementia risk factors, including age, sex, race/ethnicity, smoking, diabetes, hypertension, and BMI, and these characteristics are similarly distributed between groups. A small sample size does not imply bias; it simply reduces statistical power. Despite this limitation, the observed association (HR = 2.43, p = 0.001) remains strongly significant.

      Finally, this result is consistent with multiple published studies reporting higher rates of Toxoplasma seropositivity among individuals with Alzheimer’s disease, dementia, and even mild cognitive impairment, such that our finding reinforces a broader and independently observed epidemiologic pattern. Importantly, in our cohort the serology testing clearly preceded dementia diagnosis, which supports the plausibility of a causal rather than merely correlative relationship between latent toxoplasmosis and cognitive decline.

      To conclude our provisional response, we thank the editor and reviewers for raising points that will be further addressed and expanded upon in the discussion of the forthcoming revision. We welcome transparent scientific dialogue and acknowledge that, as with all observational research, residual confounding cannot be eliminated with absolute certainty. However, we disagree with the overall Assessment and emphasize that our findings, reproduced independently across two national health systems and three age-stratified cohorts, each rigorously matched on all major determinants of dementia risk, meet, and in many respects exceed, current standards for high-quality observational evidence.

      Assigning the results to “residual confounding” requires more than speculation: it requires identification of a confounding factor that is (1) anchored in established dementia risk literature, (2) empirically plausible, and (3) quantitatively capable of generating a sustained ~50 percent reduction in dementia incidence over a decade. No such factor has been identified to date. We note that the assertion of “residual confounding” has not been supported by a specific, quantitatively plausible mechanism. A hypothetical bias that is both extremely large in effect and uncorrelated with all major risk factors is not statistically or biologically credible.

      The explanation we propose, reduction in dementia risk through elimination of latent Toxoplasma gondii, is biologically grounded, directly supported by independent epidemiologic literature, and uniquely capable of accounting for all convergent observations in our data. No alternative hypothesis has been put forward that can plausibly explain these findings.

      A revised version of the manuscript will be submitted shortly, incorporating expanded baseline analyses, with the strictest possible exclusion criteria (including congenital, vascular, chromosomal, and neurodegenerative disorders such as Parkinson’s disease), and complete tabulated comparisons. These data will further reinforce that the observed protective associations are not attributable to any measurable confounding. We also plan to enhance the discussion in order to address the points raised by the reviewers.

      In light of the expanded analyses, any reservations expressed in the initial Assessment can now be re-evaluated on the basis of the empirical evidence. The findings reported in our study meet, and in several respects exceed, current epidemiologic standards for high-quality observational research, clearly warrant publication, and provide a robust scientific foundation for future mechanistic and interventional studies to determine whether elimination of latent toxoplasmosis can prevent or treat dementia.

    1. eLife Assessment

      This study presents important findings describing the early assembly of vascular basement membrane and how vascular cells switch from responding to cues provided by the external environment to those provided by self-assembled basement membrane. The evidence supporting the claims of the authors is convincing, with state-of-the-art microscopy and several different culture conditions examined. The work will be of interest to cell biologists studying the ECM, vascular development, as well as medical scientists focused on diseases that depend on vascular growth.

    2. Reviewer #1 (Public review):

      Summary:

      Marchand et al. seek to understand how basement membrane (BM) is initially assembled around developing vasculature (and, by extension, how basement membrane assembly generally progresses). To do this, they use an established cell culture system that is amenable to advanced microscopy techniques, including high-resolution fluorescence imaging and atomic force microscopy. This allows them to produce very high-quality imaging data that includes both protein localization and matrix topography in 3D. They show that fibronectin (FN) is remodeled as collagen IV (Col IV) assembles. Lysyl oxidase-like-2 (LOXL2) is needed for this process, and without it, BM does not form correctly, cells cannot adhere to BM, and cells also don't correctly form junctions with other cells.

      Detailed Review:

      The authors provide quantitative measures of BM fibril assembly at the earliest timepoints. They show two phases of growth: the first is initial deposition, elongation, and interconnection of short fibers; the second is a significant thickening. As the BM forms, FN is immediately associated with filaments, but laminin and Col IV are not associated with fibers as detected by AFM. LOXL2 is associated with fibers, similar to FN. At a later time point, Col IV becomes associated with fibers, but laminin never does. Likely, FN templates LOXL2, which crosslinks Col IV into fibrils over time. (Could the authors comment on how these data fit with in vivo data from model organisms? Also, I would like to know if they can uncouple LOXL2 from the FN matrix. Could they express a mutated form of LOXL2 that cannot interact with FN but is still able to crosslink Col IV?)

      Depletion of LOXL2 supports this mechanism. Without it, Col IV and FN are uncoupled and accumulate as large aggregates rather than a complex fibrous network. Further, long-term thickening/growth of the fibronectin network is inhibited, indicating that LOXL2 and/or the Col IV network positively reinforces fibronectin assembly. (Does LOXL2 directly act on FN, or is this effect dependent on Col IV? The nature of the molecular interactions between Col IV, LOXL2, and FN will be an important future research area.)

      Next, Marchand et al. ask if failure to produce mature BM (induced by LOXL2 depletion) has consequences for underlying cells. They demonstrate a clear shift in the orientation of actin towards a linear alignment, and similarly, cells change shape from round to very elongated. Cell junctions also shifted to a linear arrangement upon LOXL2 depletion. This fits with the known balance between cell-ECM and cell-cell adhesion. The changes in actin network and cell shape/adhesion correlate with a change in B1 integrin localization upon LOXL2 depletion. B1 integrin colocalized with sparse early FN fibers, but was absent from the large FN aggregates that occur if LOXL2 is depleted. A similar reorganization was observed for integrin adhesion components (FAK, Vinc, Pax). Clearly, there is feedback between BM assembly and cell junction organization. But I think the authors might emphasize to the reader that this normally reinforces the epithelial fate of these cells. It's less a balance and more like a tipping point. (Related to this section, I could not read Figure 4C graphs unless I enlarged them to 300%.)

      Finally, they culture cells on micro-grooved plates, with or without LOXL2. The grooved substrate can orient the cells, and they show this is superseded by BM once it assembles. Without LOXL2, cells on micro-grooved substrates become disorganized, similar to their observation on flat surfaces (elongated cells, linear actin, etc.). This demonstrates a switch from external topographical cues to self-generated BM. This is consistent with the idea of reorganizing junctions to produce a stable epithelial tube. I was very interested in their 3D culture. The effect of BM assembly on tube diameter makes sense. But how does BM assembly support complex capillary functions like branching? (Can they force branching with targeted mutations that decouple integrin from the BM?) Is this a question of change to cell fate? (Are other remodeling enzymes activated after initial BM assembly that could support growth and/or branching?)

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Adaptation of endothelial cells to microenvironment topographical cues through lysyl oxidase like-2-mediated basement membrane scaffolding" by Marchand et al. aims to determine the impact of LOXL2 on the dynamic formation of vascular basement membranes (BMs).

      Strengths:

      This manuscript includes a nice combination of different methods and presents the results in an appropriate manner.

      Furthermore, the results clearly demonstrate an impact of LOXL2 on collagen IV-fibronectin organization and topography. Finally, the proper arrangement of collagen IV-fibronectin impacts cell alignment.

      Weaknesses:

      An open question for this reviewer is what the real take-home message of the present study is. Can the authors deliver novel insight into BM formation transferable to the in vivo situation? Why do the authors not see a "real" BM? Could it be that in vivo endothelial cells do not build the vascular BM alone? Thus, are additional cell types needed? And what will happen then if LOXL2 expression is altered?

      Major comments:

      (1) Can the authors show that LOXL2 cross-links fibronectin and collagen IV?

      (2) The authors stated that LOXL2 depletion affects cytoskeleton arrangements and cell shape. Could it be that this is simply a secondary effect mediated primarily through the altered cross-linking of fibronectin and collagen IV?

      (3) Can the authors perform cell adhesion studies on CDMs derived from wild-type versus LOXL2-deficient cells?

      (4) Line 226-230: Can the authors provide the proliferation data of wildtype and LOXL2-depleted cells supporting their Src and Akt activity findings?

      (5) Line 298-299: The authors made a statement about laminin. Can the authors think of a co-culture of wild-type versus LOXL2-depleted endothelial cells in combination with pericytes or fibroblasts, as these cells contribute to the efficient assembly of a functional vascular basement membrane (10.1182/blood-2009-05-222364)? Here, they could determine the impact of altered fibronectin-collagen IV cross-linking on laminin network formation. This will affect their conclusion in lines 299-304, as these facts are solely based on endothelial cells.

      (6) Suggestion: can the authors supplement recombinant LOXL2 protein in its active form to the LOXL2-depleted endothelial cells to rescue the observed changes? They could further compare the enzymatic function of LOXL2 with that of LOXL2 protein harbouring Zn instead of Cu, which makes it enzymatically inactive. Here, the authors might be able to strengthen their hypothesis that LOXL2 might bridge fibronectin and collagen IV or link both proteins.

      (7) There are grammatical errors in the manuscript that the authors should work on.

    4. Reviewer #3 (Public review):

      This important study shows that basement membrane (BM) generation is a key event mediating cell 3D organization in response to microenvironmental cues. Such a mechanism participates in the endothelial cell capacity to organize into a capillary vessel segment through the shift from interactions with the interstitial ECM to interactions with vascular BM. This is particularly important for the developing, sprouting vasculature. The authors conclusively show, using TIRF and atomic force microscopy substantiated by 3D sprouting assays, that the lysyl oxidase Loxl2 plays a key role herein. With respect to translation into clinical practice, the dysregulation of adherens junctions and barrier properties associated with Loxl2 dysfunction-mediated defects in BM supports its involvement in the progression of long-term microvascular diseases.

      An outstanding question not answered in the current MS is how Loxl2 integrates into the Dll4-Notch mediated control of tip-stalk-phalanx cell differentiation in the developing (embryonic) vasculature. The authors focused a lot on Loxl2 loss of function; however, in a (patho)physiological context, Loxl2 gain of function would be relevant. Loxl2 is a hypoxia target and Loxl2 accumulates in the ECM upon hypoxic stress (as occurs during ischemic CVD, stroke/heart infarct). It would be interesting to know how Loxl2 gain-of-function impacts BM assembly, endothelial behavior, mechanosensing, and vessel angiogenic remodeling.

    1. eLife Assessment

      Amyotrophic lateral sclerosis (ALS) affects nerve cells in the brain and spinal cord. The authors' approach of using genetic code expansion to tag two ALS proteins associated with stress granules has value and should be useful in the ALS field. Parts of the work are well done, but there are concerns that the evidence is incomplete overall, and additional controls would strengthen the study.

    2. Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, compare this protein tagging system (ANAP) with antibody staining, and evaluate protein trafficking and stress granule formation in response to stress induced by sodium arsenite treatment. They find staining similar to that of antibodies in HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons. This is a useful study that demonstrates the utility of ANAP tagging to evaluate ALS proteins.

      Strengths:

      Rescue of cell survival by ANAP-tagged TDP-43 is compelling.

      Weaknesses:

      While the ANAP-tagged proteins had distributions similar to antibody staining, there were some discrepancies that may be explained more by the technique than by novel findings, as the authors suggested. The inclusion of additional controls to evaluate this would be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA-binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 were observed in cultures of mouse primary neurons.

      Strengths:

      Genetic code expansion and non-canonical amino acid labeling have quite a few advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap and follow labeled proteins in living cells with and without stress.

      Weaknesses:

      The authors do not convincingly leverage the advantages of genetic code expansion in the current study. There is no specific question posed by the authors that can be or is answered using this approach, and several of the experiments lack critical controls. This is also not the first example of TDP-43 labeling by genetic code expansion (see PMID: 38290242). As a result, the study as a whole adds little to our understanding of protein trafficking and behavior under stress.